New duplicate audio files finder... I am open to features suggestion

AsmDev:
Hello DC Users,

As a developer, I feel it is always a good thing to ask users what they would like to see in an application before it is even released. So I am starting this thread, similar to the one I did for DupeTrasher earlier this year here.

I am in the process of developing a new piece of software similar to my DupeTrasher duplicate file finder, but this time the new app will be specialized for audio files only. Similarly to duplicate photo finders, this software should be able to find duplicate and similar audio files by comparing the audio data in them (basically by listening to how they sound). In addition, it will use other parameters for duplicate detection, such as ID3 tags, the file name and the binary content, but those will be of secondary importance because that file data can be wrong and I want detection to be almost independent of it. With "audio listening", it should be able to recognize dupes even if they have different names, tags, bitrates, sample rates and file types (mp3/wma/ogg/flac...). It is designed specifically to resist lossy encodings, but in some cases it should even be able to detect different performances of the same song (e.g. live/album/remix).

Of course, this is just the first half of the problem, which I have almost completed. The second, equally important half is presenting information to the user so that he can decide which files should be removed in the shortest amount of time and with the least effort.

So feel free to post your general suggestions and requests (if you have any and if you are interested in software like this). If you have some common scenarios where duplicate audio files are involved, let me know so that I can analyze them and find a solution in the form of a feature that can make your life easier. Your ideas on graphic design and window layouts are also welcome.

One of the things I am currently brainstorming is how to present the search results to the user, considering that the same song can be found in two different files which are not exactly equal. That is, their audio data is not 100% identical but rather similar to some extent (e.g. 90% or more, due to different encoding options, noise in the background or the quality of the recording; unlike regular duplicate files in DupeTrasher, which are completely equal, so grouping and presenting them to the user was a much easier task for me). So basically, let's say that songX can be found in the form of fileA, fileB and fileC, where
- fileA and fileB are 90% similar
- fileA and fileC are 90% similar
- but fileB and fileC are only 70% similar
This is due to the probabilistic nature of non-exact comparison (that is, I use percentages and similarity rather than "exactly equal" or "not equal" as in DupeTrasher). Now the question I have is: should I include all three files in the same duplicate group? And what if fileC is 90% similar to fileD, which is just 50% similar to fileA and fileB? I hope you can see the issue here.
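
To make the grouping question concrete, here is a minimal sketch in Python of two possible policies, using the percentages from the example above. It is only an illustration: the function names, the lookup table and the 90% threshold are assumptions made for this sketch, not how the application works. "Chained" grouping puts a file into a group if it matches any member, while "strict" grouping requires it to match every member.

```python
from itertools import combinations

# Pairwise similarity in percent, taken from the example in the post.
# The table itself is a hypothetical stand-in for real comparison results.
SIMILARITY = {
    frozenset(("fileA", "fileB")): 90,
    frozenset(("fileA", "fileC")): 90,
    frozenset(("fileB", "fileC")): 70,
    frozenset(("fileC", "fileD")): 90,
    frozenset(("fileA", "fileD")): 50,
    frozenset(("fileB", "fileD")): 50,
}


def similarity(x, y):
    """Look up the similarity of two files (0 if the pair is unknown)."""
    return SIMILARITY.get(frozenset((x, y)), 0)


def chained_groups(files, threshold):
    """Group files that are linked through ANY chain of matches (union-find)."""
    parent = {f: f for f in files}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path compression
            f = parent[f]
        return f

    for x, y in combinations(files, 2):
        if similarity(x, y) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


def strict_groups(files, threshold):
    """Greedy grouping: a file joins a group only if it matches EVERY member."""
    groups = []
    for f in files:
        for g in groups:
            if all(similarity(f, member) >= threshold for member in g):
                g.append(f)
                break
        else:
            groups.append([f])
    return groups


files = ["fileA", "fileB", "fileC", "fileD"]
print(chained_groups(files, 90))  # one group containing all four files
print(strict_groups(files, 90))   # two groups: A+B and C+D
```

With chained grouping, fileD ends up in the same group as fileA and fileB even though it is only 50% similar to them. Strict grouping avoids that, but its result depends on the order in which files are processed, so neither policy is obviously the right one.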

The easiest way for me is to present a list of all files that have at least one duplicate and then let the user mark the files he wants to delete. So there would be no grouping of any kind, just a plain list. But I don't think this is a very coherent way of presenting the results of a duplicate search (I will probably include it as just one of the views, though).


Of course, as always, DC users will get special treatment from me. Besides discounts, I will provide free licenses for the monthly newsletter, beta testers and other contributors.

Thanks! :)

Curt:
I download several thousand audio files each month. Because of "Best of..", "The Very Best of..." and the like, I automatically get a lot of duplicates. These files have not yet been freed of DRM, so they are not yet placed in My Music but in a special download folder. So one of my wishes for the program's navigator is an option to Remember & Scan a Favorite Folder other than My Music, please.


hmm... was this understandable?

Your questions were too difficult for me to answer.

tomos:
I've not used any duplicate file apps yet (I recently bought DupeTrasher but haven't had a chance to install it), so my suggestions may vary wildly :)

It seems to me that it would be good to be able to easily differentiate between Exactly the Same & Similar - similar being where the percentages come in

Also, maybe an indication of what 90% similar could mean - could that be a different file (e.g. diff bitrate) but actually the same recording.
-
Another example - this may be unusual - I have a small collection but relatively a lot of live tracks - I often crop them at either end if there's too much waffling & naturally keep the original recordings. Would the app be able to tell me they are exactly the same except one is cropped (this is not really a feature request but you did ask for scenarios!)

sajman99:
Good news, AsmDev. I look forward to your new audio comparison software.

I hope it will have sufficient options so the user can implement his/her own preferences. For example, if my preference is to start comparison at a similarity/tolerance of 60%, I should be able to do that without moving down from a higher pre-set level.

Likewise, in observing matches I would like a choice as to how they are grouped--something like "show all matches" or "show most relevant matches". Regardless of the presentation of matches, the software will still have to perform the same amount of work (if I understand correctly), but the more presentation options, the better!

Also, some type of (optional) cache management feature would be convenient so users don't have to start at ground zero every time they scan a large music collection.

Good Luck with development. :)

AsmDev:
Curt, if I understand you correctly, you would like an option that adds new music files to your collection only if they are not already present there? That is something I also have in mind, since many users have a sorted music collection and often need to add new songs, so an automated solution that checks whether they already have those songs (with additional info, such as whether the existing file is of better quality than the new one) would be useful.


It seems to me that it would be good to be able to easily differentiate between Exactly the Same & Similar - similar being where the percentages come in

Also, maybe an indication of what 90% similar could mean - could that be a different file (e.g. diff bitrate) but actually the same recording.
-tomos (November 16, 2009, 08:01 AM)
--- End quote ---
Ok thanks, this sounds reasonable. I will probably create several views as in DupeTrasher... some for exact dupes and similar ones with information on how they differ.

Another example - this may be unusual - I have a small collection but relatively a lot of live tracks - I often crop them at either end if there's too much waffling & naturally keep the original recordings. Would the app be able to tell me they are exactly the same except one is cropped (this is not really a feature request but you did ask for scenarios!)
-tomos (November 16, 2009, 08:01 AM)
--- End quote ---

Well, in general it will be resistant to silence at the beginning of the track, but I am not sure how it would handle stuff like this. Waffling and noise are part of the audio information and currently can't be treated separately. The audio detection is optimized for detecting the same song in different qualities.
However, in scenarios like this I think I can use supplementary features to identify duplicates (e.g. fuzzy matching of the file names and ID3 tags). I'll do some tests and see how it goes...
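
As a rough illustration of what fuzzy matching of file names could look like, here is a minimal Python sketch built on the standard library's difflib.SequenceMatcher. The normalization rules and the function names are assumptions made for this example, not the approach the application uses.

```python
import os
import re
from difflib import SequenceMatcher


def normalize_name(path):
    """Reduce a file name to lowercase words, ignoring extension and punctuation."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return re.sub(r"[^a-z0-9]+", " ", stem.lower()).strip()


def name_similarity(path_a, path_b):
    """Return a score between 0.0 and 1.0 for how alike two file names are."""
    a = normalize_name(path_a)
    b = normalize_name(path_b)
    return SequenceMatcher(None, a, b).ratio()


# Same name apart from punctuation, case and file type.
print(name_similarity("Artist - Song (Live).mp3", "artist_song_live.FLAC"))
# A cropped copy with a suffix added to the name scores high but below 1.0.
print(name_similarity("Artist - Song (Live).mp3", "Artist - Song (Live) cropped.mp3"))
```

A score like this could supplement the audio comparison for cases such as cropped live tracks, where the sound differs but the names and tags are nearly the same. The same function could be applied to ID3 title and artist strings.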

I hope it will have sufficient options so the user can implement his/her own preferences. For example, if my preference is to start comparison at a similarity/tolerance of 60%, I should be able to do that without moving down from a higher pre-set level.
-sajman99 (November 16, 2009, 04:12 PM)
--- End quote ---
Yea sure, that will be included. In my testing so far I have found that a match of 90% or more identifies the same song at a different bitrate/sample rate/other quality parameters. In some cases, however, the match is only ~70% if the difference in encoding quality between two files of the same song is large (e.g. flac and low-bitrate wma). So it is definitely a must-have feature.

Likewise, in observing matches I would like a choice as to how they are grouped--something like "show all matches" or "show most relevant matches". Regardless of the presentation of matches, the software will still have to perform the same amount of work (if I understand correctly), but the more presentation options, the better!
-sajman99 (November 16, 2009, 04:12 PM)
--- End quote ---

I am with you on this; the main reason for this thread is that I wanted to hear from users how they would like search results to be presented. It would help if you could describe this in more detail. For example, by "show most relevant matches", did you mean that the program should show only matches that are 90% or higher?

Also, some type of (optional) cache management feature would be convenient so users don't have to start at ground zero every time they scan a large music collection.
-sajman99 (November 16, 2009, 04:12 PM)
--- End quote ---

That is already done  :Thmbsup:
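
For readers wondering what such a cache might look like, here is a minimal Python sketch. It assumes each file's audio fingerprint is stored together with the file's size and modification time, so a file is only re-analyzed when it changes. The class name, the JSON storage and the compute callback are all hypothetical and say nothing about how the application actually implements it.

```python
import json
import os


class FingerprintCache:
    """Keeps fingerprints between scans, keyed by absolute file path."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        self.entries = {}
        if os.path.exists(cache_path):
            with open(cache_path, "r", encoding="utf-8") as fh:
                self.entries = json.load(fh)

    def get(self, audio_path, compute):
        """Return the cached fingerprint, calling compute() only if the file changed.

        compute is a hypothetical callback that analyzes the audio file and
        returns a JSON-serializable fingerprint.
        """
        stat = os.stat(audio_path)
        key = os.path.abspath(audio_path)
        entry = self.entries.get(key)
        if (
            entry is not None
            and entry["size"] == stat.st_size
            and entry["mtime_ns"] == stat.st_mtime_ns
        ):
            return entry["fingerprint"]
        fingerprint = compute(audio_path)
        self.entries[key] = {
            "size": stat.st_size,
            "mtime_ns": stat.st_mtime_ns,
            "fingerprint": fingerprint,
        }
        return fingerprint

    def save(self):
        """Write the cache to disk so the next scan can reuse it."""
        with open(self.cache_path, "w", encoding="utf-8") as fh:
            json.dump(self.entries, fh)
```

On a rescan of a large collection, only new or modified files would trigger the expensive audio analysis; everything else is read back from the cache file.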
