General Software Discussion / New duplicate audio files finder... I am open to features suggestion
« on: November 15, 2009, 04:47 PM »
Hello DC Users,
For me as a developer, I feel it is always a good thing to ask users what they would like to see in an application before it is even released. So I am starting this thread, similar to the one I did for DupeTrasher earlier this year here.
I am in the process of developing a new piece of software similar to my DupeTrasher duplicate file finder, but this time the new app will be specialized for audio files only. Similarly to duplicate photo finders, this software should be able to find duplicate and similar audio files by comparing the audio data in them (basically listening to how they sound). In addition, it will use other parameters for duplicate detection such as ID3 tags, file name and binary content, but those will be of secondary importance because that file data can be wrong and I want detection to be almost independent of it. With "audio listening", it should be able to recognize dupes even if they have different names, tags, bitrates, sample rates and file types (mp3/wma/ogg/flac...). It is designed specifically to resist lossy encodings, but in some cases it should even be able to detect different performances of the same song (e.g. live/album/remix).
Of course, this is just the first half of the problem, which I have almost completed. The second and equally important half is presenting information to the user so that he can decide which files should be removed in the shortest amount of time and with the least effort.
So feel free to post your general suggestions and requests (if you have any and if you are interested in software like this). If you have some common scenarios where duplicate audio files are involved, let me know so that I can analyze them and find a solution in the form of a feature that can make your life easier. Your ideas on graphic design and window layouts are also welcome.
One of the things I am currently brainstorming is how to present the search results to the user, considering that the same song can be found in two different files which are not exactly equal. That is, their audio data is not 100% identical but rather similar to some extent (e.g. 90% or more, due to different encoding options, noise in the background or the quality of the recording). This is unlike regular duplicate files in DupeTrasher, which are completely equal, so grouping and presenting them to the user was a much easier task for me. So basically, let's say that songX can be found in the forms of fileA, fileB and fileC, where
- fileA and fileB are 90% similar
- fileA and fileC are 90% similar
- but fileB and fileC are only 70% similar
The reason for this is the probabilistic nature of non-exact comparison (that is, I use percentages and similarity rather than "exactly equal" or "not equal" as in DupeTrasher). Now the question I have is: should I include all three files in the same duplicate group? And what if fileC is 90% similar to a fileD which is just 50% similar to fileA and fileB? I hope you can see the issue here.
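To make the grouping question concrete, here is a minimal Python sketch of two possible answers. It is not the algorithm the app uses; the 80% threshold, the function names and the data layout are all assumptions for illustration, and only the similarity values come from the example above. The first function chains files together whenever any link is strong enough, the second only lets a file join a group if it is similar to every file already in it.

```python
from itertools import combinations

# Pairwise similarities from the example above (symmetric, 0.0 to 1.0).
SIMILARITY = {
    frozenset(("fileA", "fileB")): 0.90,
    frozenset(("fileA", "fileC")): 0.90,
    frozenset(("fileB", "fileC")): 0.70,
    frozenset(("fileC", "fileD")): 0.90,
    frozenset(("fileA", "fileD")): 0.50,
    frozenset(("fileB", "fileD")): 0.50,
}
FILES = ["fileA", "fileB", "fileC", "fileD"]
THRESHOLD = 0.80  # assumed cut-off, not a value from the app


def similarity(a, b):
    """Look up the similarity of two files; unknown pairs count as 0."""
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def chained_groups(files, threshold):
    """Group files connected by any chain of pairs at or above the threshold."""
    parent = {f: f for f in files}

    def find(f):
        # Follow parent links to the group representative (union-find).
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(files, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def strict_groups(files, threshold):
    """Greedy grouping: a file joins a group only if it matches every member."""
    groups = []
    for f in files:
        for g in groups:
            if all(similarity(f, m) >= threshold for m in g):
                g.append(f)
                break
        else:
            groups.append([f])
    return [g for g in groups if len(g) > 1]


print(chained_groups(FILES, THRESHOLD))
print(strict_groups(FILES, THRESHOLD))
```

With these numbers the chained version puts all four files in one group, even though fileD is only 50% similar to fileA and fileB, while the strict version splits them into fileA/fileB and fileC/fileD. The strict version is greedy, so its result can depend on the order in which files are processed.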
The easiest way for me is to present a list of all files that have at least one duplicate and then let the user mark for deletion the files he is interested in. So there would be no grouping of any kind, just a plain list. But I don't think this is a very coherent way of presenting the results of a duplicate search (I will probably include it just as one of the views, though).
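The plain list view could look roughly like the following Python sketch. Again, this is only an illustration under assumptions: the 80% threshold, the function name and the tuple layout are made up, and showing each file's best match next to it is an extra idea, not something the app is confirmed to do.

```python
# (file, file, similarity) triples, using the example values from above.
PAIRS = [
    ("fileA", "fileB", 0.90),
    ("fileA", "fileC", 0.90),
    ("fileB", "fileC", 0.70),
    ("fileC", "fileD", 0.90),
    ("fileA", "fileD", 0.50),
    ("fileB", "fileD", 0.50),
]


def flat_duplicate_list(pairs, threshold=0.80):
    """Return (file, best_match, score) for every file with at least one match."""
    best = {}
    for a, b, score in pairs:
        if score < threshold:
            continue  # this pair is not similar enough to count as a duplicate
        for f, other in ((a, b), (b, a)):
            # Keep the strongest match seen so far; ties keep the first one.
            if f not in best or score > best[f][1]:
                best[f] = (other, score)
    return sorted((f, other, score) for f, (other, score) in best.items())


for name, match, score in flat_duplicate_list(PAIRS):
    print(f"{name}  best match: {match} ({score:.0%})")
```

No grouping is attempted here, so the user still has to work out that all of these rows may belong to the same song.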
Of course, as always, DC users will get special treatment from me. Besides discounts, I will provide free licenses for the monthly newsletter, beta testers and other contributors.
Thanks! :)