DonationCoder.com Software > Post New Requests Here
REQUEST: audio comparer
agentsteal:
Could someone please write a program that takes two 1-second-long wav files and outputs a number that represents the difference between the sounds? It would output 0 if the samples are exactly the same. If the sounds are a good match it would output a small number but if they are very different it would output a large number. My goal is to be able to replace the sound during one second of an audio file with one of my own sounds, and have it sound as close as possible to the original.
mouser:
It actually sounds like an interesting project for someone who is interested in sound: comparing two sounds and coming up with some metric of similarity.
I'm not exactly sure what a good metric would be; perhaps you would want to come up with multiple metrics. But there are clearly good examples of this being used in practice, such as those recent services that can identify a song from a small snippet.
Maybe a coder here who works with sound files will give this a shot. It's also possible you could find some hardcore command-line sound analysis tools that would extract info you could use to compute a simple similarity metric.
Essentially what I'm saying is that this is not an unreasonable request. We just have to find someone with a little experience working with sound files at a low level who understands them well enough to come up with some useful metrics to measure similarity.
Jan-S:
Hey! This sure sounds interesting... anybody willing to discuss possible metrics?
As a first thought, I'd split the file into N pieces, multiply each piece by a Gaussian window function to soften the sharp edges, perform a Fourier transform of each piece and take the absolute values, discarding the phase information. These transformed pieces can be treated as N vectors.
To obtain the similarity of two files A and B, I'd calculate the vectors for both files, compute the cosine of the angle between each of the N pairs of vectors (dot product divided by the product of the norms), add up their absolute values and divide the result by N.
This would give a number between 1 (sounds are identical w.r.t. the metric) and 0 (sounds have nothing to do with each other).
For N=1, this metric would be based solely on the frequency components of the sounds; for greater values of N it would include timing information (i.e. when did particular frequencies occur).
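A rough sketch of the metric described above, in case someone wants to play with it. It assumes NumPy, mono signals of equal length that are already loaded as arrays of samples, and a Gaussian width of one sixth of the piece length (an arbitrary choice, as is the handling of silent pieces). The function name `spectral_similarity` is made up for illustration.

```python
import numpy as np

def spectral_similarity(a, b, n_pieces=4):
    """Mean cosine similarity between windowed magnitude spectra of two signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two mono signals of equal length")
    piece_len = len(a) // n_pieces
    if piece_len < 2:
        raise ValueError("signals are too short for this many pieces")

    # Gaussian window centred on the piece; sigma = piece_len / 6 is an assumption.
    t = np.arange(piece_len) - (piece_len - 1) / 2
    window = np.exp(-0.5 * (t / (piece_len / 6)) ** 2)

    total = 0.0
    for i in range(n_pieces):
        piece = slice(i * piece_len, (i + 1) * piece_len)
        # Magnitude spectrum only: the phase information is thrown away.
        va = np.abs(np.fft.rfft(a[piece] * window))
        vb = np.abs(np.fft.rfft(b[piece] * window))
        na, nb = np.linalg.norm(va), np.linalg.norm(vb)
        if na == 0 and nb == 0:
            total += 1.0  # both pieces silent: treat as identical (assumption)
        elif na == 0 or nb == 0:
            total += 0.0  # one piece silent: treat as unrelated (assumption)
        else:
            total += float(np.dot(va, vb) / (na * nb))
    return total / n_pieces
```

Since the original request asked for 0 when the sounds are identical, `1 - spectral_similarity(a, b)` could serve as the "difference" number. Reading the samples from the wav files is left out here on purpose.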
Any comments on this approach? Other ideas? Questions?
@agentsteal: Are the sound files all the same format (sampling rate, bit depth)? Mono or stereo? I'd be interested in experimenting with the metrics, but not that interested in figuring out how to parse or resample the files.
What kind of sounds do you have? Could you provide some examples (denoting whether you would classify them as similar or not)? This would be important to test and tune the metrics.
agentsteal:
I don't think it would be a problem to require that all sounds are the same format.
I don't have any specific examples. I just think it would be a cool program to experiment with. It would be interesting to see if it's possible to remake songs etc out of a set of sounds.
I guess you can use midi as an example. All of the wav to midi converters that are available now compare tones rather than sounds. All they find is the basic notes. But I bet you could convert the actual sound of the wav file to midi with a program that compares the sounds of all combinations of 128 notes on 128 instruments.
I was wondering: does sound have a dithering effect? Like would playing a really high note followed by a really low note equal a note in the middle?
Jan-S:
--- Quote from agentsteal (December 05, 2008, 11:48 AM) ---
I don't have any specific examples. I just think it would be a cool program to experiment with. It would be interesting to see if it's possible to remake songs etc out of a set of sounds.
--- End quote ---
Without any examples, I don't have anything to test the approach against. Besides, you would need a really huge set of sounds if you want to rebuild a song out of 1-second pieces (which is quite long) and still be able to recognize anything!
--- Quote from agentsteal (December 05, 2008, 11:48 AM) ---
I guess you can use midi as an example. All of the wav to midi converters that are available now compare tones rather than sounds. All they find is the basic notes. But I bet you could convert the actual sound of the wav file to midi with a program that compares the sounds of all combinations of 128 notes on 128 instruments.
--- End quote ---
Intelliscore Music Recognition claims to be able to recognize more than one instrument, at least if you tell it which instruments it should listen to, but you still have to manually tweak the MIDI file afterwards (the results are not perfect).
The approach of comparing sounds might theoretically work, but there are several challenges: Firstly, you have to find metrics that are robust to noise and to overlaid instruments playing at the same time. For instance, if you are looking at an excerpt of the file where both a piano and a guitar play a chord composed of several notes, the comparison will find good matches for several piano, guitar, banjo, ukulele or harp sounds. It's difficult to construct the metrics such that the piano and guitar matches are guaranteed to produce the highest scores.
Secondly, you have to recognize not only which instrument is playing which note, but also when the note is played. Unfortunately, the frequency information becomes less precise as you increase the timing resolution, and vice versa (this time-frequency trade-off is mathematically analogous to the Heisenberg uncertainty principle). Finding a way to recognize a note attack so the comparison can be applied at the right position is not easy, especially as there are instruments (like violins or flutes) that do not have a sharp attack and can hold a note for almost arbitrary length.
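To put rough numbers on that trade-off: for a plain FFT over a window of a given length, the spacing between frequency bins is the sample rate divided by the window length, so a shorter window (better timing) means coarser frequency bins. A small sketch, assuming a 44100 Hz sample rate; the helper name `window_resolution` is made up for illustration.

```python
def window_resolution(sample_rate, window_len):
    """Return (window duration in seconds, FFT bin spacing in Hz)."""
    return window_len / sample_rate, sample_rate / window_len

# Shorter windows locate events better in time but blur the frequencies.
for n in (256, 1024, 4096):
    dt, df = window_resolution(44100, n)
    print(f"{n:5d} samples: {dt * 1000:6.1f} ms window, {df:6.1f} Hz per bin")
```

The product of the two values is always 1, so improving one necessarily worsens the other. For comparison, A4 (440 Hz) and the next semitone up (about 466 Hz) are only about 26 Hz apart, and lower notes are closer together still.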
A funny approach would be randomly generating MIDI files, comparing their sound to the original song and evolving the best matches by recombination and mutation (genetic algorithms). But this would probably take ages because the search space is too big :)
--- Quote from agentsteal (December 05, 2008, 11:48 AM) ---
I was wondering: does sound have a dithering effect? Like would playing a really high note followed by a really low note equal a note in the middle?
--- End quote ---
I don't think so. Even if you play two pure tones at the same time, the brain is quite good at figuring out the two tones. You can try this with an audio editor that is capable of generating sine waves. Or maybe I misunderstood your question?
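If you don't have an audio editor at hand, a few lines of Python can produce such a test file. This is only a sketch using the standard `wave` module; the function name `write_two_tones`, the two frequencies and the amplitudes are arbitrary choices for illustration.

```python
import math
import struct
import wave

def write_two_tones(path, f1=220.0, f2=1760.0, seconds=1.0, rate=44100):
    """Write a mono 16-bit wav file containing two sine tones played together."""
    n = int(seconds * rate)
    frames = bytearray()
    for i in range(n):
        # Each tone at 0.4 amplitude so that the sum never clips.
        s = (0.4 * math.sin(2 * math.pi * f1 * i / rate)
             + 0.4 * math.sin(2 * math.pi * f2 * i / rate))
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)    # mono
        w.setsampwidth(2)    # 16 bits per sample
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return n
```

Calling `write_two_tones("two_tones.wav")` and playing the result should let you judge for yourself whether you hear two separate tones or one tone in the middle.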