I don't have any specific examples. I just think it would be a cool program to experiment with. It would be interesting to see if it's possible to remake songs etc out of a set of sounds.
Without any examples, I don't have anything to test the approach against. Besides, you would need a really huge set of sounds if you tried to rebuild a song out of 1-second pieces (which is quite long) and still wanted it to be recognizable!
I guess you can use MIDI as an example. All of the WAV to MIDI converters that are available now compare tones rather than sounds. All they find is the basic notes. But I bet you could convert the actual sound of the WAV file to MIDI with a program that compares the sounds of all combinations of 128 notes on 128 instruments.
Intelliscore Music Recognition claims to be able to recognize more than one instrument, at least if you tell it which instruments it should listen to, but you still have to manually tweak the MIDI file afterwards (the results are not perfect).
The approach of comparing sounds might theoretically work, but there are several challenges: Firstly, you have to find metrics that are robust to noise and to overlaid instruments playing at the same time. For instance, if you are looking at an excerpt of the file where both a piano and a guitar play a chord composed of several notes, the comparison will find good matches for several piano, guitar, banjo, ukulele or harp sounds. It's difficult to construct the metrics so that the piano and guitar matches are guaranteed to produce the highest scores.
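To make the matching problem a bit more concrete, here is a rough sketch of one possible metric: cosine similarity between magnitude spectra. Everything in it is an assumption for illustration only. The "instruments" are synthetic harmonic tones, and the names `piano_like`, `guitar_like` and `harp_like`, the sample rate and the excerpt length are all made up. Real recordings would behave far less cleanly.

```python
import cmath
import math

RATE = 8000  # samples per second (assumed)
N = 256      # samples per excerpt (assumed); one DFT bin is RATE / N = 31.25 Hz

def tone(freq, harmonics):
    """Toy 'instrument': harmonics of freq with the given relative amplitudes."""
    return [sum(a * math.sin(2 * math.pi * freq * (k + 1) * t / RATE)
                for k, a in enumerate(harmonics))
            for t in range(N)]

def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..n/2-1 (slow, but dependency-free)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n // 2)]

def cosine_similarity(a, b):
    """1.0 means identical spectral shape, 0.0 means no overlap at all."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical template library: same note, slightly different timbres.
templates = {
    "piano_like": tone(250.0, [1.0, 0.5, 0.25]),
    "guitar_like": tone(375.0, [1.0, 0.8, 0.1]),
    "harp_like": tone(250.0, [1.0, 0.4, 0.3]),
}

# The excerpt contains only the piano-like and guitar-like tones.
excerpt = [p + g for p, g in zip(templates["piano_like"], templates["guitar_like"])]
excerpt_spectrum = magnitude_spectrum(excerpt)

scores = {name: cosine_similarity(magnitude_spectrum(signal), excerpt_spectrum)
          for name, signal in templates.items()}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy setup the harp-like template, which is not in the mix at all, scores about as high as the piano-like one that is. That is exactly the ranking problem described above.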
Secondly, you not only have to recognize which instrument is playing which note, but also when the note is played. Unfortunately, the frequency information gets less precise as you increase the timing resolution and vice versa (a time-frequency trade-off that is mathematically analogous to the Heisenberg uncertainty principle). Finding a way to recognize a note attack so the comparison can be applied at the right position is not easy - especially as there are instruments (like violins or flutes) that do not have a sharp attack and can hold a note for an almost arbitrary length.
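The timing versus frequency trade-off can be put into numbers. This is only a back-of-the-envelope sketch, assuming a plain DFT over a fixed window and a 44.1 kHz sample rate. The function names are mine, not from any library. The bin width of a DFT is the sample rate divided by the window length, so a shorter window (better timing) always means coarser frequency bins.

```python
RATE = 44100  # samples per second (assumed, CD-quality audio)

def bin_width_hz(window_samples, rate=RATE):
    """Spacing between neighbouring DFT bins for a window of this length."""
    return rate / window_samples

def window_ms(window_samples, rate=RATE):
    """Duration of the analysis window, i.e. the timing resolution."""
    return 1000.0 * window_samples / rate

def semitone_gap_hz(freq):
    """Distance in Hz from a note to the next semitone above it (equal temperament)."""
    return freq * (2 ** (1 / 12) - 1)

low_a = 110.0  # the note A2
print(f"gap between A2 and the next semitone: {semitone_gap_hz(low_a):.2f} Hz")
for n in (256, 1024, 4096, 16384):
    resolves = bin_width_hz(n) < semitone_gap_hz(low_a)
    print(f"window {n:5d} samples = {window_ms(n):6.1f} ms, "
          f"bin width {bin_width_hz(n):6.2f} Hz, "
          f"separates neighbouring low notes: {resolves}")
```

With these assumptions, separating two neighbouring notes around A2 needs a window well over a hundred milliseconds long, which is already long compared to a fast note attack.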
A fun approach would be randomly generating MIDI files, comparing their sound to the original song and evolving the best matches by recombination and mutation (Genetic algorithms). But this will probably take ages because the search space is too big.
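For what it's worth, here is a toy sketch of the genetic-algorithm idea. It is heavily simplified and full of assumptions: the "song" is just a list of eight MIDI note numbers, and the fitness function compares note numbers directly instead of rendering audio and comparing sounds, which would be the hard and slow part. `TARGET`, the population size and the mutation rate are all made-up values.

```python
import random

NOTE_RANGE = 128                            # MIDI note numbers 0..127
TARGET = [60, 62, 64, 65, 67, 69, 71, 72]   # stand-in for the "original song"

def fitness(candidate):
    """Higher is better, 0 is a perfect match. Stand-in for an audio comparison."""
    return -sum(abs(a - b) for a, b in zip(candidate, TARGET))

def crossover(a, b, rng):
    """Recombination: take the start of one parent and the end of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(seq, rng, rate=0.1):
    """Mutation: replace each note with a random one with probability `rate`."""
    return [rng.randrange(NOTE_RANGE) if rng.random() < rate else note
            for note in seq]

def evolve(generations=200, pop_size=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(NOTE_RANGE) for _ in TARGET] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 5]       # the best 20% survive unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    pop.sort(key=fitness, reverse=True)
    return pop[0], history

best, history = evolve()
print("best candidate:", best, "fitness:", fitness(best))
print("search space for 8 notes on one instrument:", NOTE_RANGE ** len(TARGET))
```

Even this tiny case, with eight notes, one instrument and no timing, has 128^8 possible candidates. A real song with several instruments and note timings would be vastly larger, and every fitness evaluation would mean synthesizing and comparing audio.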
I was wondering: does sound have a dithering effect? Like would playing a really high note followed by a really low note equal a note in the middle?
I don't think so. Even if you play two pure tones at the same time, the brain is quite good at figuring out the two tones. You can try this with an audio editor that is capable of generating sine waves. Or maybe I misunderstood your question?
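If you don't have an audio editor at hand, the signal side of this can be checked with a few lines of code. This sketch is an assumption-laden toy: the two tones (250 Hz and 2000 Hz), the "middle" frequency (1125 Hz) and the sample rate are arbitrary choices. It says nothing about what the brain does. It only shows that neither playing the tones together nor one after the other puts any energy at the middle frequency.

```python
import cmath
import math

RATE = 8000   # samples per second (assumed)
N = 256       # total number of samples analysed (assumed)
LOW, HIGH = 250.0, 2000.0
MIDDLE = (LOW + HIGH) / 2   # 1125 Hz

def sine(freq, start, stop):
    """Samples start..stop-1 of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq * t / RATE) for t in range(start, stop)]

def magnitude_at(signal, freq):
    """Strength of a single frequency in the signal (one DFT-style projection)."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * freq * t / RATE)
                for t, x in enumerate(signal))
    return abs(total) / n

# Both tones at the same time.
together = [a + b for a, b in zip(sine(LOW, 0, N), sine(HIGH, 0, N))]
# High tone first, then the low tone.
in_sequence = sine(HIGH, 0, N // 2) + sine(LOW, N // 2, N)

for label, signal in (("together", together), ("in sequence", in_sequence)):
    print(label,
          f"low={magnitude_at(signal, LOW):.3f}",
          f"middle={magnitude_at(signal, MIDDLE):.3f}",
          f"high={magnitude_at(signal, HIGH):.3f}")
```

In both cases the low and high tones show up clearly while the middle frequency stays at essentially zero, so there is nothing in the signal itself that would "average out" to a note in between.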