Subject: Speech through few channels
From: Al Bregman <bregman@CCRMA.STANFORD.EDU>
Date: Fri, 10 Jun 1994 19:10:26 PST
----------

Dear Ed,

I am sorry I took so long to get around to it, but I finally had a chance to listen to the tape you sent of the simulation of speech and of music heard through as few as 4 channels. It is evident from listening to them that the recognizability of the speech survives better than that of the music.

The reason for this is fairly clear: the simulation destroys the pitch information but seems to retain gross spectral information in the low channels and good timing information in the high ones. This sort of information reduction hurts the speech a lot less than it does the music. Speech can obviously be quite intelligible without pitch information, as we can see in the case of whispered speech; pitch, however, is very important to most music.

I imagine that if you changed the relative importance of pitch in speech and music -- say by using Chinese, in which tones are important, as the speech, and drumming as the music -- you might get results less favorable to speech. Another way to increase the importance of pitch in speech would be to ask the listener to report on the emotional content of the speech.

In any case, the conclusion I draw is not that speech recognition needs less acoustic information than music recognition does, but rather that the two domains depend on different kinds of information. Presumably the recognition process, in many different domains of sound, is sensitive to the integrity of different acoustic features. There is therefore no single best way to reduce information: the best kind of information reduction depends on the domain.

Best wishes,

Al Bregman
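[Archive note: the letter does not specify how the tape was processed, but the kind of information reduction it describes -- a handful of frequency channels that discard pitch while keeping each band's gross amplitude envelope -- can be sketched as a crude noise vocoder. The code below is a hypothetical illustration, not Ed's actual procedure; the channel count, band edges, and smoothing window are all assumptions.]

```python
import numpy as np

def vocode(signal, sr, n_channels=4, lo=100.0, hi=4000.0):
    """Crude noise-vocoder sketch of few-channel information reduction.

    Split the signal into a few log-spaced bands, keep only each band's
    amplitude envelope (pitch is discarded), and use that envelope to
    modulate band-limited noise. All parameters are illustrative.
    """
    n = len(signal)
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    edges = np.geomspace(lo, hi, n_channels + 1)  # assumed band edges
    rng = np.random.default_rng(0)
    out = np.zeros(n)
    for k in range(n_channels):
        mask = (freqs >= edges[k]) & (freqs < edges[k + 1])
        band = np.fft.irfft(spec * mask, n)
        # Envelope: rectify, then smooth with a ~10 ms moving average.
        win = max(1, int(0.01 * sr))
        env = np.convolve(np.abs(band), np.ones(win) / win, mode="same")
        # Carrier: white noise restricted to the same band.
        noise = rng.standard_normal(n)
        noise_band = np.fft.irfft(np.fft.rfft(noise) * mask, n)
        out += env * noise_band
    return out

# Toy input: a 300 Hz tone with a slow amplitude modulation.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 3 * t))
processed = vocode(tone, sr)
```

With only 4 channels, the output preserves the slow envelope fluctuations (enough for speech-like timing cues) while the 300 Hz periodicity that would carry pitch is replaced by noise, which is consistent with the letter's observation that such processing hurts music far more than speech.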