Re: Granular synthesis and auditory segmentation ("John K. Bates")

Subject: Re: Granular synthesis and auditory segmentation
From: "John K. Bates" <jkbates(at)COMPUTER.NET>
Date: Wed, 14 Oct 1998 10:09:29 -0400

Dear List,

The topic of granularity and segmentation relates to some experiments I did in the early 1990s looking for the reason for the high intelligibility of differentiated clipped speech. I discovered that fragments (grains) of waveforms taken from intervals as small as the period between unidirectional zero crossings (halfwaves) contain sufficient information to characterize speech as well as other sounds. From this discovery I developed an algorithm that recognizes transitions in the fragment shapes that locate phonetic segments.

To test the validity of the approach I compared the timbre and intelligibility of words and/or sentences that are reconstructed from the fragment data against the sounds of the original utterances. The reconstruction algorithm uses information derived from the segment analyzer as follows:

(1) from each segment select a halfwave fragment,
(2) label each segment's phonetic class,
(3) extract each segment's prosody (pitch and envelope information), and
(4) label whether it is voiced or unvoiced.

Much of the reconstructed waveform intelligibility has been quite good, even with fricatives and whispers. I concluded that the method was on the correct path.

The problem is that details of these experiments and results are currently unpublished except for a poster paper I presented at the 1991 Whistler IEEE Workshop on speech coding applications. The object of the paper was to show that the segmentation algorithm could have application in speech coding by using the very high compression of speech data that is available in the redundancy of granular phonetic information. However, despite the promising results, I found that building a model suitable for demonstrating a commercial speech coder was beyond my personal resources.

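As a purely illustrative sketch of the fragment idea described above (this is not the APL software mentioned in this posting, whose details are unpublished), the following Python splits a sampled waveform into fragments bounded by zero crossings of one direction. The function names, the choice of positive-going crossings, and the toy sample values are all assumptions made for illustration.

```python
def upward_zero_crossings(samples):
    """Return each index i where the signal rises from below zero
    at i-1 to zero or above at i (a positive-going zero crossing).
    Assumption: "unidirectional" is taken to mean positive-going."""
    return [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]


def split_into_fragments(samples):
    """Cut the waveform into fragments, each running from one
    positive-going zero crossing up to (not including) the next.
    Samples before the first crossing and after the last are dropped."""
    crossings = upward_zero_crossings(samples)
    return [samples[a:b] for a, b in zip(crossings, crossings[1:])]


if __name__ == "__main__":
    # Hypothetical toy waveform, integers chosen only for readability.
    toy = [1, -1, -2, 1, 2, -1, 1, -3, 2]
    for fragment in split_into_fragments(toy):
        print(fragment)
```

A real analysis would then compare the shapes of successive fragments to find transitions; that step is not sketched here because the posting does not describe how it is done.
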
Nevertheless, I think I have shown that waveform fragments that are smaller than the conventional grain size contain the timbre information from which a variety of sounds may be synthesized. In other words, you can bypass the time-frequency limit by ignoring it.

For anyone familiar with the APL language, I could send some software to play with. The best way to understand this is to hear it.

Best wishes,

John Bates

>Has anyone systematically explored the use of
>"granular synthesis" in manipulating auditory
>streaming and segregation?
>
>I'd like to see connections between Bregman's
>interesting ASA work and mesoscopic auditory
>textures.
>
>I expect such textures to be helpful in controlling
>auditory grouping and segregation, and since my own
>work maps visual textures to auditory textures in a
>way closely related to granular synthesis, there is
>an obvious connection that could be of practical
>interest.
>
>Best wishes,
>
>Peter Meijer

Email to AUDITORY should now be sent to AUDITORY(at)lists.mcgill.ca
LISTSERV commands should be sent to listserv(at)lists.mcgill.ca
Information is available on the WEB at http://www.mcgill.ca/cc/listserv

This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:

DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University