granular synthesis and auditory segmentation ("John K. Bates")


Subject: granular synthesis and auditory segmentation
From:    "John K. Bates"  <jkbates(at)COMPUTER.NET>
Date:    Fri, 16 Oct 1998 10:29:26 -0400

Dear List,

In my previous posting I said:

> In other words, you can bypass the time-frequency limit by
> ignoring it.

I realize now that this statement was a poor choice of words. Because of it, people seem to have wandered away from the point about granular synthesis and zeroed in on the halfwave's apparent difficulty with Fourier theory and cochlear dynamics. What I meant was that, in collapsing the size of a grain, you don't have to stop at the Gabor limit as long as the sound of a halfwave fragment has meaning to the ear. As August Seebeck said around 1850, "How else can the question as to what makes out a tone be decided but by the ear?" So why worry about time-frequency limits just to be mathematically correct?

What I discovered was that a halfwave of a given duty ratio can produce an auditory sensation (let's call it timbre) that can be identified in a phonetic classification, even if it comes from a non-speech source. Therefore, by creating halfwaves with known timbre classifications and concatenating them according to phonological rules, I found it possible to generate speech and other sounds. To simplify the rulemaking, I used original speech as a template from which my segmenting algorithm extracted the information needed to reconstruct the speech. As a backup for evaluating the ear's perception, I made spectrograms of both the original and reconstructed utterances. They had broadly similar patterns, but here the ear seems more tolerant than the eye.

In discussing grain size and spectral features, it would seem that neither a Gabor-limited nor a halfwave grain could have a useful spectral shape beyond a measure of bandwidth. If this is true, we next have to account for how a spectrum-analyzing cochlea can provide information that lets the ear find instantaneous meaning in bandwidth simultaneously with narrowband harmonic analysis. (Timbre and pitch? ...and what about whispers?)
Obviously, while this might be too difficult a task for the cochlea, the deus ex machina in the brain can solve the problem.

As for general use in granular synthesis, my experience suggests that halfwaves, with their near-minimum time intervals, have advantages over the conventional grain method in simplicity and flexibility. Using nothing but halfwaves controlled by their duty ratios and time epochs, I have done a few experiments synthesizing speech, including fricatives, stops, vowels, and variable pitch. Although the timbre was rough, the speech was intelligible. Since the method can give good stop consonants, I think that a granular approach to speech synthesis could improve on what is currently available.

By the way, isn't it interesting that, along with the segments, my granular analysis algorithm gets the pitch, envelope, and V/UV (voiced/unvoiced) decision? It also gets direction of arrival. All done without a spectrum analyzer.

Best regards,
John Bates

Email to AUDITORY should now be sent to AUDITORY(at)lists.mcgill.ca
LISTSERV commands should be sent to listserv(at)lists.mcgill.ca
Information is available on the WEB at http://www.mcgill.ca/cc/listserv
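The halfwave analysis/resynthesis idea discussed in this posting can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not John Bates' segmenting algorithm. It splits a signal at sign changes into halfwaves and records each one's start sample (its epoch), its length, and its signed peak. It then rebuilds the signal by placing a half-sine lobe of the same length and peak at each epoch. The posting controls halfwaves by duty ratio and time epoch but does not define duty ratio precisely, so the sketch uses halfwave length instead. The function names `segment_halfwaves` and `synthesize_halfwaves`, the (epoch, length, peak) representation, and the half-sine lobe shape are all hypothetical choices.

```python
import math


def segment_halfwaves(x):
    """Split x at sign changes into halfwaves (hypothetical representation).

    Returns a list of (epoch, length, peak) tuples: epoch is the start sample,
    length is the number of samples, and peak is the signed sample of largest
    magnitude. Zero-valued samples are treated as positive.
    """
    grains = []
    start = 0
    for i in range(1, len(x) + 1):
        # A halfwave ends at the end of the signal or where the sign flips.
        if i == len(x) or (x[i] < 0) != (x[start] < 0):
            seg = x[start:i]
            peak = max(seg, key=abs)
            grains.append((start, i - start, peak))
            start = i
    return grains


def synthesize_halfwaves(grains, n_samples):
    """Rebuild a signal by placing one half-sine lobe per (epoch, length, peak) grain."""
    y = [0.0] * n_samples
    for epoch, length, peak in grains:
        for k in range(length):
            if epoch + k >= n_samples:
                break  # Drop any part of a lobe that falls past the output end.
            # Half-sine lobe sampled at sample centres, scaled to the grain's peak.
            y[epoch + k] = peak * math.sin(math.pi * (k + 0.5) / length)
    return y


# Demo: a 100 Hz tone at 8 kHz (80-sample period), offset by half a sample
# so that no sample is exactly zero.
x = [math.sin(2 * math.pi * (n + 0.5) / 80) for n in range(800)]
grains = segment_halfwaves(x)
y = synthesize_halfwaves(grains, len(x))
print(len(grains))  # 20 (each halfwave is 40 samples long)
```

On a pure tone the half-sine lobes reproduce the input almost exactly. On speech, the fixed lobe shape discards the waveform detail inside each halfwave, so the sketch only keeps epochs, lengths, and peaks. That makes it a crude way to test how much of the original's character survives with so little information.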


This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University