
granular synthesis and auditory segmentation



Dear List,
   In my previous posting I said:

> In other words, you can bypass the time-frequency limit by
> ignoring it.

   I realize now that this statement was poorly worded.  Because of
that, people seem to have wandered away from the point about granular
synthesis and have zeroed in on the halfwave's apparent difficulty with
Fourier theory and cochlear dynamics.
   What I meant was that in collapsing the size of a grain you don't have
to stop at the Gabor limit as long as the sound of a halfwave fragment has
meaning to the ear.  As August Seebeck said around 1850, "How else can the
question as to what makes out a tone be decided but by the ear?"  So why
worry about time-frequency limits just to be mathematically correct?
   What I discovered was that a halfwave of a given duty ratio can produce
an auditory sensation (let's call it timbre) that can be assigned a
phonetic classification, even if it comes from a non-speech source.
Therefore, by creating halfwaves having known timbre classifications and
concatenating them according to phonological rules I found it possible to
generate speech and other sounds.  To simplify the rulemaking I used
original speech as a template from which my segmenting algorithm extracted
the information needed to reconstruct the speech. As a backup for
evaluating the ear's perception I made spectrograms of both the original
and reconstructed utterances.  They had broadly similar patterns, but here
the ear seems more tolerant than the eye.
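
   To make the idea of halfwave segmentation concrete, here is a minimal
sketch in Python. It is not the segmenting algorithm described above; it
only assumes that a halfwave is the run of samples between two successive
sign changes, and it records each halfwave's start time, duration, and
signed peak. The function name and the test tone are hypothetical.

```python
import math

def segment_halfwaves(samples, sample_rate):
    """Split a waveform at sign changes (assumed halfwave boundaries).

    Returns one (start_s, duration_s, signed_peak) tuple per halfwave.
    """
    segments = []
    start = 0
    for i in range(1, len(samples) + 1):
        at_end = i == len(samples)
        # Close the current halfwave at the end or when the sign flips.
        if at_end or (samples[i] >= 0) != (samples[start] >= 0):
            chunk = samples[start:i]
            peak = max(chunk, key=abs)
            segments.append((start / sample_rate, len(chunk) / sample_rate, peak))
            start = i
    return segments

# Hypothetical test tone: 100 Hz sine sampled at 8 kHz for 20 ms.
# The half-sample phase offset keeps samples off the exact zero crossings.
rate = 8000
tone = [math.sin(2 * math.pi * (n + 0.5) / 80) for n in range(160)]
segs = segment_halfwaves(tone, rate)

# For a pure tone, two adjacent halfwaves span one period.
pitch_hz = 1.0 / (segs[0][1] + segs[1][1])
```

   For a pure tone the halfwave durations give the period directly and
the peaks trace the envelope. Real speech would need more than this; the
sketch only shows what kind of information sits in the segments.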
   In discussing grain size and spectral features, it would seem that
neither a Gabor-limited nor a halfwave grain could have useful spectral
shape other than a measure of bandwidth.  And if this is true, we next have
to account for how a spectrum-analyzing cochlea can provide information
that allows the ear to find instantaneous meaning in bandwidth
simultaneously with narrowband harmonic analysis.  (Timbre and pitch?  And
what about whispers?)  Obviously, while this might be too difficult a task
for the cochlea, the deus ex machina in the brain can solve the problem.
   As for general use in granular synthesis, my experience suggests that
halfwaves, with their near-minimum time intervals, have advantages over
the conventional grain method: they are simpler and more flexible.
Using nothing but halfwaves controlled by their duty ratios and time
epochs, I have done a few experiments synthesizing speech, including
fricatives, stops, vowels, and variable pitch. Although the timbre was
rough, the speech was intelligible.  Since the method can give good stop
consonants, I think that a granular approach to speech synthesis could
improve on what is currently available.
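
   As an illustration of synthesis from duty ratios and time epochs, here
is a minimal sketch in Python. It rests on assumptions that the text above
does not spell out: each grain is taken to be a half-sine lobe that fills
the fraction `duty` of its epoch, followed by silence, and grains are simply
laid end to end. The names `halfwave` and `concatenate`, the lobe shape,
and the 16 kHz sample rate are all hypothetical.

```python
import math

def halfwave(epoch_s, duty, polarity=1, amplitude=1.0, sample_rate=16000):
    """One grain: a half-sine lobe occupying `duty` of the epoch, then silence.

    The half-sine shape and this reading of "duty ratio" are assumptions.
    """
    n_total = max(1, round(epoch_s * sample_rate))
    n_lobe = min(n_total, max(1, round(duty * n_total)))
    lobe = [
        polarity * amplitude * math.sin(math.pi * (i + 0.5) / n_lobe)
        for i in range(n_lobe)
    ]
    return lobe + [0.0] * (n_total - n_lobe)

def concatenate(grains, sample_rate=16000):
    """Lay grains end to end; each grain is (epoch_s, duty, polarity)."""
    out = []
    for epoch_s, duty, polarity in grains:
        out.extend(halfwave(epoch_s, duty, polarity, sample_rate=sample_rate))
    return out

# Alternating 5 ms halfwaves give a 10 ms period, i.e. a 100 Hz pitch.
grains = [(0.005, 0.4, +1), (0.005, 0.4, -1)] * 10
signal = concatenate(grains)
```

   In this sketch, changing the epochs moves the pitch and changing the
duty ratio reshapes each lobe, which is the only handle on timbre it offers.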
   By the way, isn't it interesting that, along with the segments, my
granular analysis algorithm gets the pitch, the envelope, and the
voiced/unvoiced (V/UV) decision?  It also gets direction of arrival.  All
done without a spectrum analyzer.

   Best regards,

    John Bates

Email to AUDITORY should now be sent to AUDITORY@lists.mcgill.ca
LISTSERV commands should be sent to listserv@lists.mcgill.ca
Information is available on the WEB at http://www.mcgill.ca/cc/listserv