Re: Granular synthesis and auditory segmentation
Peter,
Thank you for pasting Didier Depireux's note with yours, as it now
gives me the opportunity to discuss both of your responses in this
single email.
>I don't know about your neurons, but mine completely fail
>to replenish their synapses above about 1 or 2 kHz even
>after plenty of coffee.
... Actually, many physiology books discuss the refractory
period (the ability to replenish chemical balance) of
cochlear neurons as operating to about 5 kHz.
... A detail readily confirmed by the literature.
... Given that most telephony systems run 300 Hz to 3 kHz,
the 5 kHz refractory period does well in practical situations!
>Of course there is a role for non-Fourier type processing too, but
>no simple scheme covers the entire audible [20 Hz, 20 kHz] range.
... True, this simple scheme merely covers all of speech
communications.
... Singing (resonant redundancies) and mechanical
vibrations in string and wind instruments have other cues.
>Didier Depireux clarified the issue very nicely:
>
>> The half-wave rectification occurs _after_ the frequency
>> decomposition performed on the basilar membrane, i.e.
>> after you have decomposed the signal into frequency channels.
... Ah, the Place Theory!
... If you truly believe in Fourier Analysis then you also believe
in the inverse transform. However, the local, resonant response
of a stretched membrane (the Place Theory) is only useful for a
sinusoidal drive. Speech presents complex structures in time,
and a resonant response at any GIVEN time (at a PLACE) on the
basilar membrane is NOT the same thing as a FULL spectral
analysis, which produces amplitude and PHASE information at
MANY "frequencies" such that an inverse transform is possible.
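... To illustrate the point about PHASE, here is a minimal Python sketch
(standard library only). It is an illustration under my own assumptions,
not a model of the cochlea: the test signal values and the helper names
`dft` and `idft` are hypothetical. It shows that an inverse transform
recovers the signal when amplitude AND phase are kept, and does not when
only the amplitudes are kept.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


# Hypothetical short, asymmetric test signal (illustration only).
signal = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0]

spectrum = dft(signal)

# Inverse transform using amplitude and phase.
with_phase = idft(spectrum)

# Inverse transform using amplitude only (all phases set to zero).
magnitude_only = idft([abs(c) for c in spectrum])

err_with_phase = max(abs(a - b) for a, b in zip(signal, with_phase))
err_magnitude_only = max(abs(a - b) for a, b in zip(signal, magnitude_only))

print("max error, amplitude and phase:", err_with_phase)
print("max error, amplitude only:     ", err_magnitude_only)
```

The first error should be at the level of floating-point rounding, while
the second is a sizeable fraction of the signal amplitude.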
... It is also a well-known fact that (binaural) localization has a
10 microsecond resolution (1 to 2 spatial degrees) and that this
resolution is crucial to explaining the Cocktail Party Effect.
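... As a rough plausibility check on "10 microseconds for 1 to 2
degrees", here is a small Python sketch. It assumes Woodworth's
spherical-head approximation for the interaural time difference,
ITD = (a / c) * (theta + sin(theta)), with a head radius of 8.75 cm and
a speed of sound of 343 m/s. The model and both constants are my
assumptions for illustration, not figures taken from this thread.

```python
import math

# Assumed constants (not from the discussion above).
HEAD_RADIUS_M = 0.0875        # typical textbook head radius
SPEED_OF_SOUND_M_S = 343.0    # air at roughly room temperature


def woodworth_itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source at the given
    azimuth (degrees from straight ahead), spherical-head model."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


for deg in (1.0, 2.0):
    itd_us = woodworth_itd_seconds(deg) * 1e6
    print(f"{deg:.0f} degree(s) off centre -> ITD of about {itd_us:.1f} microseconds")
```

Under these assumptions a shift of 1 to 2 degrees near straight ahead
corresponds to an interaural delay on the order of 10 to 20 microseconds.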
... A 10 microsecond resolution implies a Fourier sample window
of approximately the same size, which implies a "spectral" resolution
of 100 kHz, i.e., quite useless if acoustic analysis is the goal.
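... The arithmetic is just the reciprocal relation between window
length and frequency spacing, delta_f ~ 1 / T. A minimal Python sketch
follows; the function name is hypothetical, and the 10 millisecond
window is included only for comparison with a typical FFT frame.

```python
def spectral_resolution_hz(window_seconds):
    """Approximate frequency spacing of a Fourier analysis whose
    window lasts window_seconds (delta_f ~ 1 / T)."""
    return 1.0 / window_seconds


for label, window in (("10 microseconds", 10e-6),
                      ("10 milliseconds", 10e-3)):
    print(f"window of {label}: resolution of about "
          f"{spectral_resolution_hz(window):,.0f} Hz")
```

A 10 microsecond window gives bins about 100 kHz apart, wider than the
whole audible band, while a 10 millisecond window gives bins about
100 Hz apart.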
>the "textural aspect" (i.e., the patterning)
>in sound textures perceptually relatively invariant to their
>position in the time-frequency plane with a typical [0s, 1s]
>by [500 Hz, 5 kHz] area. That is also what I would want, since
>it keeps the perceptual qualities of overall time-frequency
>"position" and sound "pattern" largely independent, just as
>in vision the texture of an object doesn't appear to change
>and interfere with position of that object in the visual field.
>
>Of course the "art" is to optimize this preservation of
>invariants in the cross-modal mapping, while maximizing
>resolution and ease of perception (including "proper"
>grouping and segregation, possibly by manipulating the
>sound textures).
... I also work with graphical speech patterns.
... But, my structures are time-locked to SOURCES in the
acoustic environment.
... And, Self-Organized Neural Maps detect and classify
these structures.
... You must solve the SOURCE localization problem before
ANY (source) analysis is EVER attempted.
... I wish to make this point emphatically: one can ONLY
analyze a SOURCE, and before one CAN do SOURCE analysis,
one MUST isolate the information from THAT source.
... This is precisely what occurs during the Cocktail Party Effect,
i.e., a SOURCE is isolated and analysis is focused on the
information produced by THAT source.
... Spectral analysis of an acoustic point in space is essentially
useless, since the resultant frequencies to be ASSOCIATED with
a PARTICULAR SOURCE remain unknown!
... But, this result is to be expected as the typical FFT sample
window of 10 milliseconds is 1,000 times larger than the raw,
human, localization resolution of 10 microseconds, i.e., the 1 to 2
Spatial Degrees of SOURCE resolution are completely buried -
actually, it's more appropriate to say that Spatial Resolution has
been lost in the 10 millisecond AVERAGING process used to
calculate "spectral" components.
Rich Fabbri