
Re: Granular synthesis and auditory segmentation



Richard,
  I'm not sure that I see how an MAA of 1-2 degrees (the 10
microsecond resolution of the binaural system) is crucial to
explaining the Cocktail Party Effect.  Has anyone ever done any
cocktail party experiments with talkers separated by as little as 1
degree?
Bob Bolia.


Robert S. Bolia
Research Scientist
Veridian
Air Force Research Laboratory
Wright-Patterson Air Force Base, Ohio

>>> "Richard J. Fabbri" <fabbri@NETAXIS.COM> 10/15 5:22 AM >>>
        ... Ah, The Place Theory!
        ... If you truly believe in Fourier Analysis then you also believe
        in the inverse transform.  However, the local, resonant response
        of a stretched membrane (the Place Theory) is only useful for a
        sinusoidal drive.  Speech presents complex structures in time
        and, a resonant response at any GIVEN time (at a PLACE) on the
        Basilar membrane is NOT the same thing as a FULL spectral
        analysis which produces amplitude and PHASE information at
        MANY "frequencies" such that an inverse transform is possible.
        ... It is also a well known fact that (Binaural) localization has a
        10 microsecond resolution (1 to 2 spatial degrees) and that this
        resolution is crucial to explaining the Cocktail Party Effect.
        ... 10 microsecond resolution implies a Fourier sample window
        of approx the same size, which implies a "spectral" resolution
        of 100 "kHz", i.e., quite useless if acoustic analysis is the goal.
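
The window-size argument above rests on the usual reciprocal relation
between the duration of an analysis window and the frequency spacing it
can resolve (roughly 1/T for a window of T seconds). A minimal sketch of
that arithmetic follows; the function name is illustrative only, and the
1/T rule is an approximation that ignores the window shape.

```python
def frequency_resolution_hz(window_seconds: float) -> float:
    """Approximate frequency spacing (Hz) resolvable with a window of this duration.

    Uses the rule of thumb delta_f ~ 1 / T; the exact figure depends on the
    window function, which is not modelled here.
    """
    return 1.0 / window_seconds


# Window durations mentioned in the message: 10 microseconds and 10 milliseconds.
for label, window in (("10 microseconds", 10e-6), ("10 milliseconds", 10e-3)):
    print(f"{label}: about {frequency_resolution_hz(window):,.0f} Hz")
```

Under this rule of thumb a 10 microsecond window gives a spacing on the
order of 100 kHz, while a 10 millisecond window gives about 100 Hz.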


>the "textural aspect" (i.e., the patterning)
>in sound textures perceptually relatively invariant to their
>position in the time-frequency plane with a typical [0s, 1s]
>by [500 Hz, 5 kHz] area. That is also what I would want, since
>it keeps the perceptual qualities of overall time-frequency
>"position" and sound "pattern" largely independent, just as
>in vision the texture of an object doesn't appear to change
>and interfere with position of that object in the visual field.
>
>Of course the "art" is to optimize this preservation of
>invariants in the cross-modal mapping, while maximizing
>resolution and ease of perception (including "proper"
>grouping and segregation, possibly by manipulating the
>sound textures).
        ... I also work with graphical speech patterns.
        ... But, my structures are time-locked to SOURCES in the
        acoustic environment.
        ... And, Self-Organized Neural Maps detect and classify
        these structures.
        ... You must solve the SOURCE localization problem before
        ANY (source) analysis is EVER attempted.

        ... I wish to make this point emphatically - one can ONLY
        analyze a SOURCE and, before one CAN do SOURCE analysis,
        one MUST isolate the information from THAT source.

        ... This is precisely what occurs during the Cocktail Party Effect,
        i.e., a SOURCE is isolated and analysis is focused on the
        information produced by THAT source.
        ... Spectral analysis of an Acoustic point in space is essentially
        useless since the resultant frequencies to be ASSOCIATED with
        a PARTICULAR SOURCE remain unknown!
        ... But, this result is to be expected as the typical FFT sample
        window of 10 milliseconds is 1,000 times larger than the raw,
        human, localization resolution of 10 microseconds, i.e., the 1 to 2
        Spatial Degrees of SOURCE resolution are completely buried -
        actually, it's more appropriate to say that Spatial Resolution has
        been lost in the 10 millisecond AVERAGING process used to
        calculate "spectral" components.
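
The correspondence between roughly 10 microseconds and 1 to 2 degrees
can be checked with a spherical-head approximation of the interaural
time difference, ITD = (a/c)(theta + sin theta), usually attributed to
Woodworth. This is only a sketch: the head radius and speed of sound
below are assumed typical values, not figures from this thread, and the
formula applies to a distant source measured from straight ahead.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed typical adult head radius
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air at about 20 C


def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference for a distant source at the given azimuth.

    Spherical-head approximation: ITD = (a / c) * (theta + sin(theta)),
    with theta in radians measured from straight ahead.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


for deg in (1.0, 2.0):
    print(f"{deg:.0f} degree: {itd_seconds(deg) * 1e6:.1f} microseconds")
```

With these assumed values, 1 degree comes out a little under 10
microseconds and 2 degrees a little under 20.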


Rich Fabbri

McGill is running a new version of LISTSERV (1.8d on Windows NT).
Information is available on the WEB at
http://www.mcgill.ca/cc/listserv
