
Re: About importance of "phase" in sound recognition



I'm surprised that no-one so far in this thread has mentioned the "group
delay" of signals, either in the auditory system (as in my own rather
specialised auditory modelling work in the late 1980s) or in a more
mathematically-oriented traditional DSP form in the work of Yegnanarayana et
al.

The group delay (Yegnanarayana's "Modified Group Delay Function", or the
phase parameter in my "Reduced Auditory Representation") provides
information derived from the phase components of signals, but in a form
which is visually very similar to conventional power spectral density (PSD)
estimates. This form of data has many advantages over conventional PSD
representations (amplitude independence, clear and relatively noise- and
channel-immune representation of formants, etc.), but it also has a
down-side: when trying to differentiate fricatives from background noise,
for example, amplitude is a key factor and phase alone is not enough.
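For anyone who wants to experiment, a minimal sketch of the plain
(unmodified) group delay of a single windowed frame might look like the
Python/NumPy below. It uses the identity tau(w) = Re(Y(w)/X(w)), where Y is
the DFT of n*x[n], so no explicit phase unwrapping is needed. It is only a
starting point - Yegnanarayana's Modified Group Delay Function goes further,
smoothing the denominator cepstrally and applying compressive exponents.

    import numpy as np

    def group_delay(frame, n_fft=1024):
        # Plain group delay of one windowed frame, via
        # tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the DFT of n*x[n].
        # This is NOT the Modified Group Delay Function, which also smooths
        # |X|^2 and compresses the result.
        n = np.arange(len(frame))
        X = np.fft.rfft(frame, n_fft)
        Y = np.fft.rfft(n * frame, n_fft)
        power = np.abs(X) ** 2
        power = np.maximum(power, 1e-12 * power.max())  # guard near-zero bins
        return (X.real * Y.real + X.imag * Y.imag) / power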

If you know how to use the phase you can get as much information out of it
as you can from the amplitude. To a first approximation, one can model human
perception solely in terms of amplitude, but there are some effects which
can only be explained if you include phase information as well. I've never
seen a successful attempt to model human perception solely from phase
information, but I suspect it may be possible.

This is hardly surprising when you consider the peripheral auditory system,
which provides phase-locked neuron firings at low frequencies, but seems to
provide only amplitude information once you get above a few kHz. It would be
very strange if the higher levels of the auditory system did not take
account of synchronisation when it occurs - just as it would be strange if
they were able to extract phase information from signals for which the
peripheral auditory system had not extracted it in the first place.
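You can convince yourself of this with a toy simulation (emphatically not my
Reduced Auditory Representation, and the 3 kHz cut-off is just an assumed
upper limit for phase locking): half-wave rectification followed by a
low-pass filter leaves the fine structure of a 200 Hz tone intact, but only
a near-constant envelope for a 6 kHz tone.

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 44100
    t = np.arange(0, 0.05, 1.0 / fs)

    # Crude stand-in for inner-hair-cell transduction: half-wave
    # rectification then a 4th-order low-pass at an assumed 3 kHz
    # phase-locking limit.
    b, a = butter(4, 3000.0 / (fs / 2))

    for f0 in (200.0, 6000.0):
        tone = np.sin(2 * np.pi * f0 * t)
        receptor = lfilter(b, a, np.maximum(tone, 0.0))
        steady = receptor[len(receptor) // 2:]
        # At 200 Hz the output still oscillates at f0, so firings can
        # phase-lock to it; at 6 kHz it is nearly flat, so only the
        # amplitude envelope survives.
        print(f"{f0:6.0f} Hz tone: std of steady-state output = {steady.std():.3f}")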

However, if you want to decide exactly what sounds "real" people can
differentiate between, I think you're fighting a losing battle. Quite apart
from the variability between individuals, and within one individual on
different occasions (before and after musical education, when healthy and
when suffering from an ear infection, etc., etc.), I don't believe you can
do so, whether in terms of phase, amplitude, or both together. Conventional
phase and amplitude are based on a mathematical model which only really
makes sense for stationary signals, and as such they are only applicable to
highly artificial environments. This is especially true for FFT-based
analysis. To model perception accurately, you need to create a complete
model of the whole auditory system, right the way from the cochlea up to the
cerebellum.

For a more pragmatic approach, modern audio codecs can provide a good
indication of the perceptual importance of different components of signals,
but they are mostly based on very simplistic models of perception.


Steve Beet