Re: About importance of "phase" in sound recognition (Joachim Thiemann)


Subject: Re: About importance of "phase" in sound recognition
From:    Joachim Thiemann  <joachim.thiemann@xxxxxxxx>
Date:    Tue, 5 Oct 2010 13:41:34 -0400
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi Emad,

Like Kevin, I have to ask: what phase do you mean? I have been working for a while now (just finishing my Ph.D.) on phase in the context of monaural phase and phase perception.

If we're talking about the phase you get from a Short-Time Fourier Transform (STFT), it has IMHO very little perceptual meaning (lots of information, but not easily translatable to 'meaning'), since you are basically converting a short section of sound by first windowing it, multiplying it with a perfect monochromatic complex sinusoid, and then comparing the rotation of that to the sound - this has no easy perceptual equivalent. It is a mathematical decomposition of the signal and very useful as such, and it can with some effort be analysed from a perceptual viewpoint - but mostly only by returning it to the time domain. The STFT is a block-based transform; our ears don't work on short-time blocks. There are various time constants of perceptual effects, but no fixed boundaries.

I have been using a gammatone magnitude/phase decomposition. I use a gammatone filterbank (FB) at some spacing based on the ERB/Bark scale, with a view towards reconstruction (see for example Strahl in JASA Nov. 2009). The envelope of the signal in each channel of this FB can be regarded as the strength of excitation of a hair cell (HC) ensemble at a point on the basilar membrane (BM). Normalising with respect to the envelope, you are left with a nearly sinusoidal "carrier" signal whose frequency is centred around the channel's centre frequency, which is the characteristic frequency of the hair cell ensemble associated with the filter. This carrier can be regarded as the instantaneous phase of the original signal in an auditory channel; we can then ask how much "phase distortion" is audible in each channel - this phase signal is synchronised with the IHC response for lower-frequency auditory channels. The problem is that auditory channels overlap a lot, so you can't modify the carrier/phase of one channel independently of those in adjacent channels. (A rough code sketch of this decomposition follows the quoted message below.)

I haven't looked at this carrier/phase with regard to binaural hearing yet, but since this is an essentially time-domain phase/magnitude decomposition, it should be amenable to examination for interaural time differences, and the magnitude signal for interaural level differences.

Joe.

On Tue, Oct 5, 2010 at 11:23, emad burke <emad.burke@xxxxxxxx> wrote:
> Dear List,
>
> I've been confused about the role of "phase" information in the sound (e.g.
> speech) signal in speech recognition and, more generally, in human perception
> of audio signals. I've been reading conflicting arguments and publications
> regarding the extent of the importance of phase information. If there is a
> border between short- and long-term phase information that clarifies this
> extent of importance, can anybody please introduce me to any convincing
> reference in that respect? In summary, I just want to know what the
> consensus in the community is about the role of phase in speech recognition -
> of course, if there is any consensus at all.
>
> Best
> Emad

--
Joachim Thiemann :: http://www.tsp.ece.mcgill.ca/~jthiem
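A minimal Python sketch of the envelope/carrier decomposition described above (not the author's own code): it band-limits the input with a single gammatone filter and splits the channel into an envelope (excitation strength) and a unit-amplitude carrier, the instantaneous "phase" signal. The FIR gammatone approximation, the 4th-order filter, the Glasberg & Moore ERB bandwidth, and the Hilbert-transform envelope are assumptions about one straightforward way to realise such a decomposition.

# Sketch: per-channel envelope/carrier ("magnitude/phase") decomposition
# with a gammatone filter.  Parameter choices are illustrative assumptions.
import numpy as np
from scipy.signal import fftconvolve, hilbert

def gammatone_ir(fc, fs, dur=0.05, order=4):
    # FIR approximation of a gammatone impulse response at centre frequency fc
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 + 0.108 * fc                  # Glasberg & Moore (1990) ERB in Hz
    b = 1.019 * erb                          # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def envelope_carrier(x, fc, fs):
    # Band-limit the input to one auditory channel, then split the channel
    # into an envelope and a near-sinusoidal unit-amplitude carrier.
    y = fftconvolve(x, gammatone_ir(fc, fs), mode='same')
    analytic = hilbert(y)                                    # analytic signal of the channel
    env = np.abs(analytic)                                   # envelope (excitation strength)
    carrier = np.real(analytic) / np.maximum(env, 1e-12)     # instantaneous "phase" signal
    return env, carrier

if __name__ == '__main__':
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.random.randn(fs)   # toy input
    env, carrier = envelope_carrier(x, fc=1000.0, fs=fs)
    # env * carrier reconstructs the band-limited channel signal:
    y = fftconvolve(x, gammatone_ir(1000.0, fs), mode='same')
    print(np.allclose(env * carrier, y))                     # -> True

Since env * carrier reproduces the filtered channel exactly, per-channel phase manipulations can be simulated by modifying the carrier before resynthesis - subject to the channel-overlap caveat in the message above.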

