Re: About importance of "phase" in sound recognition (Roy Patterson)


Subject: Re: About importance of "phase" in sound recognition
From:    Roy Patterson  <rdp1@xxxxxxxx>
Date:    Wed, 6 Oct 2010 10:49:39 +0100
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hello Emad,

I agree with earlier correspondents that much of the confusion comes from thinking of signals in Fourier terms. The ear performs a spectral analysis, but that analysis is not properly represented by a windowed FFT, or spectrogram.

The cochlea performs a wavelet transform, which is better simulated with an auditory filterbank (e.g. Unoki et al., 2006). The output of each filter is encoded by auditory nerve fibres that phase lock at speech frequencies, so there is no question that phase information gets into the auditory system.

A summary of the early literature is presented in Patterson (1987). The paper summarizes our understanding of monaural phase perception and provides new data supporting earlier theories, which basically say:

The auditory system *preserves* phase changes that change the envelope of the wave coming out of an individual auditory filter (*within-channel changes*). Reverberation can produce this kind of change in a speech signal.

The auditory system *loses* most of the phase information that defines time *delays between channels*. These global phase shifts are encountered in signal transmission.

So one answer is to assess the phase changes you are concerned about by passing them through an auditory filterbank and checking whether they produce within-channel differences that MFCCs do not preserve.

Subsequent experiments, like those of Gockel et al. (2002), suggest, as Laszlo Toth intuited, that phase changes that disrupt glottal-pulse integrity reduce detectability in noise, and that the effect is greater when the glottal pulse rate is lower.

I can provide PDFs of the references below on request.

Regards
Roy P

Patterson, R.D. (1987b). A pulse ribbon model of monaural phase perception. J. Acoust. Soc. Am., 82, 1560-1586.

Unoki, M., Irino, T., Glasberg, B., Moore, B.C.J. and Patterson, R.D. (2006). Comparison of the roex and gammachirp filters as representations of the auditory filter. J. Acoust. Soc. Am., 120(3), 1474-1492.

Gockel, H., Moore, B.C.J. and Patterson, R.D. (2002). Asymmetry of masking between complex tones and noise: The role of temporal structure and peripheral compression. J. Acoust. Soc. Am., 111, 2759-2770.

On 05/10/2010 16:23, emad burke wrote:
> Dear List,
>
> I've been confused about the role of "phase" information of the sound
> (e.g. speech) signal in speech recognition, and more generally in human
> perception of audio signals. I've been reading conflicting arguments
> and publications regarding the extent of importance of phase
> information. If there is a border between short- and long-term phase
> information that clarifies this extent of importance, can anybody
> please point me to any convincing reference in that respect? In
> summary, I just want to know what the consensus in the community is
> about the role of phase in speech recognition, if there is any at all.
>
> Best
> Emad

-- 
Roy Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge, Downing Street, Cambridge, CB2 3EG
phone +44 (1223) 333819
fax +44 (1223) 333840
email: rdp1@xxxxxxxx
http://www.pdn.cam.ac.uk/groups/cnbh/
http://www.AcousticScale.org
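A minimal sketch of the filterbank check suggested above, in Python with NumPy. Everything in it is an illustrative assumption, not taken from the references: a single gammatone channel (the gammachirp with zero chirp, used as a simpler stand-in for the filters compared by Unoki et al.), a 100 Hz harmonic complex as a crude proxy for a glottal pulse train, and crest factor (peak/RMS) as a rough index of within-channel envelope shape. The helper names (erb, gammatone_ir, channel_output, crest_factor) are hypothetical. Cosine-phase, alternating-phase (odd harmonics in cosine, even in sine) and circularly delayed versions all share the same long-term magnitude spectrum; MFCCs are computed from short-time magnitude (power) spectra, so they are largely blind to the differences among them.

```python
import numpy as np

FS = 16000      # sample rate in Hz (assumed for illustration)
F0 = 100.0      # fundamental in Hz, standing in for a glottal pulse rate (assumed)
N_HARM = 40     # equal-amplitude harmonics 1..40, i.e. up to 4 kHz (assumed)
N_SAMP = 3200   # 0.2 s = 20 whole periods, so every harmonic sits on an FFT bin


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def gammatone_ir(fc, order=4, dur=0.05):
    """Hypothetical helper: 4th-order gammatone impulse response centred on fc,
    a simple stand-in for the roex/gammachirp filters of Unoki et al. (2006)."""
    t = np.arange(round(dur * FS)) / FS
    b = 1.019 * erb(fc)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sum(np.abs(ir))


def harmonic_complex(phases):
    """Sum of equal-amplitude harmonics of F0 with the given starting phases."""
    t = np.arange(N_SAMP) / FS
    n = np.arange(1, N_HARM + 1)[:, None]
    return np.sum(np.cos(2 * np.pi * n * F0 * t + phases[:, None]), axis=0)


def channel_output(x, fc):
    """Steady-state output of one auditory channel. Circular convolution is
    exact here because x holds a whole number of periods."""
    h = np.zeros(len(x))
    ir = gammatone_ir(fc)
    h[: len(ir)] = ir
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))


def crest_factor(y):
    """Peak-to-RMS ratio: a crude index of how peaky the channel output is."""
    return np.max(np.abs(y)) / np.sqrt(np.mean(y ** 2))


n = np.arange(1, N_HARM + 1)
cosine = harmonic_complex(np.zeros(N_HARM))           # one pulse per period
alternating = harmonic_complex(np.where(n % 2 == 0, -np.pi / 2, 0.0))  # odd cos, even sin
delayed = np.roll(cosine, 37)                         # pure (circular) time delay

# All three have the same long-term magnitude spectrum, so a
# magnitude-only representation cannot tell them apart.
mag = np.abs(np.fft.rfft(cosine))
same_mag_alt = np.allclose(mag, np.abs(np.fft.rfft(alternating)), atol=1e-6)
same_mag_delay = np.allclose(mag, np.abs(np.fft.rfft(delayed)), atol=1e-6)

# Within one channel centred at 3 kHz, where several harmonics interact,
# the phase change alters the envelope but the delay does not.
fc = 3000.0
cf_cos = crest_factor(channel_output(cosine, fc))
cf_alt = crest_factor(channel_output(alternating, fc))
cf_delay = crest_factor(channel_output(delayed, fc))

print(same_mag_alt, same_mag_delay)
print(f"crest factor at {fc:.0f} Hz: cosine {cf_cos:.2f}, "
      f"alternating {cf_alt:.2f}, delayed {cf_delay:.2f}")
```

With these settings the alternating-phase version should give two smaller envelope peaks per period in the 3 kHz channel instead of one large one, so its crest factor comes out lower (roughly 1/√2 of the cosine-phase value), while the delayed version only shifts the channel output in time and leaves the crest factor unchanged.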


This message came from the mail archive
/home/empire6/dpwe/public_html/postings/2010/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University