Re: About importance of "phase" in sound recognition

On Mon, Oct 11, 2010 at 12:21 PM, Roy Patterson <rdp1@xxxxxxxxx> wrote:

Those interested in the mathematical basis of phase perception might like to look at a paper by Martin Reimann that appeared in JASA a few years ago. After demonstrating that the cochlea performs a wavelet transform rather than a windowed Fourier transform, he goes on to describe how phase operates in the wavelet representation of auditory processing.

"Invariance principles for cochlear mechanics: Hearing phases"
H. M. Reimann
Institute of Mathematics, University of Berne, Sidlerstrasse 5, CH-3012 Berne, Switzerland

A functional model of the cochlea is devised on the basis of the results from classical experiments.
The basilar membrane filter is investigated in detail. Its phase is close to linear in the region around
the peak of the amplification. On one side this has consequences for the time analysis and on the
other side this has led to a prediction on phase perception for very simple combinations of tones, a
prediction which is now confirmed by experiments. Equivariance under the dilation group permits
one to describe the model by a wavelet transform [Daubechies, Ten Lectures on Wavelets SIAM,
Philadelphia, 1992]. The wavelet is discussed in reference to the phase analysis of the basilar
membrane filter. © 2006 Acoustical Society of America. DOI: 10.1121/1.2159428

With regard to why the auditory system uses a wavelet transform rather than a windowed Fourier transform, Irino and Patterson (1997, 2002) have pointed out that the acoustic scale of the sounds produced by animals and instruments varies with the size of the animal or instrument (within a family). The operator that provides the basis for the spectral transformation performed in the cochlea should therefore have a scale variable to represent this variation in size. It is argued that this would help explain why auditory processing is so robust to changes in source size, and why current recognition systems based on spectrographic front ends have difficulties with changes in source size.
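For anyone who wants to see the scale argument numerically, here is a minimal sketch in Python. It is not the gammachirp filterbank or Reimann's cochlear wavelet: the Morlet-style wavelet, the toy two-component `source`, and all parameter values are assumptions chosen only for illustration. It checks the standard dilation property of a continuous wavelet transform: if the source is stretched in time by a factor s, the magnitude profile across geometrically spaced scales keeps its shape and simply shifts along the log-scale axis (with an overall gain of sqrt(s)).

```python
import numpy as np

def morlet(u, w0=6.0):
    # Complex Morlet-style wavelet; an illustrative choice, not a cochlear filter.
    return np.exp(1j * w0 * u - 0.5 * u**2)

def wavelet_coeff(x, t, a, b=0.0):
    # W(a, b) = integral of x(t) * conj(psi((t - b) / a)) / sqrt(a) dt,
    # approximated by a Riemann sum on the uniform grid t.
    dt = t[1] - t[0]
    return np.sum(x * np.conj(morlet((t - b) / a))) * dt / np.sqrt(a)

def source(t):
    # Hypothetical "small source": a Gaussian-damped pulse with two components.
    return np.exp(-0.5 * t**2) * (np.cos(5.0 * t) + 0.5 * np.cos(9.0 * t))

t = np.arange(-40.0, 40.0, 0.002)      # time grid (arbitrary units)
r = 2.0 ** 0.25                        # ratio between neighbouring scales
m = 2                                  # size change expressed in channels
s = r ** m                             # dilation factor of the "larger source"
scales = 0.3 * r ** np.arange(12)      # geometrically spaced scale channels

# Magnitude profiles across scale at b = 0 for the small and the dilated source.
profile_small = np.array([abs(wavelet_coeff(source(t), t, a)) for a in scales])
profile_large = np.array([abs(wavelet_coeff(source(t / s), t, a)) for a in scales])

# Dilation by s = r**m should shift the profile by m channels, scaled by sqrt(s).
shift_ok = np.allclose(profile_large[m:], np.sqrt(s) * profile_small[:-m], rtol=1e-6)
print(shift_ok)
```

A windowed Fourier analysis uses one fixed window duration for all channels, so it has no scale variable along which a change in source size could appear as a simple shift of this kind.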

Regards, Roy P

Reimann, H. M. (2006). “Invariance principles for cochlear mechanics: Hearing phases,” J. Acoust. Soc. Am. 119(2), 997-1004.

Irino, T. and Patterson, R.D. (2002). "Segregating Information about the Size and Shape of the Vocal Tract using a Time-Domain Auditory Model: The Stabilised Wavelet-Mellin Transform," Speech Communication 36, 181-203.

Irino, T. and Patterson, R.D. (1997). "A time-domain, level-dependent auditory filter: the gammachirp," J. Acoust. Soc. Am. 101, 412-419.
cc Prof. T. Irino and Prof. M. Reimann
-- 
Roy Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge, Downing Street, Cambridge, CB2 3EG
  phone   +44 (0) 1223 333819    fax 333840
  email:    rdp1@xxxxxxxxx
  http://www.pdn.cam.ac.uk/groups/cnbh/
  http://www.AcousticScale.org