Importance of "phase" in sound recognition (John Bates)

Subject: Importance of "phase" in sound recognition
From: John Bates <jkbates@xxxxxxxx>
Date: Sun, 10 Oct 2010 17:35:53 -0400
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Emad,

Here's something else to consider for your research.

Traditionally, it has been dogma that the cochlea responds only to a sound's amplitude spectrum; therefore we should not hear changes caused by varying phase. Yet it has been shown repeatedly that we do hear changes in sounds as their phase spectrum is varied. How can this be?

Let's look at the problem: In terms of spectral analysis, we find that as we vary the phase the amplitude spectrum is invariant. Therefore, we conclude that the perceived changes in the sound are associated with changes in the phase spectrum. Somehow, the ear must be responding to a supposedly irrelevant phase spectrum. But where is the evidence?

Here's an idea: If we look at the signal's waveform, we notice that its pattern also varies in accord with the phase variations. Thus, it would appear that in lieu of a phase analyzer, the ear "reads" waveforms. As absurd as this might seem, how else could the sound changes be heard? We are thus convinced that the cochlea must be processing a phase/waveform source. Now we ask, "What is the most available and usable expression of waveform?"

Spatial patterns can be described in terms of their inflection points, which in our case have time-space locations identified by sequences of real and complex zeros, readily obtained physically by finding the waveform derivatives (H. Voelcker and A. Requicha). By using delay lines to preserve past events for present use (the cochlea?), meaningful temporal patterns in the stream of zeros (pitch?) can be recognized. Information such as amplitude and direction of arrival can be associated with patterns of events that are referenced to the zeros. In simple terms, the ear processes sound in the time domain, not the frequency domain. The trick is to find out how the ear does these things. And keep in mind that they are done in real time and are synchronized with the signal waveform.

So, there you are: The most likely answer for you, that I can see, is that the cochlea and its various parts must derive meaningful information from signal waveforms by recognizing patterns in the temporal sequences of their zeros.

John Bates

From: emad burke
To: AUDITORY@xxxxxxxx
Sent: Tuesday, October 05, 2010 11:23 AM
Subject: About importance of "phase" in sound recognition

Dear List,

I've been confused about the role of the "phase" information of a sound signal (e.g. speech) in speech recognition and, more generally, in human perception of audio signals. I've been reading conflicting arguments and publications regarding how important phase information is. If there is a boundary between short-term and long-term phase information that clarifies this, can anybody please point me to a convincing reference? In summary, I just want to know what the consensus in the community is about the role of phase in speech recognition, if there is any at all.

Best
Emad
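
The observation in John Bates's reply, that changing the phase spectrum leaves the amplitude spectrum untouched while the waveform and its zeros move, can be checked numerically. The following is only a minimal sketch and not anything from the original message: the three harmonic amplitudes, the two phase sets, the 256-sample period, and all function names are illustrative assumptions. It uses only real zero crossings (sign changes), not the complex zeros the reply also mentions.

```python
import cmath
import math

N_SAMPLES = 256               # samples per period (illustrative assumption)
AMPLITUDES = [1.0, 0.8, 0.6]  # amplitudes of harmonics 1..3 (illustrative assumption)


def synthesize(phases):
    """One period of a sum of cosine harmonics with the given phases (radians)."""
    wave = []
    for n in range(N_SAMPLES):
        theta = 2 * math.pi * n / N_SAMPLES
        wave.append(sum(a * math.cos(k * theta + p)
                        for k, (a, p) in enumerate(zip(AMPLITUDES, phases), start=1)))
    return wave


def amplitude_spectrum(wave):
    """Magnitudes of DFT bins 1..3, scaled so each equals the harmonic amplitude."""
    n_total = len(wave)
    spectrum = []
    for k in range(1, len(AMPLITUDES) + 1):
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * n / n_total)
                        for n, x in enumerate(wave))
        spectrum.append(2 * abs(bin_value) / n_total)
    return spectrum


def zero_crossings(wave):
    """Indices n where the periodic waveform changes sign between n and n + 1."""
    n_total = len(wave)
    return [n for n in range(n_total) if wave[n] * wave[(n + 1) % n_total] < 0]


# Same amplitudes, two different (arbitrary) phase spectra.
wave_a = synthesize([0.0, 0.0, 0.0])
wave_b = synthesize([0.0, 1.0, 2.0])

spec_a, spec_b = amplitude_spectrum(wave_a), amplitude_spectrum(wave_b)
zeros_a, zeros_b = zero_crossings(wave_a), zero_crossings(wave_b)

print("amplitude spectrum A:", [round(v, 6) for v in spec_a])
print("amplitude spectrum B:", [round(v, 6) for v in spec_b])
print("zero crossings A:", zeros_a)
print("zero crossings B:", zeros_b)
```

The two printed amplitude spectra should agree to within rounding, while the zero-crossing positions differ. This shows only that the timing of the zeros carries the phase information that the amplitude spectrum discards; whether the ear actually uses that timing is the hypothesis of the reply, not something the sketch tests.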

This message came from the mail archive
/home/empire6/dpwe/public_html/postings/2010/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University