Tech report on speech segregation (Guoning Hu)


Subject: Tech report on speech segregation
From:    Guoning Hu  <hu(at)CIS.OHIO-STATE.EDU>
Date:    Mon, 15 Apr 2002 21:33:38 -0400

Dear Auditory list,

It's my pleasure to announce the following technical report,
available via WWW.

Thanks for your attention,

Guoning Hu


***************************************************************
"Monaural speech segregation based on pitch tracking and amplitude
modulation"

Technical Report #6, March 2002

Department of Computer and Information Science
The Ohio State University
***************************************************************

        Guoning Hu, The Ohio State University
        DeLiang Wang, The Ohio State University

Speech segregation in the monaural condition has proven to be very
challenging. Monaural speech segregation has been studied in previous
systems that incorporate auditory scene analysis principles. A major
problem for these systems is their inability to deal with speech in the
high-frequency range. Psychoacoustic evidence suggests that different
perceptual mechanisms are involved in handling resolved and unresolved
harmonics. We propose a system that deals with resolved and unresolved
harmonics differently. For resolved harmonics, the system generates
segments based on temporal continuity and cross-channel correlation, and
groups them according to their periodicities. For unresolved harmonics,
it generates segments based on common amplitude modulation (AM) in
addition to temporal continuity and groups them according to AM
repetition rates derived from sinusoidal modeling. Underlying the
segregation process is a pitch contour that is first estimated from
speech segregated according to global pitch and then adjusted according
to psychoacoustic constraints. Our system is systematically evaluated,
and it yields substantially better performance than previous systems,
especially in the high-frequency range.
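(A note from the editor, not part of the original announcement: the abstract's idea of grouping unresolved high-frequency harmonics by their AM repetition rate can be illustrated with a minimal sketch. The report derives the rate from sinusoidal modeling; the stand-in below instead estimates it from the autocorrelation of a rectified-and-smoothed amplitude envelope, which captures the same intuition. All names and parameters here are illustrative assumptions, not the report's implementation.)

```python
import numpy as np

def am_repetition_rate(x, fs, min_f0=80.0, max_f0=400.0):
    """Estimate the AM repetition rate (Hz) of a band-limited signal.

    Simplified stand-in for the report's sinusoidal-modeling step:
    half-wave rectify, smooth with a short moving average to remove
    the carrier, then pick the autocorrelation peak whose lag falls
    inside a plausible pitch range [min_f0, max_f0].
    """
    # Amplitude envelope: half-wave rectification + moving average.
    env = np.maximum(x, 0.0)
    win = max(1, int(fs / (2 * max_f0)))
    env = np.convolve(env, np.ones(win) / win, mode="same")
    env = env - env.mean()

    # Autocorrelation, keeping only non-negative lags.
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]

    # Search the lag range corresponding to [min_f0, max_f0].
    lo = int(fs / max_f0)
    hi = int(fs / min_f0)
    lag = lo + np.argmax(ac[lo:hi + 1])
    return fs / lag

# Example: a 3 kHz carrier (an unresolved-harmonic region for typical
# voices) amplitude-modulated at 120 Hz, as a vocal pulse rate might.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 120 * t)) * np.sin(2 * np.pi * 3000 * t)
print(am_repetition_rate(x, fs))  # close to 120 Hz
```

Channels whose estimated rates agree (within a tolerance) would then be grouped to the same pitch contour; the report's actual system additionally enforces temporal continuity and psychoacoustic constraints on that contour.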
For WWW:
        http://www.cis.ohio-state.edu/~hu/Publication/TR6.pdf

Related sound demos can be found at:
        http://www.cis.ohio-state.edu/~hu/Publication/MSSDemo.htm

Preliminary versions (in pdf) of this work are included in
2001 IEEE WASPAA (http://www.cis.ohio-state.edu/~hu/Publication/waspaa01.pdf)
and 2002 IEEE ICASSP (http://www.cis.ohio-state.edu/~hu/Publication/icassp02.pdf).


This message came from the mail archive
http://www.auditory.org/postings/2002/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University