Re: [AUDITORY] How we talk to machines that listen (Leon van Noorden)

Subject: Re: [AUDITORY] How we talk to machines that listen
From: Leon van Noorden <000000a1783dbfa4-dmarc-request@xxxxxxxx>
Date: Mon, 4 Feb 2019 09:55:45 +0100
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear Valeriy,

My wife has been deaf since birth and is an extremely good lipreader. She quite often notices that, when she says she is deaf, people switch into an exaggerated speech mode that does not help her understand them.
But sorry, she is not a machine ;-)).

Best,
Leon

> On 3 Feb 2019, at 18:16, Valeriy Shafiro <firosha@xxxxxxxx> wrote:
>
> Dear list,
>
> I am wondering if anyone has any references or suggestions about this question. These days I hear more and more people talking to machines, e.g. Siri, Google, Alexa, etc., and doing it in more and more places. Automatic speech recognition has improved tremendously, but it still seems to me that when people talk to machines they often switch into a different production mode. At times it may sound like talking to a (large) dog, and sometimes like talking to a customer service agent in a land far away who is diligently trying to follow a script rather than listen to what you are saying. And I wonder whether the adjustments that people make in their speech production when talking with machines in that mode are in fact optimal for improving recognition accuracy. Since machines do not process speech the same way humans do, I wonder whether changes in speech production that make speech more recognizable to other people (or even pets) are always the same as those that help machines. In other words, do people tend to make optimal adjustments to make their speech more recognizable to machines?
> Or is it more like falling back on clear speech modes that work with other kinds of listeners (children, nonnative speakers, pets), or something in between?
>
> I realize there is a lot to this question, but perhaps people have started looking into it. I am happy to collate references and replies and send them to the list.
>
> Best,
>
> Valeriy

This message came from the mail archive
src/postings/2019/
maintained by:

DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University