[AUDITORY] How we talk to machines that listen (Valeriy Shafiro)


Subject: [AUDITORY] How we talk to machines that listen
From:    Valeriy Shafiro <firosha@xxxxxxxx>
Date:    Sun, 3 Feb 2019 11:16:35 -0600
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear list,

I am wondering if anyone has any references or suggestions about this question. These days I hear more and more people talking to machines, e.g. Siri, Google, Alexa, etc., and doing it in more and more places. Automatic speech recognition has improved tremendously, but it still seems to me that when people talk to machines they often switch into a different production mode. At times it may sound like talking to a (large) dog, and sometimes like talking to a customer service agent in a land far away who is diligently trying to follow a script rather than listen to what you are saying.

I wonder whether the adjustments that people make in their speech production when talking with machines in that mode are in fact optimal for improving recognition accuracy. Since machines do not process speech in the same way as humans, I wonder if the changes in speech production that make speech more recognizable for other people (or even pets) are always the same as those that make it more recognizable for machines. In other words, do people tend to make the optimal adjustments to make their speech more recognizable to machines? Or is it more like falling back on clear speech modes that work with other kinds of listeners (children, nonnative speakers, pets), or something in between?

I realize there is a lot to this question, but perhaps people have started looking into it. I am happy to collate references and replies and send them to the list.

Best,
Valeriy


This message came from the mail archive
src/postings/2019/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University