Re: [AUDITORY] How we talk to machines that listen (Phil Green)


Subject: Re: [AUDITORY] How we talk to machines that listen
From:    Phil Green  <p.green@xxxxxxxx>
Date:    Tue, 5 Feb 2019 19:54:52 +0000
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

On 03/02/2019 17:16, Valeriy Shafiro wrote:
> Dear list,
>
> I am wondering if anyone has any references or suggestions about this
> question. These days I hear more and more people talking to machines,
> e.g. Siri, Google, Alexa, etc., and doing it in more and more places.
> Automatic speech recognition has improved tremendously, but still it
> seems to me that when people talk to machines they often switch into a
> different production mode. At times it may sound like talking to a
> (large) dog and sometimes like talking to a customer service agent in
> a land far away who is diligently trying to follow a script rather
> than listen to what you are saying. And I wonder whether adjustments
> that people make in their speech production when talking with machines
> in that mode are in fact optimal for improving recognition accuracy.
> Since machines are not processing speech in the same way as humans, I
> wonder if changes in speech production that make speech more
> recognizable for other people (or even pets) are always the same as
> they are for machines. In other words, do people tend to make the
> optimal adjustments to make their speech more recognizable to
> machines? Or is it more like falling back on clear speech modes that
> work with other kinds of listeners (children, nonnative speakers,
> pets), or something in between?
>
> I realize there is a lot to this question, but perhaps people have
> started looking into it. I am happy to collate references and replies
> and send to the list.
>
> Best,
>
> Valeriy

Here's a forthcoming workshop on this topic:

http://speech-interaction.org/chi2019/

phil

--
*** note email is now p.green@xxxxxxxx ***
Professor Phil Green
SPandH
Dept of Computer Science
University of Sheffield


This message came from the mail archive
src/postings/2019/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University