Subject: Re: [AUDITORY] Semantic McGurk Effect
From: "Arthur, Claire" <claire.arthur@xxxxxxxx>
Date: Fri, 7 Aug 2020 19:14:49 +0000

I believe because of the odd timbre/background noise, our text priming helps us choose what to parse as the "negative space". So, for example, you hear the "ee" from "needle" if you are trying to hear that word, whereas you simply ignore that "ee" as it falls in between "brain" and "storm" if you are listening for brainstorm.

Fun! Thanks for sharing.

Claire

Claire Arthur
Assistant Professor, School of Music
College of Design
Georgia Institute of Technology
(404) 894-9110
claire.arthur@xxxxxxxx

________________________________
From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxx> on behalf of Prof. Roger K. Moore <0000011559506d60-dmarc-request@xxxxxxxx>
Sent: Friday, August 7, 2020 4:38 AM
To: AUDITORY@xxxxxxxx <AUDITORY@xxxxxxxx>
Subject: Re: [AUDITORY] Semantic McGurk Effect

I must admit to being surprised by the surprise engendered by this video. Anyone who was around during the early days of text-to-speech synthesis is very aware of the danger of presenting the text in advance of or simultaneous with the generated speech. The intelligibility of the resulting synthesis could be zero without the 'prior' and 100% with the visual cue.

So, given that we know that perception involves the integration of top-down expectations with bottom-up evidence (going right back to Richard Warren's work on the 'phoneme restoration effect'), why is this TikTok demo surprising? Or maybe I'm missing something?

Best wishes
Roger

--------------------------------------------------------------------------------------------
Prof ROGER K MOORE* BA(Hons) MSc PhD FIOA FISCA MIET

Chair of Spoken Language Processing
Vocal Interactivity Lab (VILab), Sheffield Robotics
Speech & Hearing Research Group (SPandH)
Department of Computer Science, UNIVERSITY OF SHEFFIELD
Regent Court, 211 Portobello, Sheffield, S1 4DP, UK

* Winner of the 2016 Antonio Zampolli Prize for "Outstanding Contributions to the Advancement of Language Resources & Language Technology Evaluation within Human Language Technologies"

e-mail: r.k.moore@xxxxxxxx
web: http://staffwww.dcs.shef.ac.uk/people/R.K.Moore/
twitter: @xxxxxxxx
Tel: +44 (0) 11422 21807
Fax: +44 (0) 11422 21810
Mob: +44 (0) 7910 073631

Editor-in-Chief: COMPUTER SPEECH AND LANGUAGE
(http://www.journals.elsevier.com/computer-speech-and-language/)
--------------------------------------------------------------------------------------------

On Fri, 7 Aug 2020 at 05:12, Malcolm Slaney <malcolm@xxxxxxxx> wrote:

Has there been anything formal published on this effect?

https://www.iflscience.com/brain/what-the-hell-is-going-on-in-this-tiktok-audio-illusion

It sounds to me like a semantic version of the McGurk effect.

Nice demo.

- Malcolm