Subject: Re: [AUDITORY] Semantic McGurk Effect
From: Sam Mathias <samuel.mathias@xxxxxxxx>
Date: Mon, 10 Aug 2020 15:47:07 -0400

Could you provide a reference for this?

On Sun, 9 Aug 2020 at 00:22, Sommers, Mitchell <msommers@xxxxxxxx> wrote:

> We have some really powerful demonstrations of context effects in hearing.
> If you ask people to identify the last word in a sentence such as "the
> plumber fixed a drink", with the last word in noise, about 40% of young
> adults and 80% of older adults will report hearing "sink". If you then ask
> to rate "how sure you are that you heard the word you responded with",
> older adults will give 100% confidence rating about half the time.
>
>
> Mitchell S. Sommers
> Professor of Psychological and Brain Sciences
> Washington University in St. Louis
>
> Email: Msommers@xxxxxxxx
> ------------------------------
> *From:* AUDITORY - Research in Auditory Perception <
> AUDITORY@xxxxxxxx> on behalf of Julia Strand <
> 00000071c2dbe20f-dmarc-request@xxxxxxxx>
> *Sent:* Friday, August 7, 2020 8:42 AM
> *To:* AUDITORY@xxxxxxxx <AUDITORY@xxxxxxxx>
> *Subject:* Re: Semantic McGurk Effect
>
>
> ** External Email - Caution **
> I'm always delighted when auditory phenomena spark the public's interest!
>
> I wouldn't call this a semantic McGurk, given that it doesn't have to be
> driven by simultaneous bottom-up input from two modalities. That is, even
> if nothing is written on the screen but you're just thinking "green needle"
> to yourself, that's what you're likely to hear (whereas thinking "ga" while
> hearing "ba" won't get you to "da" - you need the simultaneous input from
> face and voice). So I'd agree with Roger that it's more akin to the phoneme
> restoration effect or work like Cynthia Connine's "she ran hot water for
> the p/bath," showing how expectations influence interpretation of bottom-up
> input.
>
> I think most of US wouldn't be surprised that the same stimulus can be
> perceived in different ways, but my impression is that the general public
> tends to believe "what you see is what you get" and underestimates the
> power of top-down influences. Same reason #TheDress was such a hit.
>
> When I include this in my class on speech perception, I also include this video
> which shows Grover from Sesame street
> <https://languagelog.ldc.upenn.edu/nll/?p=41249> saying EITHER "Yes, yes,
> that sounds like an excellent idea!" OR "Yes, yes, that's a f*%#g excellent
> idea!"
>
> Like I'm always telling my students - Speech is hard! Context helps!
>
> Best,
> Julia
>
> On Fri, Aug 7, 2020 at 4:28 AM Prof. Roger K. Moore <
> 0000011559506d60-dmarc-request@xxxxxxxx> wrote:
>
> I must admit to being surprised by the surprise engendered by this video.
> Anyone who was around during the early days of text-to-speech synthesis is
> very aware of the danger of presenting the text in advance of or
> simultaneous with the generated speech. The intelligibility of the
> resulting synthesis could be zero without the 'prior' and 100% with the
> visual cue.
>
> So, given that we know that perception involves the integration of
> top-down expectations with bottom-up evidence (going right back to Richard
> Warren's work on the 'phoneme restoration effect'), why is this TikTok demo
> surprising? Or maybe I'm missing something?
>
> Best wishes
> Roger
>
>
> --------------------------------------------------------------------------------------------
> Prof ROGER K MOORE* BA(Hons) MSc PhD FIOA FISCA MIET
>
> Chair of Spoken Language Processing
> Vocal Interactivity Lab (VILab), Sheffield Robotics
> Speech & Hearing Research Group (SPandH)
> Department of Computer Science, UNIVERSITY OF SHEFFIELD
> Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
>
> * Winner of the 2016 Antonio Zampolli Prize for "*Outstanding
> Contributions *
> *to the Advancement of Language Resources & Language Technology *
> *Evaluation within Human Language Technologies*"
>
> e-mail: r.k.moore@xxxxxxxx
> web: http://staffwww.dcs.shef.ac.uk/people/R.K.Moore/
> twitter: @xxxxxxxx
> Tel: +44 (0) 11422 21807
> Fax: +44 (0) 11422 21810
> Mob: +44 (0) 7910 073631
>
> Editor-in-Chief: COMPUTER SPEECH AND LANGUAGE
> (http://www.journals.elsevier.com/computer-speech-and-language/)
>
> --------------------------------------------------------------------------------------------
>
>
> On Fri, 7 Aug 2020 at 05:12, Malcolm Slaney <malcolm@xxxxxxxx> wrote:
>
> Has there been anything formal published on this effect?
>
> https://www.iflscience.com/brain/what-the-hell-is-going-on-in-this-tiktok-audio-illusion
>
> It sounds to me like a semantic version of the McGurk effect.
>
> Nice demo.
>
> - Malcolm
>
>
>
> --
> Julia Strand, PhD
> Assistant Professor of Psychology
> Carleton College
> One North College Street
> Northfield, MN 55057
> 507-222-5637
> Website <https://apps.carleton.edu/curricular/psyc/jstrand/>
> Make an appointment <http://juliastrand.youcanbook.me>