Re: [AUDITORY] Semantic McGurk Effect (Sam Mathias)


Subject: Re: [AUDITORY] Semantic McGurk Effect
From:    Sam Mathias  <samuel.mathias@xxxxxxxx>
Date:    Mon, 10 Aug 2020 15:47:07 -0400

Could you provide a reference for this?

On Sun, 9 Aug 2020 at 00:22, Sommers, Mitchell <msommers@xxxxxxxx> wrote:

> We have some really powerful demonstrations of context effects in hearing.
> If you ask people to identify the last word in a sentence such as "the
> plumber fixed a drink", with the last word in noise, about 40% of young
> adults and 80% of older adults will report hearing "sink". If you then ask
> to rate "how sure you are that you heard the word you responded with",
> older adults will give 100% confidence rating about half the time.
>
> Mitchell S. Sommers
> Professor of Psychological and Brain Sciences
> Washington University in St. Louis
>
> Email: Msommers@xxxxxxxx
> ------------------------------
> From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxx>
> on behalf of Julia Strand <00000071c2dbe20f-dmarc-request@xxxxxxxx>
> Sent: Friday, August 7, 2020 8:42 AM
> To: AUDITORY@xxxxxxxx <AUDITORY@xxxxxxxx>
> Subject: Re: Semantic McGurk Effect
>
> I'm always delighted when auditory phenomena spark the public's interest!
>
> I wouldn't call this a semantic McGurk, given that it doesn't have to be
> driven by simultaneous bottom-up input from two modalities. That is, even
> if nothing is written on the screen but you're just thinking "green needle"
> to yourself, that's what you're likely to hear (whereas thinking "ga" while
> hearing "ba" won't get you to "da" - you need the simultaneous input from
> face and voice). So I'd agree with Roger that it's more akin to the phoneme
> restoration effect or work like Cynthia Connine's "she ran hot water for
> the p/bath," showing how expectations influence interpretation of bottom-up
> input.
>
> I think most of US wouldn't be surprised that the same stimulus can be
> perceived in different ways, but my impression is that the general public
> tends to believe "what you see is what you get" and underestimates the
> power of top-down influences. Same reason #TheDress was such a hit.
>
> When I include this in my class on speech perception, I also include this video
> which shows Grover from Sesame street
> <https://languagelog.ldc.upenn.edu/nll/?p=41249> saying EITHER "Yes, yes,
> that sounds like an excellent idea!" OR "Yes, yes, that's a f*%#g excellent
> idea!"
>
> Like I'm always telling my students - Speech is hard! Context helps!
>
> Best,
> Julia
>
> On Fri, Aug 7, 2020 at 4:28 AM Prof. Roger K. Moore
> <0000011559506d60-dmarc-request@xxxxxxxx> wrote:
>
> I must admit to being surprised by the surprise engendered by this video.
> Anyone who was around during the early days of text-to-speech synthesis is
> very aware of the danger of presenting the text in advance of or
> simultaneous with the generated speech. The intelligibility of the
> resulting synthesis could be zero without the 'prior' and 100% with the
> visual cue.
>
> So, given that we know that perception involves the integration of
> top-down expectations with bottom-up evidence (going right back to Richard
> Warren's work on the 'phoneme restoration effect'), why is this TikTok demo
> surprising? Or maybe I'm missing something?
>
> Best wishes
> Roger
>
> --------------------------------------------------------------------------------------------
> Prof ROGER K MOORE* BA(Hons) MSc PhD FIOA FISCA MIET
>
> Chair of Spoken Language Processing
> Vocal Interactivity Lab (VILab), Sheffield Robotics
> Speech & Hearing Research Group (SPandH)
> Department of Computer Science, UNIVERSITY OF SHEFFIELD
> Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
>
> * Winner of the 2016 Antonio Zampolli Prize for "Outstanding
> Contributions to the Advancement of Language Resources & Language
> Technology Evaluation within Human Language Technologies"
>
> e-mail: r.k.moore@xxxxxxxx
> web: http://staffwww.dcs.shef.ac.uk/people/R.K.Moore/
> twitter: @xxxxxxxx
> Tel: +44 (0) 11422 21807
> Fax: +44 (0) 11422 21810
> Mob: +44 (0) 7910 073631
>
> Editor-in-Chief: COMPUTER SPEECH AND LANGUAGE
> (http://www.journals.elsevier.com/computer-speech-and-language/)
> --------------------------------------------------------------------------------------------
>
> On Fri, 7 Aug 2020 at 05:12, Malcolm Slaney <malcolm@xxxxxxxx> wrote:
>
> Has there been anything formal published on this effect?
>
> https://www.iflscience.com/brain/what-the-hell-is-going-on-in-this-tiktok-audio-illusion
>
> It sounds to me like a semantic version of the McGurk effect.
>
> Nice demo.
>
> - Malcolm
>
> --
> Julia Strand, PhD
> Assistant Professor of Psychology
> Carleton College
> One North College Street
> Northfield, MN 55057
> 507-222-5637
> Website <https://apps.carleton.edu/curricular/psyc/jstrand/>
> Make an appointment <http://juliastrand.youcanbook.me>


This message came from the mail archive
src/postings/2020/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University