Re: [AUDITORY] Semantic McGurk Effect (Julia Strand)


Subject: Re: [AUDITORY] Semantic McGurk Effect
From:    Julia Strand  <00000071c2dbe20f-dmarc-request@xxxxxxxx>
Date:    Fri, 7 Aug 2020 08:42:02 -0500

I'm always delighted when auditory phenomena spark the public's interest!

I wouldn't call this a semantic McGurk, given that it doesn't have to be driven by simultaneous bottom-up input from two modalities. That is, even if nothing is written on the screen but you're just thinking "green needle" to yourself, that's what you're likely to hear (whereas thinking "ga" while hearing "ba" won't get you to "da" - you need the simultaneous input from face and voice). So I'd agree with Roger that it's more akin to the phoneme restoration effect or work like Cynthia Connine's "she ran hot water for the p/bath," showing how expectations influence interpretation of bottom-up input.

I think most of US wouldn't be surprised that the same stimulus can be perceived in different ways, but my impression is that the general public tends to believe "what you see is what you get" and underestimates the power of top-down influences. Same reason #TheDress was such a hit.

When I include this in my class on speech perception, I also include this video which shows Grover from Sesame street <https://languagelog.ldc.upenn.edu/nll/?p=41249> saying EITHER "Yes, yes, that sounds like an excellent idea!" OR "Yes, yes, that's a f*%#g excellent idea!"

Like I'm always telling my students - Speech is hard! Context helps!

Best,
Julia

On Fri, Aug 7, 2020 at 4:28 AM Prof. Roger K. Moore <0000011559506d60-dmarc-request@xxxxxxxx> wrote:

> I must admit to being surprised by the surprise engendered by this video.
> Anyone who was around during the early days of text-to-speech synthesis is
> very aware of the danger of presenting the text in advance of or
> simultaneous with the generated speech. The intelligibility of the
> resulting synthesis could be zero without the 'prior' and 100% with the
> visual cue.
>
> So, given that we know that perception involves the integration of
> top-down expectations with bottom-up evidence (going right back to Richard
> Warren's work on the 'phoneme restoration effect'), why is this TikTok demo
> surprising? Or maybe I'm missing something?
>
> Best wishes
> Roger
>
> --------------------------------------------------------------------------------------------
> Prof ROGER K MOORE* BA(Hons) MSc PhD FIOA FISCA MIET
>
> Chair of Spoken Language Processing
> Vocal Interactivity Lab (VILab), Sheffield Robotics
> Speech & Hearing Research Group (SPandH)
> Department of Computer Science, UNIVERSITY OF SHEFFIELD
> Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
>
> * Winner of the 2016 Antonio Zampolli Prize for "Outstanding Contributions
> to the Advancement of Language Resources & Language Technology
> Evaluation within Human Language Technologies"
>
> e-mail: r.k.moore@xxxxxxxx
> web: http://staffwww.dcs.shef.ac.uk/people/R.K.Moore/
> twitter: @xxxxxxxx
> Tel: +44 (0) 11422 21807
> Fax: +44 (0) 11422 21810
> Mob: +44 (0) 7910 073631
>
> Editor-in-Chief: COMPUTER SPEECH AND LANGUAGE
> (http://www.journals.elsevier.com/computer-speech-and-language/)
> --------------------------------------------------------------------------------------------
>
> On Fri, 7 Aug 2020 at 05:12, Malcolm Slaney <malcolm@xxxxxxxx> wrote:
>
>> Has there been anything formal published on this effect?
>>
>> https://www.iflscience.com/brain/what-the-hell-is-going-on-in-this-tiktok-audio-illusion
>>
>> It sounds to me like a semantic version of the McGurk effect.
>>
>> Nice demo.
>>
>> - Malcolm

--
Julia Strand, PhD
Assistant Professor of Psychology
Carleton College
One North College Street
Northfield, MN 55057
507-222-5637
Website <https://apps.carleton.edu/curricular/psyc/jstrand/>
Make an appointment <http://juliastrand.youcanbook.me>


This message came from the mail archive
src/postings/2020/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University