Subject: Re: [AUDITORY] Visual references in sound localisation
From: Norbert Kopco <kopco@xxxxxxxx>
Date: Fri, 2 Mar 2018 09:37:24 -0500
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hello,

You might find this paper relevant:

*Kopčo N, Best V, Carlile S* (2010). Speech localization in a multitalker mixture <http://pcl.ics.upjs.sk/wp-content/uploads/2014/10/Kopcoetal_JASA10.pdf>. Journal of the Acoustical Society of America, 127, 1450-1457 (DOI: 10.1121/1.3290996).

It describes an experiment in which we didn't use a visual reference. Instead, we gave the subjects a priori information about where the distractor speech sources were located, which they could use to separate the target from the distractors. Assuming that visual cues provide the same kind of information, the results might be similar.

I hope this helps,
Noro

On 2/28/2018 10:23 AM, Engel Alonso-Martinez, Isaac wrote:
> Dear all,
>
> Thank you all very much for your responses.
>
> It seems that there is plenty of literature on the effect of visual stimuli on auditory localisation. If anyone is interested, relevant keywords for this topic include 'visual capture', 'visual dominance', 'visual bias' and 'cross-modal bias'. Relevant papers can also be found under 'multimodal integration', 'multisensory integration' and 'cross-modal plasticity'.
>
> I have found that a common practice is to present only one visual cue and one auditory cue at a time.
> If the two stimuli are nearly spatially congruent, the subject will probably bind them together unconsciously, causing the 'visual capture' effect, in which the visual stimulus dominates the auditory one. This may not happen if the spatial incongruence between the two stimuli is noticeable [1, 2].
>
> However, in the scenario I originally proposed there are two auditory stimuli: one is explicitly associated with the visual cue and would act as an 'anchor', while the other has to be located. Intuitively, one might think that if the two auditory cues are perceived as different sources, the risk of visual dominance should be small.
>
> As has been pointed out, another part of the question concerns 'relative localisation' and comparative judgements, particularly in multimodal scenarios. How good are we at estimating the locations of two sound sources relative to each other? And what happens if we introduce visual cues?
>
> All suggestions are welcome! Thank you all again for your contributions.
>
> Kind regards,
> Isaac Engel
>
> References:
> [1] Bosen, Adam K., et al. 2016. "Comparison of Congruence Judgment and Auditory Localization Tasks for Assessing the Spatial Limits of Visual Capture." Biological Cybernetics 110(6): 455–471.
> [2] Berger, Christopher C., et al. 2018. "Generic HRTFs may be good enough in Virtual Reality. Improving source localization through cross-modal plasticity." Frontiers in Neuroscience 12: 21.
>
> --
> Isaac Engel
> PhD student at Dyson School of Design Engineering
> Imperial College London
> 10 Princes Gardens
> South Kensington, SW7 1NA, London
> E-mail: isaac.engel@xxxxxxxx
>
> <http://www.imperial.ac.uk/design-engineering-school>
> <http://www.imperial.ac.uk/design-engineering/research/human-performance-and-experience/sound-and-audio-systems>
>
> ------------------------------------------------------------------------
> *From:* Engel Alonso-Martinez, Isaac
> *Sent:* 24 February 2018 19:08
> *To:* auditory@xxxxxxxx
> *Subject:* Visual references in sound localisation
>
> Dear all,
>
> I am interested in the impact of audible visual references in sound localisation tasks.
>
> For instance, suppose you are presented with two different continuous sounds (e.g., speech) coming from sources A and B, which are at different locations. Source A is clearly visible to you, while B is invisible, and you are asked to estimate B's location. Will source A act as a spatial reference that helps you make a more accurate estimate, or will it be a distraction that makes the task more difficult?
>
> If anyone can point me to some literature on this, it would be greatly appreciated.
>
> Kind regards,
> Isaac Engel