Subject: Re: [AUDITORY] Visual references in sound localisation
From: T Qf <theoqf@xxxxxxxx>
Date: Thu, 1 Mar 2018 12:43:57 +0100
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi Isaac,

You might be interested in checking out this paper:

https://www.ncbi.nlm.nih.gov/pubmed/28680049

Hopefully it's relevant to your work.

Best,
Theo

---
Theofilos Petsas
Music Information Retrieval Developer
Native Instruments
Schlesische Str. 29/30, 10997 Berlin

On 28 February 2018 at 16:23, Engel Alonso-Martinez, Isaac <isaac.engel@xxxxxxxx> wrote:

> Dear all,
>
> Thank you all very much for your responses.
>
> It seems that there is plenty of literature on the effect of visual
> stimuli in auditory localisation. If anyone is interested, a summary of
> relevant keywords for this topic could be: 'visual capture', 'visual
> dominance', 'visual bias' and 'cross-modal bias'. Also, one may find
> relevant papers under: 'multimodal integration', 'multisensory integration'
> and 'cross-modal plasticity'.
>
> I have found that a common practice is to use only one visual cue and one
> auditory cue at the same time. If the two stimuli are close to being
> spatially congruent, the subject will probably bind the two of them together
> unconsciously, thus causing this 'visual capture' effect in which the
> visual stimulus dominates the auditory one. This may not happen if the two
> stimuli are not spatially congruent in a noticeable way [1, 2].
>
> However, in the scenario that I proposed originally there are two auditory
> stimuli: one of them is explicitly associated with the visual cue and would
> act as an 'anchor', while the other one has to be located. Intuitively, one
> might think that if the two auditory cues are perceived as different
> sources, the risk of visual dominance should be small.
>
> As has been pointed out, another part of the question is on 'relative
> localisation' and comparative judgements, particularly in multimodal
> scenarios. How good are we at estimating the location of two sound sources
> with respect to each other? And what happens if we introduce visual cues?
>
> All suggestions are welcome! Thank you all again for your contributions.
>
> Kind regards,
> Isaac Engel
>
> References:
> [1] Bosen, Adam K. et al. 2016. “Comparison of Congruence Judgment and
> Auditory Localization Tasks for Assessing the Spatial Limits of Visual
> Capture.” Biological Cybernetics 110(6): 455–71
> [2] Berger, Christopher C., et al. "Generic HRTFs may be good enough in
> Virtual Reality. Improving source localization through cross-modal
> plasticity." Frontiers in Neuroscience 12 (2018): 21.
>
> --
> Isaac Engel
> PhD student at Dyson School of Design Engineering
> Imperial College London
> 10 Princes Gardens
> <https://maps.google.com/?q=10+Princes+Gardens+%0D%0ASouth+Kensington,+SW7&entry=gmail&source=g>
> South Kensington, SW7 1NA, London
> E-mail: isaac.engel@xxxxxxxx
>
> <http://www.imperial.ac.uk/design-engineering-school>
>
> www.imperial.ac.uk/design-engineering/research/human-performance-and-experience/sound-and-audio-systems
>
> ------------------------------
> *From:* Engel Alonso-Martinez, Isaac
> *Sent:* 24 February 2018 19:08
> *To:* auditory@xxxxxxxx
> *Subject:* Visual references in sound localisation
>
> Dear all,
>
> I am interested in the impact of audible visual references in sound
> localisation tasks.
>
> For instance, let's say that you are presented with two different continuous
> sounds (e.g., speech) coming from sources A and B, which are in different
> locations. While source A is clearly visible to you, B is invisible and you
> are asked to estimate its location.
> Will source A act as a spatial
> reference, helping you in doing a more accurate estimation, or will it be
> distracting and make the task more difficult?
>
> If anyone can point to some literature on this, it would be greatly
> appreciated.
>
> Kind regards,
> Isaac Engel