Re: perceptual evaluation of cochlear models (Mark Cartwright)


Subject: Re: perceptual evaluation of cochlear models
From:    Mark Cartwright  <mcartwright@xxxxxxxx>
Date:    Fri, 3 Oct 2014 12:57:31 -0500
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Thanks Josh, John, and Rohr! We'll take a look at these.

Cheers,

Mark

On Thu, Oct 2, 2014 at 11:44 AM, Joshua Reiss <joshua.reiss@xxxxxxxx> wrote:

> Hi Mark,
> You probably know about this already, but in these two papers we tried to
> position sources in a stereo mix by reducing the spatial masking. However,
> we didn't use a formal auditory model of spatial masking.
>
> E. Perez Gonzalez and J. D. Reiss, "A Real-Time Semi-Autonomous Audio
> Panning System for Music Mixing," EURASIP Journal on Advances in Signal
> Processing, vol. 2010, Article ID 436895, pp. 1-10, 2010.
>
> S. Mansbridge, S. Finn and J. D. Reiss, "An Autonomous System for
> Multi-track Stereo Pan Positioning," 133rd AES Convention, San Francisco,
> Oct. 26-29, 2012.
>
> From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxx>
> on behalf of Mark Cartwright <mcartwright@xxxxxxxx>
> Sent: 02 October 2014 00:36
> To: AUDITORY@xxxxxxxx
> Subject: Re: perceptual evaluation of cochlear models
>
> Hello auditory list,
>
> So, regarding Francesco's third item (the role of space), what is the
> latest research on partial loudness and masking as a function of spatial
> position/separation? Does a model exist yet that takes an angle of
> separation as input and outputs its effect on partial loudness and masking
> (e.g. either a binary yes/no release from masking, or the change to the
> masking threshold, etc.)? Or any similar function...?
>
> Thanks!
>
> Mark
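A minimal sketch of the kind of function Mark is asking about, for concreteness. Everything below is hypothetical: the function names, the saturating shape, and the 10 dB ceiling are illustrative assumptions, not any published model of spatial release from masking.

    import numpy as np

    def masking_release_db(separation_deg: float, max_release_db: float = 10.0) -> float:
        """Hypothetical spatial-release-from-masking curve.

        Maps target/masker angular separation (degrees) to a reduction of
        the masked threshold (dB). The exponential rise and the 10 dB
        ceiling are illustrative assumptions, not fitted to data.
        """
        s = np.clip(abs(separation_deg), 0.0, 180.0)
        # Smooth rise that saturates around 90 degrees of separation.
        return max_release_db * (1.0 - np.exp(-s / 30.0))

    def masked_threshold_db(colocated_threshold_db: float, separation_deg: float) -> float:
        """Masked threshold after applying the (hypothetical) spatial release."""
        return colocated_threshold_db - masking_release_db(separation_deg)

    # Example: a target masked at 45 dB SPL when co-located with the masker.
    print(masked_threshold_db(45.0, separation_deg=0.0))   # 45.0 dB (no release)
    print(round(masked_threshold_db(45.0, 90.0), 1))       # ~35.5 dB

A real answer to the question would replace the toy curve with a binaural front end (e.g. better-ear listening plus binaural unmasking), but the input/output contract - angle in, threshold shift or partial-loudness change out - is the part being asked for.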
> On Mon, Sep 8, 2014 at 11:44 AM, ftordini@xxxxxxxx <ftordini@xxxxxxxx> wrote:
>
> Hello Joshua,
> Thank you for the great expansion and for the further reading suggestions.
> I may add three more items to the list, hoping to be clear in my
> formulation.
>
> (1) A perhaps provocative question could be: is there one loudness, or are
> there several loudnesses? (Is loudness domain dependent?) Should we
> continue to treat loudness as an invariant percept across classes once we
> move onto the more complex domain of real sounds? Rephrasing: once we
> define an ecologically valid taxonomy of real-world sounds (e.g. starting
> from Gaver), can we expect the loudness model we want to improve to be
> valid across (sound) classes? Hard to say. I would attempt 'yes', but
> allowing different parameter tunings according to the dominant context
> (say, speech, music, or environmental sounds). [Hidden question: are we
> actually, ever, purely "naive" listeners?]
>
> (2) A related question: can we jump from the controlled lab environment
> into the wild in a single step? I'd say no. The approach followed by
> EBU/ITU using real-world, long stimuli is highly relevant to the
> broadcasting world, but it is hard to distinguish between energetic and
> informational masking effects using real program material mostly made of
> speech and music. Sets of less informative sources taken from
> environmental, natural sounds may be a good compromise - a starting point
> to address basic expansions of the current loudness model(s). Such
> strategies and datasets are missing (to my knowledge).
>
> (3) The role of space. Physiologically driven models (Moore, Patterson)
> are supported mostly by observations obtained using non-spatialized, or
> dichotic, scenes to better reveal mechanisms by sorting out the spatial
> confound. However, while spatial cues are considered to play a secondary
> role in scene analysis, spatial release from masking is, on the other
> hand, quite important in partial loudness modeling, at least from the
> energetic masking point of view and especially for complex sources. This
> is even more relevant for asymmetric source distributions. I feel there is
> much to do before we can address this aspect with confidence, even
> limiting the scope to non-moving sources, but more curiosity with respect
> to spatial variables may be valuable when designing listening experiments
> with natural sounds. [If you ask a sound engineer working on a movie
> soundtrack "where do you start from?", he will start talking about
> panning, to set the scene using his sources (foley, dialogue, music, ...)
> and **then** adjust levels/EQ ...]
>
> Best,
> --
> Francesco Tordini
> http://www.cim.mcgill.ca/sre/personnel/
> http://ca.linkedin.com/in/ftordini
>
> >----Original message----
> >From: joshua.reiss@xxxxxxxx
> >Date: 06/09/2014 13.43
> >To: "ftordini@xxxxxxxx"<ftordini@xxxxxxxx>, "AUDITORY@xxxxxxxx"<AUDITORY@xxxxxxxx>
> >Subject: RE: RE: perceptual evaluation of cochlear models
> >
> >Hi Francesco (and auditory list, in case others are interested),
> >I'm glad to hear that you've been following the intelligent mixing
> >research.
> >
> >I'll rephrase your email as a set of related questions...
> >
> >1. Should we extend the concepts of loudness and partial loudness to
> >complex material? - Yes, we should. Otherwise, what is it good for? That
> >is, what does it matter if we can accurately predict the perceived
> >loudness of a pure tone, or the just noticeable differences between
> >pedestal increments for white or pink noise, or the partial loudness of a
> >tone in the presence of noise, etc., if we can't predict loudness outside
> >artificial laboratory conditions? I suppose it works as validation of an
> >auditory model, but it's still very limited.
> >On the other hand, if we can extend the model to complex sounds like
> >music, conversations, environmental sounds, etc., then we provide robust
> >validation of a general model of human loudness perception. The model can
> >then be applied to metering systems, audio production, broadcast
> >standards, improved hearing aid design and so on.
> >
> >2. Can we extend the concepts of loudness and partial loudness to complex
> >material? - Yes, I think so. Despite all the issues and complexity,
> >there's a tremendous amount of consistency in the perception of loudness,
> >especially when one considers relative rather than absolute perception.
> >Take a TV show and the associated adverts. The soundtracks of both may
> >have dialogue, foley, ambience, music, ..., all with levels varying over
> >time. Yet people can consistently identify when the adverts are louder
> >than the show. The same is true when someone changes radio stations, and
> >in music production, sound engineers are always identifying and dealing
> >with masking when there are multiple simultaneous sources.
> >I think that many issues relating to complex material may have a big
> >effect on the perception of timbre or the extraction of meaning or
> >emotion, but only a minor effect on loudness.
> >
> >3. Can we extend current auditory models of loudness and partial loudness
> >to complex material? - Hard to say. The state-of-the-art models based on
> >a deep understanding of the human hearing system (Glasberg, Moore, et
> >al.; Fastl, Zwicker, et al.) were not developed with complex material in
> >mind, and when used with complex material, researchers have reported good
> >but far from great agreement with perception. Modifications, while still
> >in agreement with auditory knowledge, show improvement, but more research
> >is needed.
> >On the other hand, we have models based mostly on listening test data,
> >but incorporating little auditory knowledge. I'm thinking here of the
> >EBU/ITU loudness standards. They are based largely on Gilbert Soulodre's
> >excellent listening test results
> >(G. Soulodre, "Evaluation of Objective Loudness Meters," 116th AES
> >Convention, 2004), and represent a big improvement on, say, just applying
> >a loudness contour to signal RMS. But they are generally for a fixed
> >listening level, may overfit the data, are difficult to generalise, and
> >rarely give deeper insight into the auditory system. Furthermore, like
> >Moore's model, these have also shown some inadequacies when dealing with
> >a wider range of content (Pestana, Reiss & Barbosa, "Loudness Measurement
> >of Multitrack Audio Content Using Modifications of ITU-R BS.1770," 134th
> >AES Convention, 2013).
> >So I think rather than just extend, we may need to modify, improve, and
> >go back to the drawing board on some aspects.
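For readers unfamiliar with the EBU/ITU approach discussed above, here is a minimal sketch of the BS.1770 measurement chain for a mono 48 kHz signal: a two-stage K-weighting prefilter, mean-square energy, and a log mapping. The biquad coefficients are the 48 kHz values published in ITU-R BS.1770; the block-based gating of later revisions (BS.1770-2 onward) is omitted for brevity.

    import numpy as np
    from scipy.signal import lfilter

    # ITU-R BS.1770 K-weighting at fs = 48 kHz (coefficients from the standard).
    SHELF_B = [1.53512485958697, -2.69169618940638, 1.19839281085285]   # stage 1
    SHELF_A = [1.0, -1.69065929318241, 0.73248077421585]
    HIPASS_B = [1.0, -2.0, 1.0]                                         # stage 2 (RLB)
    HIPASS_A = [1.0, -1.99004745483398, 0.99007225036621]

    def bs1770_loudness_lkfs(x: np.ndarray) -> float:
        """Ungated BS.1770 loudness of a mono 48 kHz signal, in LKFS."""
        y = lfilter(SHELF_B, SHELF_A, x)     # high-frequency shelving filter
        y = lfilter(HIPASS_B, HIPASS_A, y)   # revised low-frequency B-curve high-pass
        return -0.691 + 10.0 * np.log10(np.mean(y ** 2))

    # Sanity check: a full-scale 1 kHz sine should read close to -3.01 LKFS.
    fs = 48000
    t = np.arange(fs) / fs
    print(round(bs1770_loudness_lkfs(np.sin(2 * np.pi * 1000.0 * t)), 2))

Note how little auditory modelling is in there - one fixed weighting curve and an energy mean - which is exactly Joshua's point: it works well for broadcast normalisation at a fixed listening level, but it cannot, for example, track how loudness grows differently at different presentation levels.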
> >4. How could one develop an auditory model of loudness and partial
> >loudness for complex material?
> >- Incorporate the validated aspects of prior models, but reassess any
> >compromises.
> >- Use listening test results from a wide range of complex material.
> >Perhaps a meta-study could be performed, taking listening test results
> >from many publications for both model creation and validation.
> >- Build in known aspects of loudness perception that were left out of
> >existing models due to resources and the fact that they were built for
> >lab scenarios (pure tones, pink noise, sine sweeps...). In particular,
> >I'm thinking of forward and backward masking.
> >
> >5. What about JND? - I would stay clear of this. I'm not even aware of
> >anecdotal evidence suggesting consistency in just noticeable differences
> >for, say, a small change in the level of one source in a mix. And I think
> >one can be trained to identify small partial loudness differences. I've
> >had conversations with professional mixing engineers who detect a problem
> >with a mix that I don't notice until they point it out. But the concept
> >of extending JND models to complex material is certainly very
> >interesting.
> >
> >________________________________________
> >From: ftordini@xxxxxxxx <ftordini@xxxxxxxx>
> >Sent: 04 September 2014 15:45
> >To: Joshua Reiss
> >Subject: R: RE: perceptual evaluation of cochlear models
> >
> >Hello Joshua,
> >Interesting, indeed. Thank you.
> >
> >So the question is - to what extent can we stretch the concepts of
> >loudness and partial loudness for complex material such as meaningful
> >noise (aka music), where attention and preference are likely to play a
> >role, as opposed to beeps and sweeps? That is - would you feel
> >comfortable giving a rule of thumb for a JND for partial loudness, to
> >safely rule out other factors when mixing?
> >
> >I was following your intelligent mixing thread - although I've missed the
> >recent one you sent me - and my question above relates to the possibility
> >of actually "designing" the fore-/background perception when you do
> >automatic mixing using real sounds...
> >I would greatly appreciate any comment from your side.
> >
> >Best wishes,
> >Francesco
> >
> >>----Original message----
> >>From: joshua.reiss@xxxxxxxx
> >>Date: 03/09/2014 16.00
> >>To: "AUDITORY@xxxxxxxx"<AUDITORY@xxxxxxxx>, "Joachim Thiemann"
> >><joachim.thiemann@xxxxxxxx>, "ftordini@xxxxxxxx"<ftordini@xxxxxxxx>
> >>Subject: RE: perceptual evaluation of cochlear models
> >>
> >>Hi Francesco and Joachim,
> >>I collaborated on a paper that involved perceptual evaluation of partial
> >>loudness with real-world audio content, where partial loudness is
> >>derived from the auditory models of Moore, Glasberg et al. It showed
> >>that the predicted loudness of tracks in multitrack musical audio
> >>disagrees with perception, but that minor modifications to a couple of
> >>parameters in the model would result in a much closer match to
> >>perceptual evaluation results. See
> >>Z. Ma, J. D. Reiss and D. Black, "Partial Loudness in Multitrack
> >>Mixing," AES 53rd International Conference on Semantic Audio, London,
> >>UK, January 27-29, 2014.
> >>
> >>And in the following paper, there was some informal evaluation of
> >>whether Glasberg, Moore et al.'s auditory model of loudness and/or
> >>partial loudness could be used to mix multitrack musical audio. Though
> >>the emphasis was on application rather than evaluation, it also noted
> >>issues with the model when applied to real-world content. See
> >>D. Ward, J. D. Reiss and C. Athwal, "Multitrack Mixing Using a Model of
> >>Loudness and Partial Loudness," 133rd AES Convention, San Francisco,
> >>Oct. 26-29, 2012.
> >>
> >>These may not be exactly what you're looking for, but hopefully you find
> >>them interesting.
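To make "partial loudness" concrete for readers new to the term: in the Moore/Glasberg family of models, the loudness of a target heard against a masker is computed from excitation patterns on an ERB-spaced scale. The sketch below is a deliberately crude caricature of that idea - a single compressive power law with made-up constants, standing in for the published multi-region formulas - intended only to show the shape of the computation, not the model the Ma et al. paper evaluates.

    import numpy as np

    def partial_loudness(E_target: np.ndarray, E_masker: np.ndarray,
                         alpha: float = 0.2, C: float = 0.047) -> float:
        """Crude caricature of excitation-based partial loudness.

        E_target, E_masker: excitation patterns (linear power) sampled on
        an ERB-spaced scale, as produced by some auditory front end. The
        single power law and the constants alpha and C are illustrative
        stand-ins for the multi-region formulas of Moore, Glasberg et al.
        """
        # Specific loudness of target-plus-masker minus that of the masker
        # alone: the loudness the target adds on top of the masking excitation.
        spec = C * ((E_target + E_masker) ** alpha - E_masker ** alpha)
        return float(np.sum(np.maximum(spec, 0.0)))   # integrate over the ERB scale

    # A strong masker occupying the same bands sharply reduces the target's
    # partial loudness relative to its loudness in quiet.
    E_t = np.full(40, 1e6)                        # flat target excitation, 40 ERB steps
    print(partial_loudness(E_t, np.zeros(40)))    # in quiet: ~29.8
    print(partial_loudness(E_t, 100.0 * E_t))     # partially masked: ~0.15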
> >>________________________________________
> >>From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxx>
> >>on behalf of Joachim Thiemann <joachim.thiemann@xxxxxxxx>
> >>Sent: 03 September 2014 07:12
> >>To: AUDITORY@xxxxxxxx
> >>Subject: Re: perceptual evaluation of cochlear models
> >>
> >>Hello Francesco,
> >>
> >>McGill alumnus here - I did a bit of study in this direction; you can
> >>read about it in my thesis:
> >>http://www-mmsp.ece.mcgill.ca/MMSP/Theses/T2011-2013.html#Thiemann
> >>
> >>My argument was that if you have a good auditory model, you should be
> >>able to start from only the model parameters and be able to reconstruct
> >>the original signal with perceptual transparency. I was looking at this
> >>in the context of perceptual coding - a perceptual coder minus the
> >>entropy stage effectively verifies the model. If artefacts do appear,
> >>they can (indirectly) tell you what you are missing.
> >>
> >>I was specifically looking at gammatone filterbank methods, so there is
> >>no comparison to other schemas - but I hope it is a bit in the direction
> >>you're looking at.
> >>
> >>Cheers,
> >>Joachim.
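The analysis-resynthesis test Joachim describes can be prototyped in a few lines: analyse with a bank of gammatone filters on an ERB-spaced grid, resynthesise by summing the band signals, and listen for artefacts. The FIR gammatone and naive summation below are the common textbook construction (with the usual ERB-scale formulas), not the specific method of the thesis.

    import numpy as np

    def erb_spaced_cfs(f_lo=50.0, f_hi=6000.0, n=32):
        """Centre frequencies equally spaced on the ERB-rate scale."""
        erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
        inverse = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
        return inverse(np.linspace(erb_rate(f_lo), erb_rate(f_hi), n))

    def gammatone_fir(fc, fs, dur=0.025, order=4):
        """FIR gammatone kernel: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
        t = np.arange(int(dur * fs)) / fs
        b = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)   # bandwidth ~= 1.019 * ERB(fc)
        g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
        return g / np.sqrt(np.sum(g ** 2))        # unit-energy normalisation

    fs = 16000
    x = np.random.randn(fs)                       # 1 s of noise as a test signal
    bands = [np.convolve(x, gammatone_fir(fc, fs))[:len(x)]
             for fc in erb_spaced_cfs()]          # analysis
    y = np.sum(bands, axis=0)                     # naive resynthesis: just sum

    # With per-band phase alignment and gain equalisation, y approaches x;
    # any audible difference between x and y exposes what the analysis
    # representation has discarded - Joachim's verification argument.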
> >>On 2 September 2014 20:39, ftordini@xxxxxxxx <ftordini@xxxxxxxx> wrote:
> >>>
> >>> Dear List members,
> >>> I am looking for references on perceptual evaluation of cochlear
> >>> models - taken from an analysis-synthesis point of view, like the work
> >>> introduced in Hohmann (2002) ("Frequency analysis and synthesis using
> >>> a Gammatone filterbank," §4.3).
> >>>
> >>> Are you aware of any study that tried to assess the performance of
> >>> gammatone-like filterbanks used as a synthesis model? (AKA, what are
> >>> the advantages over MPEG-like schemas?)
> >>>
> >>> All the best,
> >>> Francesco
> >>>
> >>> http://www.cim.mcgill.ca/sre/personnel/
> >>> http://ca.linkedin.com/in/ftordini


This message came from the mail archive
http://www.auditory.org/postings/2014/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University