Subject: Re: [AUDITORY] Fwd: [AUDITORY] MUSHRA test with open reference; and then without open reference
From: #ARIJIT BISWAS# <000003292f44871c-dmarc-request@xxxxxxxx>
Date: Wed, 21 Aug 2024 14:22:52 +0000

Hi Hannes:

Thank you for bringing ITU-R BS.2132-0 to my attention; I wasn't aware of it!

I also appreciate you sharing the observations from your experiments.

Best regards,
Arijit

________________________________
From: AUDITORY - Research in Auditory Perception on behalf of Hannes Helmholz
Sent: Tuesday, August 20, 2024 3:43 PM
To: AUDITORY@xxxxxxxx
Subject: Re: Fwd: [AUDITORY] MUSHRA test with open reference; and then without open reference

This is an interesting subject. There is a follow-up recommendation that
could apply to your context. The recommendation has no catchy
abbreviation, but its developers (Fraunhofer IDMT) endorse the term
MuSCR (multi-stimulus category rating, in the sense of different
categories, i.e., perceptual attributes, being evaluated for several
conditions on separate MUSHRA-like pages/trials).

ITU-R BS.2132-0, "Method for the Subjective Quality Assessment of
Audible Differences of Sound Systems using Multiple Stimuli without a
Given Reference." International Telecommunication Union, pp. 1–18, 2019.

I'm unaware of studies directly comparing the results from MUSHRA and
MuSCR. We exposed subjects to several conditions without a reference
(although hidden high and low "anchor" conditions were included). They
were asked to evaluate the "overall quality" of head-tracked binaural
reproductions from various microphone arrays. I consider this a relatively
complex and open task, since subjects must establish their internal
reference on each trial/page of conditions.
We observed trends similar to those in the existing literature that
employed MUSHRA in comparable studies.
However, there were larger variations and inconsistencies than we have
seen in previous MUSHRA experiments. I would say this is not surprising,
considering the nature of the task.

ITU-R BS.2132-0 recommends using expert listeners as a countermeasure
against large variances, which we could not reasonably implement. An
alternative idea would be to employ more subjects and exclude listeners
based on some quantifiable consistency measure (although this would have
to be evaluated and justified very carefully).

Best wishes,
/Hannes
PhD Student, Chalmers University of Technology, Gothenburg, Sweden

On 2024-08-20 11:18, Raul Sanchez-Lopez wrote:
> Dear Arijit,
>
> That's an excellent point. As of yet, I haven't come across any studies
> directly investigating that specific comparison. However, the
> development of MUSHRA likely included such an evaluation. The method is
> defined in ITU-R Recommendation BS.1534-3. The introduction states:
>
> "This Recommendation describes a method for the subjective assessment of
> intermediate audio quality. This method reflects many aspects of
> Recommendation ITU-R BS.1116 and incorporates the same grading scale
> used for picture quality evaluation (i.e. Recommendation ITU-R BT.500)."
>
> While BS.1116 features a hidden reference, it lacks a hidden anchor. To
> assess intermediate audio quality, a new method was developed (more
> details here: https://secure.aes.org/forum/pubs/conferences/?elib=8056).
>
> Crucially, analyzing how the panel utilizes the scale requires both the
> Reference and the Anchor for meaningful difference evaluations. If
> there's no reference, it's simply not MUSHRA but a different type of
> assessment. In MUSHRA, participants perform multi-comparisons and place
> their ratings between the reference and anchor. Without a reference,
> they need to establish one first. This increases the risk of
> inconsistency across repetitions, potentially leading to noisy data.
>
> Would you be willing to share some more details about your experiment?
> I've dealt with similar questions in the past and may be able to offer
> assistance or point you towards someone more experienced, like Force
> Technology, for further guidance.
>
> Best wishes,
>
> ---
> Raul Sanchez-Lopez
> Hearing scientist | Audio Engineer | Technical Audiologist
> Institute of Globally Distributed Open Research and Education
>
> On 2024-08-20 10:52, Raul Sanchez-Lopez wrote:
>> ---------- Forwarded message ---------
>> From: #ARIJIT BISWAS# <000003292f44871c-dmarc-request@xxxxxxxx>
>> Date: Tue, 20 Aug 2024 at 09:14
>> Subject: [AUDITORY] MUSHRA test with open reference; and then without
>> open reference
>> To: <AUDITORY@xxxxxxxx>
>>
>> Dear all:
>>
>> Is there any research on how subjective ratings in a MUSHRA test
>> might be affected if the same systems (i.e., hidden reference,
>> anchors, codecs under test) are re-evaluated without the ability to
>> compare them against an open reference?
>>
>> If no such studies/papers exist, can we make any educated guesses
>> about the potential outcomes?
>> I imagine that, in the absence of a reference, the task for the
>> subjects would become more difficult.
>> Any other guesses?
>>
>> Thank you.
>>
>> Best regards,
>> Arijit
>>
>> **Disclaimer** The sender of this email does not represent Nanyang
>> Technological University and this email does not express the views or
>> opinions of the University.

**Disclaimer** The sender of this email does not represent Nanyang Technological University and this email does not express the views or opinions of the University.
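[Editor's note] The consistency-based listener screening Hannes alludes to could be sketched roughly as below. This is a minimal illustration, not a procedure from ITU-R BS.2132-0: it assumes one score per listener per condition and uses agreement with the panel median (via Pearson correlation) as the consistency measure; the 0.7 threshold and the data are made up for the example.

```python
# Hypothetical post-screening sketch: exclude listeners whose ratings
# correlate poorly with the panel median across conditions. The metric
# and threshold are illustrative assumptions, not part of any standard.
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length rating sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0


def screen_listeners(ratings, threshold=0.7):
    """ratings: {listener: [score per condition]} -> listeners to keep."""
    # Panel median score for each condition, across all listeners.
    panel_median = [median(c) for c in zip(*ratings.values())]
    return [name for name, scores in ratings.items()
            if pearson(scores, panel_median) >= threshold]


ratings = {
    "L1": [90, 70, 40, 20],
    "L2": [85, 75, 35, 25],
    "L3": [30, 80, 20, 90],  # disagrees strongly with the panel
}
print(screen_listeners(ratings))  # → ['L1', 'L2']
```

Whether such a data-driven exclusion rule is defensible would, as Hannes notes, need careful evaluation; it risks discarding genuine inter-subject disagreement, which in a reference-free task may itself be a finding.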