"Spectral holes" not "Birdies"? (Danijel Domazet )


Subject: "Spectral holes" not "Birdies"?
From:    Danijel Domazet  <Danijel.Domazet@xxxxxxxx>
Date:    Fri, 17 Mar 2006 08:43:00 +0100

Re: - "Birdies"

Hi Maxime,

What you are talking about here is known as "spectral holes" (I think).

Spectral holes occur in modern audio encoders as a result of (over)quantization, which sets all spectral components of some (scalefactor) bands to zero. This is directly related to the masking threshold calculated by the psychoacoustic model, so the threshold should be modified in bands that must not be zeroed.

Avoiding spectral holes is one of the top problems in designing good psychoacoustic models.

One of the methods for spectral hole avoidance is presented in 3GPP's AAC encoder specification: TS 26.403 (freely available at www.3gpp.org).

Regards,
Daniel

----- Original Message -----
From: Maxime Leroy
To: AUDITORY@xxxxxxxx
Sent: Wednesday, March 15, 2006 8:34 PM
Subject: [AUDITORY] RE : - "Birdies"

Thank you Bob for your comments,

"But simple early schemes had interactions between the input signal and the sample frequency that caused "birdies" at sum and/or difference frequencies."

I realise now "birdies" might not be exactly what I meant.

I will rephrase, then: if you have ever looked closely at the spectrogram of a sample of music encoded at a low bit-rate (20-64 kbps) by either the mp3 or AAC codec, you might have noticed dark spots in places where the energy of the signal is obviously not supposed to be so small. I suppose that artifact is due to the richness of the signal at that precise moment (in comparison with the bit-rate), so that bit allocation cannot cope with the demand. The coder, being unable to encode everything, then leaves a hole in the spectrogram.

If I'm correct in the above assumption, what I'd like to know is whether there is any documentation or perceptual interpretation of this coding problem.

Regards,
Maxime

------------------------------------------------------------------------------
From: AUDITORY Research in Auditory Perception on behalf of Bob Masta
Date: Wed. 15/03/2006 14:30
To: AUDITORY@xxxxxxxx
Subject: Re: - "Birdies"

Hi, Maxime. I'm not sure exactly what you are looking for, and I don't have any references to provide. But if you are looking for a perceptual description, here's what I know:

"Birdies" are little whistling sounds that are related to the program material, but are not harmonics of it. They used to be a serious problem in sigma-delta converters, which compare the input signal to a reconstruction of the output signal, and generate a "higher than" or "lower than" response on each sample. That 1-bit stream is then used to create the reconstruction for the comparison (and the eventual output). Nowadays, this is all done at very high sample rates and then ultimately converted down to a nominal rate, and the reconstruction processing is very sophisticated. But simple early schemes had interactions between the input signal and the sample frequency that caused "birdies" at sum and/or difference frequencies. The birdies might be only 40 dB down, but even if they were much softer than that they were clearly audible, especially on sparse program material like simple sine waves, flutes, etc., since they appeared in non-harmonic locations and were not masked by the program itself. They also often had the annoying habit of sweeping in the opposite direction to a sweep in the signal frequency, which made them really obvious.

Hope that helps!

Best regards,
Bob Masta
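
The spectral-hole mechanism described in the reply above can be illustrated with a small sketch. This is not the method from TS 26.403; it is a toy model under stated assumptions: a uniform quantizer (real AAC uses a non-uniform one), one step size per band, and a hypothetical per-band "audible energy" limit standing in for the psychoacoustic model's output. All function and variable names are invented for illustration.

```python
def quantize_band(coeffs, step):
    """Uniform quantizer (a simplification): index of the nearest multiple of step."""
    return [int(round(c / step)) for c in coeffs]


def find_spectral_holes(bands, steps, audible_energy):
    """Return indices of bands that quantize entirely to zero
    even though their energy exceeds the (hypothetical) audible limit."""
    holes = []
    for b, (coeffs, step) in enumerate(zip(bands, steps)):
        energy = sum(c * c for c in coeffs)
        quantized = quantize_band(coeffs, step)
        if energy > audible_energy[b] and all(q == 0 for q in quantized):
            holes.append(b)
    return holes


def avoid_holes(bands, steps, audible_energy):
    """Shrink the step size in each hole band just enough that the
    largest coefficient survives quantization as +1 or -1.
    Assumes audible_energy values are non-negative, so hole bands have a nonzero peak."""
    new_steps = list(steps)
    for b in find_spectral_holes(bands, steps, audible_energy):
        peak = max(abs(c) for c in bands[b])
        # peak / (1.9 * peak) is about 0.53, which rounds to 1
        new_steps[b] = min(steps[b], 1.9 * peak)
    return new_steps


# Made-up example: three bands of spectral coefficients.
bands = [[4.0, -3.0, 2.5],    # loud band, survives quantization
         [0.4, -0.3, 0.2],    # quiet but audible band, zeroed by a coarse step
         [0.01, 0.0, -0.01]]  # band below the audible limit, zeroing is harmless
steps = [1.0, 1.0, 1.0]
audible_energy = [0.05, 0.05, 0.05]

print(find_spectral_holes(bands, steps, audible_energy))      # [1]
fixed_steps = avoid_holes(bands, steps, audible_energy)
print(find_spectral_holes(bands, fixed_steps, audible_energy))  # []
```

In a real encoder the finer step costs extra bits, so the adjustment has to be traded off against the bit budget; the sketch ignores that and only shows why a band can vanish and how lowering its effective threshold keeps it alive.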


This message came from the mail archive
http://www.auditory.org/postings/2006/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University