
Re: frequency to mel formula

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: frequency to mel formula
From: "Richard F. Lyon" <DickLyon@xxxxxxx>
Date: Wed, 22 Jul 2009 23:43:47 -0700
Approved-by: DickLyon@xxxxxxx
Delivery-date: Thu Jul 23 03:27:08 2009
In-reply-to: <20090717153237.8A6EE4472@xxxxxxxxxxxxxxxxxxxxxx>
List-archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>
References: <20090717153237.8A6EE4472@xxxxxxxxxxxxxxxxxxxxxx>
Reply-to: "Richard F. Lyon" <DickLyon@xxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

I'd still like to understand more of the history of the Mel scale, formulas for it, and its relationship to other scales; did O'Shaughnessy come up with the 700? Or did he get it from somewhere else? Someone figured the 1000 was just too high to be realistic?

I've been reviewing some of Don Greenwood's papers, and the Wikipedia article on his "Greenwood function" at http://en.wikipedia.org/wiki/Greenwood_function . And Don's comments from last Jan: http://www.auditory.org/postings/2009/53.html

Don says a good map of cochlear position x (from 0 at apex to 1 at base) to frequency f in hertz is f = 165.4*(10^(2.1x) - 1). Solving for x and scaling to get 1000 at f = 1000, we get a formula in the form of the mel-scale formula:


  m = 512.18 * ln(f/165.4 + 1).
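
The arithmetic behind that formula can be checked with a few lines of MATLAB/Octave. This is only a sketch: the function names (greenwood_f, greenwood_x, mel_like) are made up here for illustration and do not come from Greenwood's papers.

```matlab
% Sketch only: greenwood_f, greenwood_x, and mel_like are illustrative
% names, not taken from Greenwood's papers.
greenwood_f = @(x) 165.4 * (10.^(2.1*x) - 1);   % position (0..1) -> Hz
greenwood_x = @(f) log10(f/165.4 + 1) / 2.1;    % Hz -> position (0..1)

% Rescale position so that the value at 1000 Hz is 1000.
mel_like = @(f) 1000 * greenwood_x(f) / greenwood_x(1000);

% Equivalent scale factor in front of the natural log.
k = 1000 / log(1000/165.4 + 1);

f = [100 1000 10000];
fprintf('k = %.2f\n', k);
fprintf('round trip error: %g\n', max(abs(greenwood_f(greenwood_x(f)) - f)));
fprintf('max difference from k*ln form: %g\n', ...
        max(abs(mel_like(f) - k*log(f/165.4 + 1))));
```

The 2.1 and the change from log10 to ln both cancel in the normalization, which is why only the break frequency survives in the final formula.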

The key here is not the scale factor, but the "break frequency", 165.4 Hz, that separates the log-like high-frequency region from the linear-like low-frequency region. Don finds that the data imply a much lower break frequency than has traditionally been used; his papers show that the higher values (700 or 1000) are too high to fit the published data that they're supposed to be based on. That means the map is logarithmic over a wider range than usually recognized, and that the critical bands at the low end are much narrower than some scales would imply.
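
One way to see what the break frequency does, as a rough sketch: for any map of the form m = k*ln(f/fb + 1), the slope against log frequency is dm/d(ln f) = k*f/(f + fb), so f/(f + fb) is the fraction of the fully logarithmic slope reached at frequency f. It is exactly one half at f = fb. The name slope_fraction and the test frequencies below are arbitrary choices for illustration.

```matlab
% Sketch: fraction of the asymptotic log-slope reached at frequency f for a
% map m = k*ln(f/fb + 1), using d m / d ln(f) = k * f/(f + fb).
% slope_fraction is an illustrative name, not from any of the cited papers.
slope_fraction = @(f, fb) f ./ (f + fb);

f = [100 200 500 1000 2000];          % Hz, arbitrary sample points
frac_165 = slope_fraction(f, 165.4);  % Greenwood break frequency
frac_700 = slope_fraction(f, 700);    % mel break frequency

% Columns: frequency, fraction with 165.4 Hz break, fraction with 700 Hz break
disp([f; frac_165; frac_700]');
```

With the lower break frequency the fraction is closer to 1 at every sample point, which is the "logarithmic over a wider range" statement in numerical form.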

The ERB-rate scale based on Glasberg and Moore 1990 has a corresponding break point at 228.8 Hz, much closer to Greenwood's interpretation than to the mel-scale interpretations (this is from ERB = 24.7 (4.37F/1000 + 1), with F in Hz, where 228.8 is 1000/4.37). In terms of a mel-like formula:


  m = 594.9 * ln(f/228.8 + 1)
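
Since the normalization to 1000 at 1000 Hz fixes the scale factor once the break frequency is chosen, the factors can all be computed the same way. A minimal sketch, where scale_for_break is a hypothetical helper name:

```matlab
% Sketch: scale factor k in m = k*ln(f/fb + 1) such that m = 1000 at 1000 Hz.
% scale_for_break is a hypothetical helper, named here for illustration.
scale_for_break = @(fb) 1000 ./ log(1 + 1000 ./ fb);

fb = [165.4 228.8 700];   % Greenwood, ERB-rate, and mel break frequencies (Hz)
k = scale_for_break(fb);

% fprintf walks the 2-row matrix column by column: fb(1), k(1), fb(2), ...
fprintf('break %6.1f Hz -> scale factor %7.2f\n', [fb; k]);
```

The three results come out near 512.18, 594.9, and 1127, matching the factors used in the formulas in this post.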

This is also very close to what I've been using in recent cochlear models for machine hearing (used by Malcolm Slaney in the 1993 auditory toolbox; actually I'm using 245 Hz now for some reason I don't recall). So I guess it's time to take Don seriously at his suggestion to see if such a change away from mel scale and closer to reality would improve a speech system (vocoder or recognizer). But I'm not in that business, so I'll have to bend some ears...toward a more logarithmic scale.

Of course, with this relatively small deviation from logarithmic, there's also not a lot of deviation from bandwidth being a "constant Q" function of center frequency, so other simple parameterizations are likely to fit as well. The Bark scale is an example of such a thing, and there are others; the Bark scale is closer to mel than to the Greenwood or ERB-rate scales.
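
To put a number on "constant Q", here is a small sketch using the Glasberg and Moore ERB formula quoted above. It assumes Q is defined as center frequency divided by ERB, which is one reasonable reading and not the only one; erb_bw and q are illustrative names.

```matlab
% Sketch, assuming Q = f / ERB(f) with the Glasberg and Moore 1990 ERB.
% erb_bw and q are illustrative names, not from the cited papers.
erb_bw = @(f) 24.7 * (4.37*f/1000 + 1);   % ERB in Hz, f in Hz
q = @(f) f ./ erb_bw(f);

% As f grows, the "+ 1" becomes negligible and Q tends to a constant.
q_limit = 1000 / (24.7 * 4.37);

f = [100 300 1000 3000 10000];            % Hz, arbitrary sample points
fprintf('%6d Hz: Q = %.2f\n', [f; q(f)]);
fprintf('high-frequency limit: Q = %.2f\n', q_limit);
```

Under that definition Q rises toward its limit as frequency increases, and the departure from constant Q is concentrated below and around the break frequency.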

If you want to look at the mappings, they are compared at http://www.speech.kth.se/~giampi/auditoryscales/ ; but the normalization isn't at 1000 Hz, so it's hard to compare shapes, and they're not on a log frequency scale, so it's hard to see the predominantly log-like nature of the mappings. So I took and modified the code from there, added Greenwood, and you can run it if you have MATLAB or Octave handy. It's clear that the Greenwood and ERB-rate scales have a long "straight" log segment, and that the mel and Bark scales break at too high a frequency.


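% Values at 1 kHz, used to normalize ERB-rate and Bark to 1000 at 1000 Hz
% (the mel and Greenwood formulas below already give about 1000 there).
f = 1000;
erb_1k = 214 * log10(1 + f/228.8);
bark_1k = 13*atan(0.00076*f)+3.5*atan((f/7000).^2);

% Frequency axis as a column vector: 10 Hz to 20 kHz in 10 Hz steps.
f = (10:10:20000)';
erb = 214 * log10(1 + f/228.8);  % very close to Lyon with 245 Hz break
mel = 1127 * log(1 + f/700);
bark = 13*atan(0.00076*f) + 3.5*atan((f/7000).^2);
greenwood = 512.18 * log(1 + f/165.4);

% One curve per column, on a log frequency axis.
semilogx(f, [1000*erb/erb_1k, mel, 1000*bark/bark_1k, greenwood])
legend('ERB', 'Mel', 'Bark', 'Greenwood', 'Location', 'SouthEast')
xlabel('frequency (Hz)')
ylabel('normalized scales')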

Other things I found online include a study that evaluated different pitch scales on a speech intonation application: http://www.ling.cam.ac.uk/francis/Nolan%20Semitones.pdf Here the log mapping (semitone scale) came out best, with ERB-rate not far behind (and presumably Greenwood's would have been better than ERB-rate, being a little closer to log). Mel and Bark were not much better than linear; on this task, the frequency range of interest included just voice pitch range, up to 500 Hz, where these latter scales are essentially just linear. It's not clear if this "repetition pitch" task is very closely related to the "frequency" scaling that the scales are designed to cover, but it's a step.

Here's one: http://recherche.ircam.fr/equipes/analyse-synthese/burred/pdf/burred_AES121.pdf that concludes that Mel, ERB, and Bark are all significantly better than either constant-Q (log) or linear scales, for source separation of stereo mixtures. But the results are about the same for the three "auditory" scales.


Here's an ASR study that found no consistent best among ERB, Mel, and Bark:
ftp://cs.joensuu.fi/pub/PhLic/2004_PhLic_Kinnunen_Tomi.pdf

Any other good comparisons?

Dick
