
Re: frequency to mel formula



Better late than never.  The O'Shaughnessy book was 1987, not 1978.

There may yet be others, e.g. in the Japanese literature, that I can't find from the comfort of the Goog.

Dick

At 9:13 PM -0600 3/7/11, James W. Beauchamp wrote:
Good sleuthing, Dick! You've apparently answered the question I
asked in July, 2009 about where the mel scale formula given on
Wikipedia, namely

mel = log(1 + fr/700)*1127,

came from. It was credited to Douglas O'Shaughnessy's 1978 book,
but since O'Shaughnessy didn't remember where he got it, while
at the same time his book seemed to be the primary reference,
it was a mystery as to who the original author of the formula was.

I needed the reference in conjunction with a paper on using MFCCs
for analyzing musical sounds using some existing software that
did not give references for its formulas. There was also a rumor
that the mel-frequency formula was originally used for MFCC
analysis of speech by some speech researchers in Japan. So it's
good to have a solid reference, but it's probably too late for the
paper we were working on, where O'Shaughnessy's book was the only
thing we had to go by.

Jim

Original message:
From: "Richard F. Lyon" <DickLyon@xxxxxxx>
Date: Mon, 7 Mar 2011 11:59:37 -0800
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: [AUDITORY] frequency to mel formula

I've done a bit more looking for where these guys got their formulae:

At 3:55 PM -0400 7/15/09, Dan Ellis wrote:
I think Fant is the more appropriate reference (for log(1+f/1000)) and
O'Shaughnessy for log(1+f/700).

The "700" version appears in a couple of papers
before O'Shaughnessy's book, and he tells me that he
got it from some place that he can't recall, but
definitely did not make it up himself.  Here are
the two that I've found:

Ananthapadmanabha, T. V. (1980) "Formant ratios
on mel scale for male/female and male/child
speakers", Acoustics Letters, UK, vol 4.

and

John Makhoul and Lynn Cosell (1976) "LPCW: An LPC
Vocoder with Linear Predictive Spectral Warping"
ICASSP'76 466-469.

It appears that Makhoul may have made it up to
fit (and Doug says he was at that ICASSP, so that
may be where he got it). John Makhoul says in
that paper:

This relation is similar to those of critical
band masking effects and equal intelligibility
curves [8]. The mel-frequency relation can be
approximated by the following equation

   m = 2595 log10(1 + f/700)

where f is the frequency in Hz and m is the
pitch in mels. The mel scale is adjusted such
that m=1000 mels corresponds to f=1000 Hz.
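
A minimal sketch of a numerical check on the two forms of the 700 Hz formula seen in this thread. It assumes the "log" in the Wikipedia form is a natural log; the function names are illustrative only and do not come from either source. Since 2595/ln(10) is about 1126.99, the two forms should agree to within roughly one part in 10^5, and both should give very nearly 1000 mel at 1000 Hz.

```python
import math

def mel_log10(f_hz):
    """Base-10 form from the quoted paper: m = 2595 log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_ln(f_hz):
    """Wikipedia form, ASSUMING natural log: m = 1127 ln(1 + f/700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

# Print both forms side by side at a few frequencies (Hz).
for f in (100.0, 1000.0, 4000.0, 10000.0):
    print(f, round(mel_log10(f), 2), round(mel_ln(f), 2))
```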

In response to my inquiry, John reviewed his notes and said:

... In my notes, I have pasted a copy of Fig. 48 from The Speech
Chain by Peter Denes, which shows the plot of mel scale versus
frequency.  I remember distinctly reading off the mel values from that
plot because Denes did not include a table of values in his book.  I
also remember that earlier formulas divided f by 1000, after Fant.  I
have the equation with f divided by 700 in my notebook, along with a
hand-drawn plot of the mel scale versus frequency (with values taken
from the Denes plot) and the comment: "This equation is almost a
perfect fit to the above curve."  I frankly do not remember if I came
up with the 700 number or I got it from somewhere else.  But, if I had
gotten it from somewhere else, why didn't I reference that work?
After all, I reference other things related to it.  My guess is that I
must have tried a few values and found 700 to give the best visual
fit.

(one may wonder why at Bolt, Beranek, and Newman,
he had to read values off a plot instead of using
Beranek's 1949 table, but that's how life was
before Google)

This makes perfect sense, since the 700 fits
better than the 1000, even for the tabulated data
from Beranek that Fant lists in his 1959 paper,
which is the likely source of Denes's plot.  The
1000 Hz fits better if the domain is restricted
to 4 kHz on the high end, but 700 Hz fits better
overall if the full range is considered, as was
illustrated in the plot that I sent around before:
http://dicklyon.com/tech/Hearing/Mel-like_scales.svg

The Denes plot can be seen here (later edition, presumably same plot):
http://books.google.com/books?id=ZMTm3nlDfroC&pg=PA104
It only goes up to 10,000 Hz, which is 3000 mel.
Makhoul's 700 Hz formula goes right through that
point.
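
A rough sketch of that check at the top of the Denes plot, comparing the 700 Hz and 1000 Hz offsets at 10,000 Hz. The scaling of the 1000 Hz version is an assumption here: it is normalized so that 1000 Hz maps to 1000 mel, which gives 1000 log2(1 + f/1000). The function names are illustrative only. Under those assumptions the 700 Hz formula gives roughly 3073 mel at 10 kHz, within a few percent of the plotted 3000, while the 1000 Hz version gives roughly 3459.

```python
import math

def mel_700(f_hz):
    """700 Hz offset: m = 2595 log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_1000(f_hz):
    """1000 Hz offset, ASSUMED scaled so that 1000 Hz -> 1000 mel."""
    return 1000.0 * math.log2(1.0 + f_hz / 1000.0)

top_of_plot_hz = 10000.0
print(round(mel_700(top_of_plot_hz)), round(mel_1000(top_of_plot_hz)))
```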

As for Fant and the 1000 Hz version, he cites his
own 1949 paper in Swedish, saying "This formula,
discussed in more detail earlier (Fant, 1949), is
a better approximation than the Koenig scale..."
This line is found in his 1959 paper, which is
what's reprinted in the usually cited 1973 book:

G. Fant, "Acoustic description and classification
of phonetic units", Ericsson Technics, No. 1, 1959
reprinted in G. Fant, Speech Sounds and Features,
MIT Press, Cambridge, MA, 1973, pp. 32-83

Dan Ellis had pointed out the 1973 book's
reference to the 1949 Swedish source, and had
pointed out Davis & Mermelstein (1980)'s
reference to Fant's 1959 English paper, but seems
to have missed, as I had, the fact that the 1973
book chapter was a reprint of that 1959 paper.
Steve Greenberg just pointed that out to me this
weekend.  It's easy to miss the small note at the
bottom of the first page of the book chapter.

There's also a mel formula with 625 Hz offset
(expressed as the reciprocal, 1.6e-3 s), a good
fit to the full 14000 Hz of Beranek's data table,
in Lindsay and Norman, 1977:

  "mels = 2410 log (1.6x10^{-3} f + 1)"

Human Information Processing: An Introduction to Psychology
Peter H. Lindsay and Donald A. Norman
Edition 2
Academic Press, 1977
http://books.google.com/books?id=6d9OAAAAMAAJ&q=%22mels+2410+log%22
(I haven't checked first edition)

the same is also in
Sensation and Perception
Stanley Coren, Clare Porac, and Lawrence M. Ward
Academic Press, 1979
http://books.google.com/books?id=AN9qAAAAMAAJ&q=%22mels+2410+log%22
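
A small sketch of the 2410 formula, to show where the 625 Hz offset comes from. Reading "log" as base 10 is an assumption; it is supported by the fact that with base 10 the formula gives almost exactly 1000 mel at 1000 Hz, while a natural log would give about 2300. The names below are illustrative only.

```python
import math

COEFF_S = 1.6e-3             # the coefficient on f, in seconds
offset_hz = 1.0 / COEFF_S    # reciprocal of the coefficient: 625 Hz

def mel_625(f_hz):
    """mels = 2410 log(1.6e-3 f + 1), ASSUMING log is base 10."""
    return 2410.0 * math.log10(COEFF_S * f_hz + 1.0)

print(offset_hz, round(mel_625(1000.0), 1))
```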

This suggests that Makhoul's 700 Hz was not yet
widely known and used in the late 1970s (not
surprisingly, as it was only in an ICASSP paper,
not likely noticed in the psychology field), and
that others were fitting similar values, finding
Fant's 1000 Hz unsatisfactory, perhaps.  I'll ask Don
Norman what he recalls.

I'd still like to see that 1949 paper (or perhaps
it's a book, at 139 pages).  I've asked the
National Library of Sweden, who have a copy, if
they can make me a copy, but it seems unlikely.
Anyone in Stockholm want to go take a look?

In the meantime, it seems reasonable to cite
Fant 1959 or 1949 as the source of the first mel
formula, and Makhoul 1976 as the source of the
modern 700 Hz version.  I'll update Wikipedia.

Of course, this is just for the formulas.  As
Jont Allen points out, Fletcher and Munson had
plots of all this in 1937 in JASA, but didn't
name it like Stevens did.  And as Don Greenwood
points out, the data are all seriously flawed,
and a better formula is one with an offset of 165
Hz (for a human cochlea map), or as Glasberg and
Moore point out, 228 Hz for an ERB-rate scale.

Dick