Re: frequency to mel formula ("Richard F. Lyon" )


Subject: Re: frequency to mel formula
From:    "Richard F. Lyon"  <DickLyon@xxxxxxxx>
Date:    Mon, 7 Mar 2011 20:04:35 -0800
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Better late than never.  The O'Shaughnessy book was 1987, not 1978.

There may yet be others, e.g. in the Japanese literature, that I can't
find from the comfort of the Goog.

Dick

At 9:13 PM -0600 3/7/11, James W. Beauchamp wrote:
>Good sleuthing, Dick! You've apparently answered the question I
>asked in July, 2009 about where the mel scale formula given on
>Wikipedia, namely
>
>mel = log(1 + fr/700)*1127,
>
>came from. It was credited to Douglas O'Shaughnessy's 1978 book,
>but since O'Shaughnessy didn't remember where he got it, while
>at the same time his book seemed to be the primary reference,
>it was a mystery as to who the original author of the formula was.
>
>I needed the reference in conjunction with a paper on using MFCC's
>for analyzing musical sounds using some existing software that
>did not give references for its formulas. There was also a rumor
>that the mel-frequency formula was originally used for MFCC
>analysis of speech by some speech researchers in Japan. So it's
>good to have a solid reference, but it's probably too late for the
>paper we were working on, where O'Shaughnessy's book was the only
>thing we had to go by.
>
>Jim
>
>Original message:
>>From: "Richard F. Lyon" <DickLyon@xxxxxxxx>
>>Date: Mon, 7 Mar 2011 11:59:37 -0800
>>To: AUDITORY@xxxxxxxx
>>Subject: Re: [AUDITORY] frequency to mel formula
>>
>>I've done a bit more looking for where these guys got their formulae:
>>
>>At 3:55 PM -0400 7/15/09, Dan Ellis wrote:
>>>I think Fant is the more appropriate reference (for log(1+f/1000)) and
>>>O'Shaughnessy for log(1+f/700).
>>
>>The "700" version appears in a couple of papers
>>before O'Shaughnessy's book, and he tells me that
>>he got it from some place that he can't recall, but
>>definitely did not make it up himself.  Here are
>>the two that I've found:
>>
>>Ananthapadmanabha, T. V. (1980) "Formant ratios
>>on mel scale for male/female and male/child
>>speakers", Acoustics Letters, UK, vol 4.
>>
>>and
>>
>>John Makhoul and Lynn Cosell (1976) "LPCW: An LPC
>>Vocoder with Linear Predictive Spectral Warping"
>>ICASSP'76 466-469.
>>
>>It appears that Makhoul may have made it up to
>>fit (and Doug says he was at that ICASSP, so that
>>may be where he got it).  John Makhoul says in
>>that paper:
>>
>>>This relation is similar to those of critical
>>>band masking effects and equal intelligibility
>>>curves [8]. The mel-frequency relation can be
>>>approximated by the following equation
>>>
>>>    m = 2595 log10(1 + f/700)
>>>
>>>where f is the frequency in Hz and m is the
>>>pitch in mels. The mel scale is adjusted such
>>>that m = 1000 mels corresponds to f = 1000 Hz.
>>
>>In response to my inquiry, John reviewed his notes and said:
>>
>>>... In my notes, I have pasted a copy of Fig. 48 from The Speech
>>>Chain by Peter Denes, which shows the plot of mel scale versus
>>>frequency. I remember distinctly reading off the mel values from that
>>>plot because Denes did not include a table of values in his book. I
>>>also remember that earlier formulas divided f by 1000, after Fant. I
>>>have the equation with f divided by 700 in my notebook, along with a
>>>hand-drawn plot of the mel scale versus frequency (with values taken
>>>from the Denes plot) and the comment: "This equation is almost a
>>>perfect fit to the above curve." I frankly do not remember if I came
>>>up with the 700 number or I got it from somewhere else. But, if I had
>>>gotten it from somewhere else, why didn't I reference that work?
>>>After all, I reference other things related to it. My guess is that I
>>>must have tried a few values and found 700 to give the best visual
>>>fit.
>>
>>(one may wonder why at Bolt, Beranek, and Newman,
>>he had to read values off a plot instead of using
>>Beranek's 1949 table, but that's how life was
>>before Google)
>>
>>This makes perfect sense, since the 700 fits
>>better than the 1000, even for the tabulated data
>>from Beranek that Fant lists in his 1959 paper,
>>which is the likely source of Denes's plot.  The
>>1000 Hz fits better if the domain is restricted
>>to 4 kHz on the high end, but 700 Hz fits better
>>overall if the full range is considered, as was
>>illustrated in the plot that I sent around before:
>>http://dicklyon.com/tech/Hearing/Mel-like_scales.svg
>>
>>The Denes plot can be seen here (later edition, presumably same plot):
>>http://books.google.com/books?id=ZMTm3nlDfroC&pg=PA104
>>It only goes up to 10,000 Hz, which is 3000 mel.
>>Makhoul's 700 Hz formula goes right through that
>>point.
>>
>>As for Fant and the 1000 Hz version, he cites his
>>own 1949 paper in Swedish, saying "This formula,
>>discussed in more detail earlier (Fant, 1949), is
>>a better approximation than the Koenig scale..."
>>This line is found in his 1959 paper, which is
>>what's reprinted in the usually cited 1973 book:
>>
>>G. Fant, "Acoustic description and classification
>>of phonetic units", Ericsson Technics, No. 1, 1959
>>reprinted in G. Fant, Speech Sounds and Features,
>>MIT Press, Cambridge, MA, 1973, pp. 32-83
>>
>>Dan Ellis had pointed out the 1973 book's
>>reference to the 1949 Swedish source, and had
>>pointed out Davis & Mermelstein (1980)'s
>>reference to Fant's 1959 English paper, but seems
>>to have missed, as I had, the fact that the 1973
>>book chapter was a reprint of that 1959 paper.
>>Steve Greenberg just pointed that out to me this
>>weekend.  It's easy to miss the small note at the
>>bottom of the first page of the book chapter.
>>
>>There's also a mel formula with 625 Hz offset
>>(expressed as the reciprocal, 1.6e-3 s), a good
>>fit to the full 14000 Hz of Beranek's data table,
>>in Lindsay and Norman, 1977:
>>
>>  "mels = 2410 log (1.6x10^{-3} f + 1)"
>>
>>Human Information Processing: An Introduction to Psychology
>>Peter H. Lindsay and Donald A. Norman
>>Edition 2
>>Academic Press, 1977
>>http://books.google.com/books?id=6d9OAAAAMAAJ&q=%22mels+2410+log%22
>>(I haven't checked first edition)
>>
>>the same is also in
>>Sensation and Perception
>>Stanley Coren, Clare Porac, and Lawrence M. Ward
>>Academic Press, 1979
>>http://books.google.com/books?id=AN9qAAAAMAAJ&q=%22mels+2410+log%22
>>
>>This suggests that Makhoul's 700 Hz was not yet
>>widely known and used in the late 1970s (not
>>surprisingly, as it was only in an ICASSP paper,
>>not likely noticed in the psychology field), and
>>that others were fitting similar values, finding
>>Fant's 1000 Hz unsatisfactory, perhaps.  I'll ask Don
>>Norman what he recalls.
>>
>>I'd still like to see that 1949 paper (or perhaps
>>it's a book, at 139 pages).  I've asked the
>>National Library of Sweden, who have a copy, if
>>they can make me a copy, but it seems unlikely.
>>Anyone in Stockholm want to go take a look?
>>
>>In the meantime, it seems reasonable to cite
>>Fant 1959 or 1949 as the source of the first mel
>>formula, and Makhoul 1976 as the source of the
>>modern 700 Hz version.  I'll update Wikipedia.
>>
>>Of course, this is just for the formulas.  As
>>Jont Allen points out, Fletcher and Munson had
>>plots of all this in 1937 in JASA, but didn't
>>name it like Stevens did.  And as Don Greenwood
>>points out, the data are all seriously flawed,
>>and a better formula is one with an offset of 165
>>Hz (for a human cochlea map), or as Glasberg and
>>Moore point out, 228 Hz for an ERB-rate scale.
>>
>>Dick
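
For readers who want to compare the formulas quoted in this thread numerically, here is a minimal Python sketch. The function names are made up for illustration. Two things are assumptions rather than statements from the sources: the "log" in the Lindsay and Norman formula is taken to be base 10, and the 1000 Hz variant is given a scale factor of 1000/log10(2), chosen only so that 1000 Hz maps to 1000 mel (the thread quotes just the log(1+f/1000) shape).

```python
import math


def mel_makhoul(f_hz):
    """Makhoul and Cosell (1976) form as quoted: m = 2595 log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_natural_log(f_hz):
    """Natural-log form quoted from Wikipedia: mel = 1127 ln(1 + f/700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_lindsay_norman(f_hz):
    """Lindsay and Norman (1977) form: 2410 log(1.6e-3 f + 1).

    ASSUMPTION: "log" is read as base 10.
    """
    return 2410.0 * math.log10(1.6e-3 * f_hz + 1.0)


def mel_1000_hz_variant(f_hz):
    """log(1 + f/1000) shape, after Fant.

    ASSUMPTION: the scale factor 1000/log10(2) is not quoted in the thread;
    it is picked here so that 1000 Hz maps to 1000 mel.
    """
    return (1000.0 / math.log10(2.0)) * math.log10(1.0 + f_hz / 1000.0)


if __name__ == "__main__":
    # Print the four variants side by side at a few frequencies.
    header = f"{'f (Hz)':>8} {'700/log10':>10} {'700/ln':>10} {'625':>10} {'1000':>10}"
    print(header)
    for f in (100.0, 500.0, 1000.0, 4000.0, 10000.0, 14000.0):
        print(
            f"{f:8.0f} {mel_makhoul(f):10.1f} {mel_natural_log(f):10.1f} "
            f"{mel_lindsay_norman(f):10.1f} {mel_1000_hz_variant(f):10.1f}"
        )
```

The two 700 Hz forms agree closely because 2595/ln(10) is very nearly 1127, and under the assumptions above every variant lands near 1000 mel at 1000 Hz; the variants differ mainly at the high-frequency end.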


This message came from the mail archive
/home/empire6/dpwe/public_html/postings/2011/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University