Re: frequency to mel formula ("Richard F. Lyon")


Subject: Re: frequency to mel formula
From:    "Richard F. Lyon"  <DickLyon@xxxxxxxx>
Date:    Wed, 9 Mar 2011 08:03:45 -0800
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Can I get a volunteer in Stockholm to help with this mel formula quest?

Cecilia at the National Library of Sweden ("Info-kb-se" <info@xxxxxxxx>)
has found the 1949 Fant report, and says "I have now glanced through the
book twice without finding the formula you are looking for."

If someone who knows what they're looking for and reads Swedish has the
time to look better, it would really be interesting to know what Fant
referred to when he said "This formula, discussed in more detail earlier
(Fant, 1949), is a better approximation than the Koenig scale..."

Dick

At 11:59 AM -0800 3/7/11, Richard F. Lyon wrote:
>I've done a bit more looking for where these guys got their formulae:
>
>At 3:55 PM -0400 7/15/09, Dan Ellis wrote:
>>I think Fant is the more appropriate reference (for log(1+f/1000)) and
>>O'Shaughnessy for log(1+f/700).
>
>The "700" version appears in a couple of papers
>before O'Shaughnessy's book, and he tells me
>that he got it from some place that he can't
>recall, but definitely did not make it up
>himself. Here are the two that I've found:
>
>Ananthapadmanabha, T. V. (1980) "Formant ratios
>on mel scale for male/female and male/child
>speakers", Acoustics Letters, UK, vol 4.
>
>and
>
>John Makhoul and Lynn Cosell (1976) "LPCW: An
>LPC Vocoder with Linear Predictive Spectral
>Warping" ICASSP'76 466-469.
>
>It appears that Makhoul may have made it up to
>fit (and Doug says he was at that ICASSP, so
>that may be where he got it). John Makhoul says
>in that paper:
>
>>This relation is similar to those of critical
>>band masking effects and equal intelligibility
>>curves [8]. The mel-frequency relation can be
>>approximated by the following equation
>>
>>    m = 2595 log10(1 + f/700)
>>
>>where f is the frequency in Hz and m is the
>>pitch in mels. The mel scale is adjusted such
>>that m = 1000 mels corresponds to f = 1000 Hz.
>
>In response to my inquiry, John reviewed his notes and said:
>
>>... In my notes, I have pasted a copy of Fig. 48 from The Speech
>>Chain by Peter Denes, which shows the plot of mel scale versus
>>frequency. I remember distinctly reading off the mel values from that
>>plot because Denes did not include a table of values in his book. I
>>also remember that earlier formulas divided f by 1000, after Fant. I
>>have the equation with f divided by 700 in my notebook, along with a
>>hand-drawn plot of the mel scale versus frequency (with values taken
>>from the Denes plot) and the comment: "This equation is almost a
>>perfect fit to the above curve." I frankly do not remember if I came
>>up with the 700 number or I got it from somewhere else. But, if I had
>>gotten it from somewhere else, why didn't I reference that work?
>>After all, I reference other things related to it. My guess is that I
>>must have tried a few values and found 700 to give the best visual
>>fit.
>
>(one may wonder why at Bolt, Beranek, and
>Newman, he had to read values off a plot instead
>of using Beranek's 1949 table, but that's how
>life was before Google)
>
>This makes perfect sense, since the 700 fits
>better than the 1000, even for the tabulated
>data from Beranek that Fant lists in his 1959
>paper, which is the likely source of Denes's
>plot. The 1000 Hz fits better if the domain is
>restricted to 4 kHz on the high end, but 700 Hz
>fits better overall if the full range is
>considered, as was illustrated in the plot that
>I sent around before:
>http://dicklyon.com/tech/Hearing/Mel-like_scales.svg
>
>The Denes plot can be seen here (later edition, presumably same plot):
>http://books.google.com/books?id=ZMTm3nlDfroC&pg=PA104
>It only goes up to 10,000 Hz, which is 3000 mel.
>Makhoul's 700 Hz formula goes right through that
>point.
>
>
>As for Fant and the 1000 Hz version, he cites
>his own 1949 paper in Swedish, saying "This
>formula, discussed in more detail earlier (Fant,
>1949), is a better approximation than the Koenig
>scale..."
>This line is found in his 1959 paper, which is
>what's reprinted in the usually cited 1973 book:
>
>G. Fant, "Acoustic description and
>classification of phonetic units", Ericsson
>Technics, No. 1, 1959
>reprinted in G. Fant, Speech Sounds and
>Features, MIT Press, Cambridge, MA, 1973, pp.
>32-83
>
>Dan Ellis had pointed out the 1973 book's
>reference to the 1949 Swedish source, and had
>pointed out Davis & Mermelstein (1980)'s
>reference to Fant's 1959 English paper, but
>seems to have missed, as I had, the fact that
>the 1973 book chapter was a reprint of that 1959
>paper. Steve Greenberg just pointed that out to
>me this weekend. It's easy to miss the small
>note at the bottom of the first page of the book
>chapter.
>
>
>There's also a mel formula with 625 Hz offset
>(expressed as the reciprocal, 1.6e-3 s), a good
>fit to the full 14000 Hz of Beranek's data
>table, in Lindsay and Norman, 1977:
>
>    "mels = 2410 log (1.6x10^{-3} f + 1)"
>
>Human Information Processing: An Introduction to Psychology
>Peter H. Lindsay and Donald A. Norman
>Edition 2
>Academic Press, 1977
>http://books.google.com/books?id=6d9OAAAAMAAJ&q=%22mels+2410+log%22
>(I haven't checked first edition)
>
>the same is also in
>Sensation and Perception
>Stanley Coren, Clare Porac, and Lawrence M. Ward
>Academic Press, 1979
>http://books.google.com/books?id=AN9qAAAAMAAJ&q=%22mels+2410+log%22
>
>This suggests that Makhoul's 700 Hz was not yet
>widely known and used in the late 1970s (not
>surprisingly, as it was only in an ICASSP paper,
>not likely noticed in the psychology field), and
>that others were fitting similar values, finding
>Fant's 1000 Hz unsatisfactory, perhaps. I'll ask
>Don Norman what he recalls.
>
>
>I'd still like to see that 1949 paper (or
>perhaps it's a book, at 139 pages). I've asked
>the National Library of Sweden, who have a copy,
>if they can make me a copy, but it seems
>unlikely. Anyone in Stockholm want to go take a
>look?
>
>In the meantime, it seems reasonable to cite
>Fant 1959 or 1949 as the source of the first mel
>formula, and Makhoul 1976 as the source of the
>modern 700 Hz version. I'll update Wikipedia.
>
>Of course, this is just for the formulas. As
>Jont Allen points out, Fletcher and Munson had
>plots of all this in 1937 in JASA, but didn't
>name it like Stevens did. And as Don Greenwood
>points out, the data are all seriously flawed,
>and a better formula is one with an offset of
>165 Hz (for a human cochlea map), or as Glasberg
>and Moore point out, 228 Hz for an ERB-rate
>scale.
>
>Dick
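
For readers who want to compare the three mel formulas quoted in the
message above numerically, here is a minimal Python sketch. The function
names are hypothetical. The scale factor in the Fant-style 1000 Hz version
is an assumption: the message only gives the form log(1+f/1000), so the
constant below is simply chosen so that 1000 Hz maps to 1000 mel. The
"log" in the Lindsay and Norman formula is assumed to be base 10.

```python
import math


def mel_makhoul(f_hz):
    """700 Hz version as quoted from Makhoul and Cosell (1976)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_fant(f_hz):
    """1000 Hz version, form log(1 + f/1000).

    Assumption: the scale factor 1000/log10(2) is not given in the
    message; it is picked here so that 1000 Hz -> 1000 mel.
    """
    return (1000.0 / math.log10(2.0)) * math.log10(1.0 + f_hz / 1000.0)


def mel_lindsay_norman(f_hz):
    """625 Hz version as quoted from Lindsay and Norman (1977).

    Assumption: "log" is taken to be base 10.
    """
    return 2410.0 * math.log10(1.6e-3 * f_hz + 1.0)


if __name__ == "__main__":
    # Print the three scales side by side at a few frequencies.
    for f in (500, 1000, 4000, 10000, 14000):
        print(
            f"{f:6d} Hz  "
            f"700: {mel_makhoul(f):7.1f}  "
            f"1000: {mel_fant(f):7.1f}  "
            f"625: {mel_lindsay_norman(f):7.1f}"
        )
```

All three put 1000 Hz at about 1000 mel and differ mainly in how fast
they grow above a few kHz, which is the range where the choice of offset
(700, 1000, or 625 Hz) matters for the fit discussed above.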


This message came from the mail archive
/home/empire6/dpwe/public_html/postings/2011/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University