Re: Frequency to Mel Formula
Don,
Thanks again for your great explanations of this complicated stuff.
All that notwithstanding, I'm still poking around at why we have
these two different mel scales, with breaks at 700 and 1000. So I
got hold of Fant's book, which has Beranek's data table in it, and
plotted up some comparisons.
See http://dicklyon.com/tech/Hearing/Mel-like_scales.svg
The "Mel 1000" curve comes pretty close to the Beranek table data up
through about 4 kHz, then diverges far from it above that. The "Mel
700" curve misses pretty badly around 2-6 kHz, but fits better on
average if you count the highest frequencies.
The "Umesh" curve, f / (0.741 + 0.00024*f), doesn't fit particularly
well, but has a good shape, so I did a "fit" and got f / (0.759 +
0.000252*f).
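To make the two parameter sets concrete, here is a minimal sketch of the
Umesh-type function with both the original and the refitted constants. The
names umesh and umesh_refit are just illustrative labels, not anything from
Umesh's paper or from the plot.

```python
def umesh(f, a=0.741, b=0.00024):
    """Umesh-type warping f / (a + b*f), with f in Hz.

    Defaults are the constants quoted for the "Umesh" curve.
    """
    return f / (a + b * f)


def umesh_refit(f):
    """Same functional form with the refitted constants 0.759 and 0.000252."""
    return umesh(f, a=0.759, b=0.000252)


if __name__ == "__main__":
    # Compare the two parameter sets at a few frequencies (Hz).
    for f in (500.0, 1000.0, 4000.0, 8000.0):
        print(f, round(umesh(f), 1), round(umesh_refit(f), 1))
```

Note that a function of this form levels off at 1/b as f grows (about 4167
with the original constants, about 3968 with the refit), so it flattens at
high frequencies rather than continuing to grow like a log curve.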
I also did a mel-type fit, and found a broad optimum for the corner
around 711.5 Hz (under the constraint that 1000 Hz maps to 1000,
which I should probably have tried relaxing, but didn't).
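For reference, a sketch of what the constrained mel-type family looks like,
assuming the usual log form m = k * ln(1 + f/corner). Requiring 1000 Hz to map
to 1000 fixes k once the corner is chosen, so the corner is the only free
parameter. The function names here are illustrative.

```python
import math


def mel_scale_factor(corner):
    """Scale k such that k * ln(1 + 1000/corner) == 1000."""
    return 1000.0 / math.log1p(1000.0 / corner)


def mel_like(f, corner):
    """Mel-type curve with the given corner (Hz), pinned so 1000 Hz -> 1000."""
    return mel_scale_factor(corner) * math.log1p(f / corner)


if __name__ == "__main__":
    # Corners discussed in this thread: 700, the 711.5 fit, and 1000.
    for corner in (700.0, 711.5, 1000.0):
        print(corner, round(mel_scale_factor(corner), 2),
              round(mel_like(4000.0, corner), 1))
```

With a corner of 700 the scale factor comes out near 1127, which is the
familiar 1127 * ln(1 + f/700) form; with a corner of 1000 the curve reduces to
1000 * log2(1 + f/1000).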
Anyway, here's my theory: Fant fitted to the frequency range he
cared about, which probably only went to 4 kHz or so. And then
someone else probably did a fit to the same Beranek table over the
whole range, and got the 700 number (the plot shows that the 711.5
points are pretty much right on the 700 curve). And that's why we see
Beranek referenced so much, maybe?
I also looked at goodness of fit (sum squared error in mel space)
including all the frequencies in the Fant/Beranek table. It turns
out that the Umesh-type fit has only 1/8 as much error as the
mel-like fit, due to the Bark-like curvature at the high-frequency
end.
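The error measure is simple enough to sketch. The table values are not
reproduced here, so the caller has to supply the (frequency, mel) pairs; the
two model functions repeat the fitted constants quoted in this message, and
the log form of the mel-type curve is an assumption. All names are
illustrative.

```python
import math


def sum_squared_error(model, freqs_hz, mels):
    """Sum of squared differences, in mel units, between model(f) and the data."""
    return sum((model(f) - m) ** 2 for f, m in zip(freqs_hz, mels))


def mel_like_fit(f, corner=711.5):
    """Mel-type curve (assumed log form), pinned so that 1000 Hz -> 1000."""
    return 1000.0 * math.log1p(f / corner) / math.log1p(1000.0 / corner)


def umesh_fit(f, a=0.759, b=0.000252):
    """Umesh-type curve with the refitted constants."""
    return f / (a + b * f)


def error_ratio(freqs_hz, mels):
    """Umesh-type SSE divided by mel-type SSE for the supplied table."""
    return (sum_squared_error(umesh_fit, freqs_hz, mels)
            / sum_squared_error(mel_like_fit, freqs_hz, mels))
```

Calling error_ratio with the table's frequency and mel columns gives the ratio
of the two errors, which is the kind of comparison described above.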
So for people who like Beranek's table (assuming Fant has a true copy
of it), the Umesh-type function should be a win. But I don't think
that function extends well to the larger log-like range that we find
in the ERB and Greenwood type curves, which are the ones that make
more sense in auditory-based applications.
That's my theory and I'm sticking to it.
Dick