Subject: FW: FW: mfcc filters gain
From: Rahul Shrivastav <rahul(at)CSD.UFL.EDU>
Date: Wed, 3 Nov 2004 19:30:33 -0500

Guillaume,

Here are some comments from my colleague, Dr. Skowronski, whose paper
you cited in your posting. I hope you find these useful.

Rahul

-----Original Message-----
From: Mark Skowronski [mailto:markskow(at)cnel.ufl.edu]
Sent: Wednesday, November 03, 2004 6:31 PM
To: Rahul Shrivastav
Subject: Re: FW: mfcc filters gain

Rahul,

Feel free to post this reply to your listserv.

Regarding scaling of triangular filters in MFCC or HFCC: in short, it
doesn't matter.

In MFCC and HFCC, the FFT magnitude squared (power spectrum) is weighted
by a triangular filter, and the sum of those weighted terms is called
the filter output energy E(i) for i=1,...,N filters. Now scale those
output energies by whatever scale factor you like (equal area triangles,
equal height, unity amplitude) and denote the scaled energies as
E(i)*A(i), where A(i) is the peak amplitude of the scaled triangular
filter i.

In MFCC and HFCC, E(i)*A(i) is log transformed to log(E(i)) + log(A(i))
before the DCT. Since the DCT is linear, log(E(i)), i=1,...,N transforms
to cE(j), j=1,...,M (M-point DCT) and log(A(i)) transforms to cA(j). So

    log(E(i)) + log(A(i)) --> cE(j) + cA(j)

via the DCT. For two different frames of speech under analysis, MFCC and
HFCC will produce two different cE(j) but the same term cA(j).

In computing an Lp distortion measure between the two frames of speech
(Euclidean, p=2), as in DTW, the cA(j) terms cancel by subtraction. In
probability models of the cepstral feature distributions (hidden Markov
model or Gaussian mixture model), the means of the pdfs of all classes
are translated by the same amount cA(j), j=1,...,M, in the M-dimensional
feature space, which leaves their positions relative to one another
unchanged.

If you have A(i) changing in time (adapting to the input), that's
another story...
+Mark

> -----Original Message-----
> From: AUDITORY Research in Auditory Perception
> [mailto:AUDITORY(at)LISTS.MCGILL.CA] On Behalf Of Guillaume Lemaitre
> Sent: Wednesday, November 03, 2004 11:33 AM
> To: AUDITORY(at)LISTS.MCGILL.CA
> Subject: mfcc filters gain
>
> Dear list,
> In Malcolm Slaney's Matlab implementation of mel frequency cepstral
> coefficients, triangular filters are normalized "so that each filter
> has unit weight". Looking through some papers dealing with mfcc, I
> noticed that most authors do not mention this normalization step (a
> few of them do, but without explanation).
> I am wondering what this normalization corresponds to. If I am
> correct, and if the triangular filters are supposed to approximate
> critical band filtering, they should all have the same unit height,
> just as in a third-octave or Patterson's gammatone filterbank. Am I
> wrong?
>
> I am also wondering if some work has already been done to improve
> mfcc-like processing. As suggested in [1], Moore's ERB scale or the
> Bark scale seem to be more appropriate than the mel scale, and a
> gammatone filterbank should be much more accurate (even if probably
> more computationally expensive) than a triangular filterbank?
>
> Regards
> Guillaume
>
> [1] M. D. Skowronski and J. G. Harris
> "Improving the filterbank of a classic speech feature extraction
> algorithm"
> IEEE Int. Symp. on Circuits and Systems, Bangkok, Thailand, 2003
>
> -------------------------------------------------------------------
> Guillaume Lemaitre, Ph.D.
> Post-doctoral fellow
> Project-team REVES (REndering and Virtual Environments with Sounds)
> INRIA Sophia-Antipolis           tel: (+33) (0)4 92 38 50 83
> 2004 route des Lucioles          fax: (+33) (0)4 92 38 50 30
> BP 93, F-06902 Sophia-Antipolis, France
> Guillaume.Lemaitre(at)sophia.inria.fr
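
The argument in Mark's reply (a fixed filter gain A(i) only adds the same
offset cA(j) to the cepstrum of every frame) can be checked numerically.
What follows is a minimal Python sketch and not code from anyone in this
thread. Everything in it is an assumption made for illustration: the
function names, the linearly spaced (not mel-spaced) triangular filters,
the random "power spectra" standing in for speech frames, the arbitrary
gains 1/(i+1), and the unnormalized DCT-II.

```python
import math
import random


def triangular_filterbank(n_filters, n_bins):
    """Unit-height triangles with linearly spaced centers (illustrative only)."""
    edges = [k * (n_bins - 1) / (n_filters + 1) for k in range(n_filters + 2)]
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * j * (i + 0.5) / n) for i in range(n))
            for j in range(n_coeffs)]


def cepstrum(power_spectrum, bank, gains, n_coeffs):
    """log(E(i) * A(i)) followed by the DCT, as described in the reply."""
    energies = [g * sum(w * p for w, p in zip(row, power_spectrum))
                for row, g in zip(bank, gains)]
    return dct2([math.log(e) for e in energies], n_coeffs)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


random.seed(0)
N_BINS, N_FILTERS, N_COEFFS = 129, 12, 8
bank = triangular_filterbank(N_FILTERS, N_BINS)

# Two strictly positive random "power spectra" stand in for speech frames.
frame1 = [random.uniform(0.1, 1.0) for _ in range(N_BINS)]
frame2 = [random.uniform(0.1, 1.0) for _ in range(N_BINS)]

unit_gains = [1.0] * N_FILTERS                          # A(i) = 1
other_gains = [1.0 / (i + 1) for i in range(N_FILTERS)]  # arbitrary fixed A(i)

c1_unit = cepstrum(frame1, bank, unit_gains, N_COEFFS)
c2_unit = cepstrum(frame2, bank, unit_gains, N_COEFFS)
c1_other = cepstrum(frame1, bank, other_gains, N_COEFFS)
c2_other = cepstrum(frame2, bank, other_gains, N_COEFFS)

# cA(j): the DCT of log(A(i)), the offset shared by every frame.
c_a = dct2([math.log(g) for g in other_gains], N_COEFFS)

d_unit = euclidean(c1_unit, c2_unit)
d_other = euclidean(c1_other, c2_other)
print("distance with unit gains:     ", d_unit)
print("distance with arbitrary gains:", d_other)
```

Under these assumptions the two printed distances should agree up to
floating-point rounding, and c1_other should equal c1_unit plus c_a
coefficient by coefficient. If the gains were recomputed per frame, as
in the "A(i) changing in time" case, the offset would no longer cancel.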