Re: mfcc filters gain (Jean-Julien Aucouturier)


Subject: Re: mfcc filters gain
From:    Jean-Julien Aucouturier  <jj(at)CSL.SONY.FR>
Date:    Thu, 4 Nov 2004 11:07:50 +0100

>I am also wondering if some work has already been done to improve mfcc-like processing. As it is suggested in [1], Moore's ERB scale or
>Bark scale seems to be more appropriate than the mel scale, and gammatone filterbank should be much more accurate (even if probably more
>computationally expensive) than a triangular filterbank ?

One specific "variant" that I know of:

In [1], the authors propose a simple extension of the MFCC algorithm to better account for music signals. Their observation is that the MFCC computation averages (or sums, depending on whether you normalize or not) the spectrum in each sub-band, and thus reflects the average spectral characteristics. However, very different spectra can have the same average spectral characteristics. Notably, they argue that it is important to also keep track of the relative spectral distribution of peaks (related to harmonic components) and valleys (related to noise).

Therefore, they extend the MFCC algorithm to compute not only the average spectrum in each band (or rather the spectral peak), but also a correlate of the variance, the Spectral Contrast (namely the amplitude difference between the spectral peaks and valleys in each subband). This modifies the algorithm to output 2 coefficients (instead of one) for each Mel subband.

Additionally, in the algorithm published in [1], the authors replace the Mel filterbank traditionally used in MFCC analysis by an octave-scale filterbank (C_0 - C_1, C_1 - C_2, etc.), presumably more suitable for music. They also decorrelate the spectral contrast coefficients using the optimal Karhunen-Loeve transform.

This algorithm was successful at improving the classification rate for a musical genre classification task, as reported in [1]. I have compared several implementations of this variant (notably using the regular Mel filterbank, or a DCT approximation to the K-L transform) to regular MFCCs on a music similarity task, and found no noticeable improvement in precision/recall (+/- 1%).
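To make the peak/valley idea above concrete, here is a minimal Python sketch of a per-band spectral contrast computation. It is not the implementation from [1]: the function names (`octave_band_edges`, `spectral_contrast`), the neighbourhood fraction `alpha`, the band edges in Hz and the small log offset are assumptions chosen for illustration, and the K-L (or DCT) decorrelation step is left out.

```python
import numpy as np

def octave_band_edges(f_low, f_high):
    """Hypothetical helper: octave-spaced band edges (Hz) from f_low up to f_high."""
    edges = [f_low]
    while edges[-1] * 2 <= f_high:
        edges.append(edges[-1] * 2)
    return np.array(edges)

def spectral_contrast(mag, freqs, edges, alpha=0.02):
    """Return (peak, contrast) per band for one magnitude spectrum.

    Peak and valley are the log of the mean of the largest / smallest
    `alpha` fraction of bins in each band (alpha is an assumed value).
    Every band is assumed to contain at least one bin.
    """
    peaks, valleys = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.sort(mag[(freqs >= lo) & (freqs < hi)])
        k = max(1, int(round(alpha * band.size)))    # bins in the neighbourhood
        valleys.append(np.log(band[:k].mean() + 1e-10))
        peaks.append(np.log(band[-k:].mean() + 1e-10))
    peaks, valleys = np.array(peaks), np.array(valleys)
    return peaks, peaks - valleys                    # 2 coefficients per band

# Two toy spectra with the same band average but a different peak/valley structure.
freqs = np.arange(100.0, 200.0)                      # 100 bins in a single band
edges = np.array([100.0, 200.0])
flat = np.ones(100)                                  # noise-like: no peaks
peaky = np.tile([0.1, 1.9], 50)                      # alternating valleys and peaks

_, c_flat = spectral_contrast(flat, freqs, edges)
_, c_peaky = spectral_contrast(peaky, freqs, edges)
print(flat.mean(), peaky.mean())                     # both averages are (about) 1.0
print(c_flat[0], c_peaky[0])                         # 0.0 versus about log(19)
```

Both toy spectra would look the same to a filterbank that only averages the band, while the contrast coefficient separates them, which is the argument made in [1].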
I have to admit I'm a bit puzzled by the idea of considering statistical variations "inside critical bands". From what I understand, since critical bands integrate the energy in their range, the authors' approach amounts to looking at finer details than the auditory system does.

Best,
JJ

[1] D.-N. Jiang, L. Lu, H.-J. Zhang, J.-H. Tao, and L.-H. Cai. Music type classification by spectral contrast feature. In Proceedings of the IEEE International Conference on Multimedia and Expo, Lausanne (Switzerland), August 2002.

--
Jean-Julien Aucouturier, Assistant Researcher
http://www.csl.sony.fr/~jj
SONY CSL Paris
6, rue Amyot
75005 PARIS
Tel: (33) 1 44 08 05 11
Fax: (33) 1 45 87 87 50


This message came from the mail archive
http://www.auditory.org/postings/2004/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University