Subject: Re: FW: FW: mfcc filters gain
From: "Richard F. Lyon" <DickLyon(at)ACM.ORG>
Date: Thu, 4 Nov 2004 07:54:33 -0800

At 9:46 AM -0500 11/4/04, J. Scott Merritt wrote:
>Rahul,
>
>Thank you for that posting. It seems related to a recent discussion I had
>with a colleague regarding the need to eliminate the natural spectral tilt
>of human speech before taking the DCT. By Dr. Skowronski's reasoning, it
>appears clear that spectral tilt compensation is not required before taking
>the DCT.
>
>Best regards, Scott.
>

It seems to me that anyone who wants to use the mfcc technique, or any other cepstral or homomorphic technique, really should start by understanding the math and the sensitivities well enough to know what they're getting into.

Dr. Skowronski's observation that in a cepstrum any set of channel gains in the analyzer is just an offset in cepstral space is elementary and well known. However, when you move to make the whole process more robust, e.g. by incorporating Steve Beet's Nth root instead of log (which I totally endorse), or any other kind of stabilization of the log function's singularity at zero, this property of the log no longer saves you. It then becomes important to get channel weightings that are at least in the right ballpark (they get Nth-rooted, too, so they're not critical, but they don't totally drop out as they did with the log).

Time-adaptive channel gains can also work very well, and help by removing some of the cepstral space offsets due to the channel (room, mic, speaker, etc.). If you allow one tilt parameter to optimize over, you'll probably benefit from it.

The triangular filter, as was mentioned by someone already, is probably a poor choice compared to any more auditory-motivated filter such as gammatone or all-pole gammatone. As Slaney and I have shown, these can be very efficiently implemented.

Dick
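
The difference between log and Nth-root compression can be checked numerically. The following is a minimal sketch, not taken from the original post: the filterbank energies are random stand-ins, the per-channel gains are a hypothetical spectral tilt, and N = 3 is an arbitrary choice of root. With the log, the gains contribute the same cepstral offset to every frame; with the Nth root, the contribution depends on the frame's energies, so it is not a fixed offset.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II along the last axis, built directly from its definition."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]          # cepstral index
    m = np.arange(n)[None, :]          # channel index
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return x @ basis.T

rng = np.random.default_rng(0)
n_frames, n_channels = 5, 20

# Hypothetical filterbank output energies (frames x channels); random stand-ins.
energies = rng.uniform(0.1, 10.0, size=(n_frames, n_channels))

# Hypothetical per-channel gains: a smooth tilt spanning a factor of 10.
gains = np.logspace(0.0, 1.0, n_channels)

# Log compression: log(g * E) = log(g) + log(E), so the gains add the same
# vector DCT(log g) to the cepstrum of every frame.
log_diff = dct_ii(np.log(energies * gains)) - dct_ii(np.log(energies))
log_offset_spread = (log_diff.max(axis=0) - log_diff.min(axis=0)).max()

# Nth-root compression: (g * E)**(1/N) = g**(1/N) * E**(1/N), so the gains
# scale each channel instead, and their cepstral effect varies with the frame.
N = 3  # assumed root order, for illustration only
root_diff = dct_ii((energies * gains) ** (1.0 / N)) - dct_ii(energies ** (1.0 / N))
root_offset_spread = (root_diff.max(axis=0) - root_diff.min(axis=0)).max()

print("frame-to-frame spread of gain effect, log :", log_offset_spread)
print("frame-to-frame spread of gain effect, root:", root_offset_spread)
```

The first spread should be zero up to rounding error, and the second clearly nonzero. The gains still enter only as g**(1/N), which is why they matter less than they would with no compression at all.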