
Re: mfcc filters gain



Steve,

When you use the Nth root instead of the log, some channel effects may no
longer cancel out completely, at least from a theoretical perspective.  Is
there a special/different method of cepstral normalization that you recommend?
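
To make the theoretical point concrete, here is a minimal sketch in Python
with NumPy. The data, the per-band gain `channel` and the root order `N` are
all made up for illustration. A fixed multiplicative channel becomes an
additive constant under the log, so subtracting the mean over time removes it.
Under an Nth root it becomes a multiplicative constant, so mean subtraction
leaves a residual, whereas dividing by the mean over time would remove it. The
division variant only illustrates the algebra; it is not a method recommended
anywhere in this thread, and it would have to be applied to the compressed
filterbank outputs before the DCT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 frames x 20 filterbank channels of positive energies.
energies = rng.uniform(0.1, 10.0, size=(200, 20))
# Hypothetical microphone response, modelled as one fixed gain per channel.
channel = np.linspace(0.5, 4.0, 20)
filtered = energies * channel

N = 3.0  # root order, an arbitrary choice for illustration


def mean_subtract(x):
    """Subtract the per-channel mean over time (as in cepstral mean subtraction)."""
    return x - x.mean(axis=0)


def mean_divide(x):
    """Divide by the per-channel mean over time (assumed alternative for roots)."""
    return x / x.mean(axis=0)


def residual(a, b):
    """Largest absolute difference between two feature matrices."""
    return np.abs(a - b).max()


# Log compression followed by mean subtraction: the channel cancels.
log_residual = residual(mean_subtract(np.log(filtered)),
                        mean_subtract(np.log(energies)))

# Root compression followed by mean subtraction: the channel does not cancel.
root_sub_residual = residual(mean_subtract(filtered ** (1.0 / N)),
                             mean_subtract(energies ** (1.0 / N)))

# Root compression followed by mean division: the channel cancels again.
root_div_residual = residual(mean_divide(filtered ** (1.0 / N)),
                             mean_divide(energies ** (1.0 / N)))

print("log + mean subtraction :", log_residual)
print("root + mean subtraction:", root_sub_residual)
print("root + mean division   :", root_div_residual)
```

The cancellation is exact here only because the channel is a noise-free,
time-invariant gain per band; real microphones and additive noise will not be
this tidy.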

I am particularly interested in cancelling the pronounced difference I am seeing
in the first cepstral component (C1), caused by significant differences in the
frequency response of various microphones.  (Some have a pronounced boost in
the higher frequencies, which generates a spectral tilt that is reflected in C1.)
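
The link between spectral tilt and C1 can be checked with a small sketch
(Python with NumPy). It assumes, purely for illustration, that the microphone
boost is a straight line across `M` log filterbank outputs and that the
cepstrum is an unnormalized DCT-II of those outputs; the slope of 0.3 per band
is arbitrary.

```python
import numpy as np

M = 20                      # number of filterbank channels (assumed)
m = np.arange(M)            # channel index
k = np.arange(M)[:, None]   # cepstral index, as a column for broadcasting

# Unnormalized DCT-II basis: row k holds cos(pi * k * (m + 0.5) / M).
basis = np.cos(np.pi * k * (m + 0.5) / M)

# Hypothetical zero-mean tilt in the log domain, rising with frequency.
tilt = 0.3 * (m - (M - 1) / 2)

# Cepstral coefficients contributed by the tilt alone.
cep = basis @ tilt

for idx in range(5):
    print(f"C{idx}: {cep[idx]: .4f}")
```

Because the tilt is antisymmetric about the centre of the filterbank, the
even-numbered coefficients vanish and C1 takes by far the largest share, with
C3 and higher odd terms much smaller.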

Thanks, Scott.





On Thu, 4 Nov 2004 10:55:40 -0000
Steve Beet <steve.beet@IEEE.ORG> wrote:

> Just to add to the confusion: as well as changing the shape of the filters,
> it's worth looking at their bandwidth and the non-linearity (traditionally a
> log operation, as specified by the theory of homomorphic filtering).
>
> The problem with triangular filters is that any small change in frequency
> causes a large change in MFCC values just because the peak is too sharp.
> Almost any shape with a flatter top will work better, provided you get the
> width right.
>
> The problem with logarithms is that they over-emphasise any very small
> signals (which are most likely background noise). Traditionally the solution
> is to put a lower floor on the values being logged, but a "nicer" solution
> to my mind, is to use an Nth root operation instead. As N increases, the Nth
> root gets closer and closer to a scaled and shifted log operation, while as
> N decreases, the effects of low levels of noise become less and less. You
> need to experiment with the value of N to suit the noise characteristics in
> your data.
>
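
The statement that the Nth root approaches a scaled and shifted log follows
from x^(1/N) = exp(ln(x)/N), which is approximately 1 + ln(x)/N for large N, so
N*(x^(1/N) - 1) tends to ln(x). The sketch below (Python with NumPy) checks
this numerically. The test range of `x`, the values of N and the helper name
`scaled_root` are arbitrary choices for illustration, not taken from the
experiments described in the quoted message.

```python
import numpy as np

# Test values spanning six decades of filterbank energy (arbitrary range).
x = np.logspace(-3, 3, 61)


def scaled_root(x, n):
    """Nth root, shifted by 1 and scaled by n so that it tends to ln(x)."""
    return n * (x ** (1.0 / n) - 1.0)


# Worst-case deviation from the natural log for several root orders.
errors = {n: np.abs(scaled_root(x, n) - np.log(x)).max()
          for n in (2, 8, 32, 128)}

for n, err in errors.items():
    print(f"N = {n:3d}: max |N*(x^(1/N) - 1) - ln(x)| = {err:.3f}")

# How far a very small (noise-like) value sits below a reference level of 1.0.
noise = 1e-6
log_gap = np.log(1.0) - np.log(noise)        # grows without bound as noise -> 0
root_gap = 1.0 - noise ** (1.0 / 3.0)        # can never exceed 1.0 for N = 3
print("gap under log     :", log_gap)
print("gap under cube root:", root_gap)
```

The second part illustrates the noise argument: under the log a tiny value is
pushed arbitrarily far from the reference, while under a low-order root the
gap stays bounded.
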
> By improving the shape and width of the filters, and optimising N in the Nth
> root operation, you can get somewhere between 20 and 40% reduction in word
> error rate, so it's worth looking into. These figures are based on my
> experiments with telephone speech from the UK SpeechDat database.
>
> My own work in this area is largely unpublished, but there was at least one
> paper in the "Aurora" sessions of Eurospeech a few years ago which looked at
> these issues and came to similar conclusions. Unfortunately I too can't find
> any specific references at the moment.
>
> Regards,
>
> Steve Beet
>
> ________________________________________________
> Dr S W Beet, Principal R & D Engineer,
> Aculab plc, Lakeside, Bramley Road, Mount Farm,
> Milton Keynes, Bucks., MK1 1PT, UK
> Tel: (+44) 1908 273963 ; Fax: (+44) 1908 273801
> ________________________________________________
>
> ----- Original Message -----
> From: "Toth Laszlo" <tothl@INF.U-SZEGED.HU>
> To: <AUDITORY@LISTS.MCGILL.CA>
> Sent: Wednesday, November 03, 2004 6:32 PM
> Subject: Re: mfcc filters gain
>
> You will find quite a few different scales in the literature, and sometimes
> even several different formulas for the same scale. I have tried a couple
> of them, and never found a significant difference in the recognition
> results. In my sceptical opinion, there are much bigger inaccuracies in
> current speech recognition technology, so these little differences don't
> really matter. Anyway, probably the most interesting idea in this field
> was when several authors tried to directly optimize the filters in order
> to achieve the best possible recognition....