MFCC flaws
Hello all,
MFCCs have what seems to me to be a fundamental flaw, quite
separate from anything to do with the Mel scale.
The problem was pointed out to me by Alain de Cheveigne. The
Cepstral Coefficients are COSINE coefficients, which means
that they cannot shift with speaker size to capture the
shift in formant frequencies that occurs as children grow up
and their vocal tracts get longer. This is a big effect
which has been repeatedly observed. See Lee et al. (1999) or
Vorperian et al. (2007) for examples.
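A rough way to see the cosine problem numerically is the
sketch below. It is only an illustration under stated
assumptions, not the analysis from the paper: it assumes that
a change in vocal-tract length acts approximately as a
translation of the spectral envelope along a log-frequency
channel axis, and it uses a toy envelope made of Gaussian
"formant" bumps. The names `envelope`, `dct2`,
`dft_magnitude`, `SHIFT` and the channel numbers are all
hypothetical. The DFT magnitude is used purely as a familiar
shift-invariant reference; it is not the auditory feature set
described in the paper.

```python
import math

def envelope(n, centres, width=2.0):
    """Toy log-spectral envelope: Gaussian 'formant' bumps on a
    log-frequency channel axis (an assumption for illustration)."""
    return [sum(math.exp(-0.5 * ((i - c) / width) ** 2) for c in centres)
            for i in range(n)]

def dct2(x):
    """Unnormalised DCT-II, the cosine transform used for cepstral coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def dft_magnitude(x):
    """Magnitude of the DFT; unchanged by a circular shift of x."""
    n = len(x)
    mags = []
    for k in range(n):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        mags.append(math.hypot(re, im))
    return mags

N = 64      # number of channels (hypothetical)
SHIFT = 4   # channels; stand-in for a shorter vocal tract (hypothetical)

adult = envelope(N, [16, 30, 42])
# Move every bump up by SHIFT channels. The shift is circular, which is
# harmless here because the envelope is essentially zero at both edges.
child = adult[-SHIFT:] + adult[:-SHIFT]

cc_adult, cc_child = dct2(adult), dct2(child)
# Compare the first 12 cosine coefficients, skipping the 0th (overall level).
max_cc_diff = max(abs(a - b) for a, b in zip(cc_adult[1:13], cc_child[1:13]))

mag_adult, mag_child = dft_magnitude(adult), dft_magnitude(child)
max_mag_diff = max(abs(a - b) for a, b in zip(mag_adult, mag_child))

print("largest change in cosine coefficients 1..12:", max_cc_diff)
print("largest change in DFT magnitudes:", max_mag_diff)
```

In this toy setting the same envelope shape produces a
different set of cosine coefficients once it is shifted,
because the cosine basis functions have fixed phase. The DFT
magnitudes stay the same to within rounding error, since a
shift only changes their phase.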
We recently presented a paper explaining why recognizers
trained on the data of a man cannot possibly be expected to
recognize the speech of a woman, let alone a child. The CCs
for a given phoneme are different for men, women and
children. This is one of the reasons that training sets have
to be so large and training has to take so long.
The paper was presented at Acoustics08 in Paris. It also
describes an alternative set of auditory features that are
largely scale invariant. The reference is:
Monaghan, J. J. M., Feldbauer, C., Walters, T. C. and
Patterson, R. D. (2008) “Low-dimensional, auditory feature
vectors that improve vocal-tract-length normalization in
automatic speech recognition,” Acoustics08, Paris, paper
H000688.
I can provide a pdf of the paper on request, and I would be
interested in your comments on the ideas in the paper.
Regards, Roy P
--
* ** *** * ** *** * ** *** * ** *** * ** *** *
Roy D. Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge
Downing Street, Cambridge, CB2 3EG
http://www.pdn.cam.ac.uk/cnbh/
phone: +44 (1223) 333819 office
fax: +44 (1223) 333840 department
email rdp1@xxxxxxxxx