
MFCC flaws

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: MFCC flaws
From: Roy Patterson <rdp1@xxxxxxxxx>
Date: Mon, 12 Jan 2009 10:19:01 +0000
Approved-by: rdp1@xxxxxxxxx
Delivery-date: Mon Jan 12 05:32:06 2009
Reply-to: Roy Patterson <rdp1@xxxxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
User-agent: Thunderbird 2.0.0.19 (Windows/20081209)

Hello all,

MFCCs have what seems to me a fundamental flaw, quite separate from anything to do with the Mel scale. The problem was pointed out to me by Alain de Cheveigne. The Cepstral Coefficients are COSINE coefficients, which means that they cannot shift with speaker size to capture the shift in formant frequencies that occurs as children grow up and their vocal tracts get longer. This is a big effect which has been repeatedly observed. See Lee et al. (1999) or Voperian et al. (2007) for examples.
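
To make the point concrete, here is a rough numerical sketch, not taken from the paper. It assumes a toy spectral envelope made of three Gaussian "formant" bumps on a log-frequency channel axis, where a change in vocal-tract length is approximately a translation of the envelope. The channel positions, the bump width, the size of the shift and the helper names dct_ii and envelope are all hypothetical, chosen only for illustration.

```python
import numpy as np

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II of a 1-D array; returns the first n_coeffs terms."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]   # coefficient index
    m = np.arange(n)[None, :]          # channel index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x

def envelope(centres, n_channels=64, width=3.0):
    """Toy envelope: Gaussian bumps at the given channel positions (assumed shape)."""
    ch = np.arange(n_channels)
    return sum(np.exp(-0.5 * ((ch - c) / width) ** 2) for c in centres)

# Hypothetical formant positions (in channels) for a long vocal tract,
# and the same pattern translated upwards for a shorter vocal tract.
adult_formants = np.array([15.0, 30.0, 42.0])
shift = 5.0
adult = envelope(adult_formants)
child = envelope(adult_formants + shift)

# First 13 cosine coefficients of each envelope.
cc_adult = dct_ii(adult, 13)
cc_child = dct_ii(child, 13)

# Relative change in the coefficient vector caused by the translation alone.
rel_change = np.linalg.norm(cc_child - cc_adult) / np.linalg.norm(cc_adult)

print(np.round(cc_adult, 2))
print(np.round(cc_child, 2))
print(rel_change)
```

The formant pattern is identical in the two cases; only its position on the frequency axis differs. Because the cosine basis functions are fixed in position, the translation is spread across all of the coefficients rather than being captured by any one of them.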

We recently presented a paper explaining why recognizers trained on the data of a man cannot possibly be expected to recognize the speech of a woman, let alone a child. The CCs for a given phoneme are different for men, women and children. This is one of the reasons that training sets have to be so large and training has to take so long.

The paper was presented at Acoustics08 in Paris. The paper also describes an alternative set of auditory features that are largely scale invariant. The reference is

Monaghan, J. J. M., Feldbauer, C., Walters, T. C. and Patterson, R. D. (2008) “Low-dimensional, auditory feature vectors that improve vocal-tract-length normalization in automatic speech recognition,” Acoustics08, Paris, paper H000688.

I can provide a pdf of the paper on request, and I would be interested in your comments on the ideas in the paper.


Regards Roy P

--
* ** *** * ** *** * ** *** * ** *** * ** *** *
Roy D. Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge
Downing Street, Cambridge, CB2 3EG

http://www.pdn.cam.ac.uk/cnbh/
phone: +44 (1223) 333819 office
fax:   +44 (1223) 333840 department
email: rdp1@xxxxxxxxx
