
Re: reference needed (ASR)



Dear Laszlo,

I think this paper might contain something of what you're looking for ...
http://mi.eng.cam.ac.uk/reports/svr-ftp/nock_csl00.pdf

Indeed it is a well-known phenomenon that phone error rate is not
necessarily a good predictor of word error rate in ASR (although it is hard
to pin down definitive references).  This counterintuitive behaviour can
arise for a number of possible reasons, e.g. the algorithms that are used to
estimate model parameters will result in different optimisations depending
on whether the models are trained using phone-level annotation (which could
include phone boundary information) or word-level annotation (which would
not); the optimisation criteria are often based on goodness of fit rather
than the outcome of classification, and hence an improved fit can affect
phone and word accuracy differently; there may be distributional
differences between the set of
context-dependent phone models used to form word models and the
context-independent phone labels used to evaluate performance; phones are
not uniformly distributed across words.
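
As a toy illustration of the last point only (it is not taken from the paper
above), the sketch below compares two hypothetical systems on three made-up
words.  It assumes substitution errors only, and that a word counts as
correct only if every one of its phones is correct, i.e. there is no lexicon
or language model to recover from phone errors.  The phone labels, word
choices and system outputs are all invented for the example.

```python
# Toy illustration: phone errors spread across words vs. concentrated in one.
# Assumptions (not from any real system): substitution errors only, and a
# word is correct only if all of its phones are correct.

reference = [
    ["k", "ae", "t"],   # "cat"
    ["s", "ae", "t"],   # "sat"
    ["d", "aw", "n"],   # "down"
]

# Hypothetical system A: two phone errors, one in each of two words.
system_a = [
    ["k", "ae", "d"],
    ["s", "eh", "t"],
    ["d", "aw", "n"],
]

# Hypothetical system B: three phone errors, all in the same word.
system_b = [
    ["k", "ae", "t"],
    ["z", "eh", "d"],
    ["d", "aw", "n"],
]


def error_rates(ref, hyp):
    """Return (phone error rate, word error rate) for position-aligned words."""
    phone_errors = 0
    phone_total = 0
    word_errors = 0
    for ref_word, hyp_word in zip(ref, hyp):
        # Count mismatched phones at the same position within the word.
        wrong = sum(r != h for r, h in zip(ref_word, hyp_word))
        phone_errors += wrong
        phone_total += len(ref_word)
        # Any wrong phone makes the whole word wrong under our assumption.
        word_errors += int(wrong > 0)
    return phone_errors / phone_total, word_errors / len(ref)


per_a, wer_a = error_rates(reference, system_a)
per_b, wer_b = error_rates(reference, system_b)

print(f"System A: PER = {per_a:.2f}, WER = {wer_a:.2f}")
print(f"System B: PER = {per_b:.2f}, WER = {wer_b:.2f}")
```

Here system A makes fewer phone errors than system B (2 of 9 against 3 of 9)
but gets more words wrong (2 of 3 against 1 of 3), simply because its errors
fall in different words.  A real recogniser would also involve insertions,
deletions and lexical constraints, so this is only a sketch of the effect.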

I'm not sure whether more recent (discriminative) training schemes such as
MCE and MPE are more or less likely to exhibit this phenomenon.

Best wishes

Roger

________________________________________________________________

Prof ROGER K MOORE BA(Hons) MSc PhD FIOA MIEE

Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH)
Department of Computer Science, University of Sheffield,
Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK

e-mail: r.k.moore@xxxxxxxxxxxxxx
web:    http://www.dcs.shef.ac.uk/~roger/
tel:    +44 (0) 11422 21807
fax:    +44 (0) 11422 21810
mobile: +44 (0) 7910 073631
________________________________________________________________

> -----Original Message-----
> From: AUDITORY Research in Auditory Perception
> [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Toth Laszlo
> Sent: 29 September 2006 11:38
> To: AUDITORY@xxxxxxxxxxxxxxx
> Subject: [AUDITORY] reference needed (ASR)
> 
> Dear List,
> 
> I know that speech recognition is a bit off-topic here, but I don't know
> of a more appropriate place to ask this. A reviewer wrote about a paper of
> mine that "the fact that better phone recognition does not necessarily
> mean better word recognition is already known, and people have been
> talking about it very frequently. This should be made clear and properly
> referenced in the paper". Unfortunately, I'm personally sure that I've
> never seen this written down, because it would have saved me a lot of
> work -- but, unfortunately, I had to learn it from my own failures,
> so I'm sure I won't be able to recall any references for this. I'm also
> unable to figure out how to turn this thing into a reasonable Google
> search term (actually, I've just managed to find a reference for just the
> opposite - that "better phone recognition undoubtedly leads to better word
> recognition"). So, if anyone can tell me any paper stating or showing
> results that "better phone recognition does not necessarily mean better
> word recognition", I would be very grateful.
> Thanks,
> 
>                Laszlo Toth
>         Hungarian Academy of Sciences         *
>   Research Group on Artificial Intelligence   *   "Failure only begins
>      e-mail: tothl@xxxxxxxxxxxxxxx            *    when you stop trying"
>      http://www.inf.u-szeged.hu/~tothl        *