Subject: Re: reference needed (ASR)
From: Prof Roger K Moore <r.k.moore@xxxxxxxx>
Date: Fri, 29 Sep 2006 14:20:06 +0100

Dear Laszlo,

I think this paper might contain something of what you're looking for ...

http://mi.eng.cam.ac.uk/reports/svr-ftp/nock_csl00.pdf

Indeed it is a well-known phenomenon that phone error rate is not necessarily a good predictor of word error rate in ASR (although it is hard to pin down definitive references). This counterintuitive behaviour can arise for a number of possible reasons, e.g.:

- the algorithms that are used to estimate model parameters will result in different optimisations depending on whether the models are trained using phone-level annotation (which could include phone boundary information) or word-level annotation (which would not);
- the optimisation criteria are often based on goodness of fit rather than the outcome of classification, and hence there can be differential effects;
- there may be distributional differences between the set of context-dependent phone models used to form word models and the context-independent phone labels used to evaluate performance;
- phones are not uniformly distributed across words.

I'm not sure whether more recent (discriminative) training schemes such as MCE and MPE are more or less likely to exhibit this phenomenon.
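A minimal sketch in Python of one way the two measures can disagree, related to the point that phones (and hence phone errors) are not uniformly distributed across words. It uses toy, hypothetical data, not anything from the cited paper or from a real recogniser. When phone errors are spread over several words, each of those words is wrong; the same number of phone errors, or even more, clustered inside one word costs only one word error. Both rates are computed here with plain unit-cost Levenshtein alignment. The phone symbols, the function names and the two "systems" are invented for illustration. In a real system the phone and word transcriptions usually come from different decoding runs, so this only illustrates the scoring side of the effect.

```python
# Toy illustration (hypothetical data, not from the cited paper): PER vs WER.
# Each word is a tuple of phone symbols; the symbols are made up for the example.

def edit_distance(ref, hyp):
    """Levenshtein distance with unit cost for substitution, insertion, deletion."""
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all of ref[:i]
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # match / substitution
    return dist[len(ref)][len(hyp)]


def phone_error_rate(ref_words, hyp_words):
    # Flatten the word sequences into phone sequences and score at phone level.
    ref = [p for w in ref_words for p in w]
    hyp = [p for w in hyp_words for p in w]
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(ref_words, hyp_words):
    # A word counts as correct only if its whole phone tuple matches.
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = [("k", "ae", "t"), ("s", "ae", "t"), ("aa", "n"), ("m", "ae", "t")]

# System A: two phone errors, spread over two different words.
system_a = [("k", "ih", "t"), ("s", "ae", "t"), ("aa", "n"), ("m", "ih", "t")]
# System B: three phone errors, all inside a single word.
system_b = [("k", "ae", "t"), ("f", "ih", "p"), ("aa", "n"), ("m", "ae", "t")]

if __name__ == "__main__":
    for name, hyp in [("A", system_a), ("B", system_b)]:
        per = phone_error_rate(reference, hyp)
        wer = word_error_rate(reference, hyp)
        print(f"System {name}: PER = {per:.2f}, WER = {wer:.2f}")
```

Here system A has the lower PER (2/11 against 3/11) yet the higher WER (2/4 against 1/4), so ranking the two systems by phone errors gives the opposite order to ranking them by word errors.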
Best wishes
Roger

________________________________________________________________
Prof ROGER K MOORE BA(Hons) MSc PhD FIOA MIEE
Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH)
Department of Computer Science, University of Sheffield,
Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK
e-mail: r.k.moore@xxxxxxxx
web: http://www.dcs.shef.ac.uk/~roger/
tel: +44 (0) 11422 21807
fax: +44 (0) 11422 21810
mobile: +44 (0) 7910 073631
________________________________________________________________

> -----Original Message-----
> From: AUDITORY Research in Auditory Perception
> [mailto:AUDITORY@xxxxxxxx] On Behalf Of Toth Laszlo
> Sent: 29 September 2006 11:38
> To: AUDITORY@xxxxxxxx
> Subject: [AUDITORY] reference needed (ASR)
>
> Dear List,
>
> I know that speech recognition is a bit off-topic here, but I don't know
> of a more appropriate place to ask this. A reviewer of a paper of mine
> wrote that "the fact that better phone recognition does not necessarily
> mean better word recognition is already known, and people have been
> talking about it very frequently. This should be made clear and properly
> referenced in the paper". Unfortunately, I'm quite sure that I've never
> seen this written down, because it would have saved me a lot of work;
> I had to learn it from my own failures, so I won't be able to recall
> any references for it. I'm also unable to figure out how to turn this
> into a reasonable Google search term (in fact, I've just managed to find
> a reference for the exact opposite: that "better phone recognition
> undoubtedly leads to better word recognition"). So, if anyone can tell
> me of any paper stating or showing results that "better phone
> recognition does not necessarily mean better word recognition", I would
> be very grateful.
>
> Thanks,
>
> Laszlo Toth
> Hungarian Academy of Sciences             *
> Research Group on Artificial Intelligence *   "Failure only begins
> e-mail: tothl@xxxxxxxx                    *    when you stop trying"
> http://www.inf.u-szeged.hu/~tothl         *