Abstract:
In manual cued speech (MCS), a speaker gestures with one hand to resolve ambiguities among speech elements that are often confused by speechreaders. The shape of the hand distinguishes among consonants, and the position of the hand relative to the face distinguishes among vowels. Experienced receivers of MCS achieve nearly perfect reception of everyday connected speech. To understand the benefits that might be derived from the imperfect cues produced by an automatic cueing system, videotaped sentences were dubbed with handshapes corresponding to the phones identified by simulated phonetic speech recognizers. The cues dubbed onto these sentences were discrete in both shape and position rather than fluidly articulated, and the speaking rate was roughly 50% faster than for MCS. When the phones identified by an ideal recognizer were used to produce the cues, performance was only slightly lower than for MCS. When the cues were derived from an existing recognizer, intelligibility was reduced, but substantial benefits to speechreading were still observed. Current research is aimed at developing an automatic speech recognition system with the speed, accuracy, and computational efficiency required for a real-time automatic cueing system. [Work supported by NIH.]
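As a rough illustration of the cue-generation step described above, the sketch below maps a recognized phone sequence to discrete (handshape, hand position) cue codes, with consonants selecting the shape and vowels selecting the position, as in MCS. The phone labels, the grouping tables, and the defaults are hypothetical placeholders for illustration only, not the actual MCS cue assignments or any recognizer's output format.

```python
# Illustrative sketch: deriving discrete cue codes from a recognized phone
# sequence. The phone inventory and the shape/position tables below are
# hypothetical placeholders, not the actual MCS cue assignments.

# Hypothetical handshape groups for consonants (MCS uses a small set of shapes).
CONSONANT_SHAPE = {"p": 1, "d": 1, "k": 2, "v": 2, "s": 3, "r": 3}

# Hypothetical hand positions for vowels (MCS uses positions near the face).
VOWEL_POSITION = {"ae": "side", "iy": "mouth", "uw": "chin", "ah": "throat"}

def phones_to_cues(phones):
    """Pair each consonant with a following vowel to form one (shape, position)
    cue; a lone consonant or vowel gets a neutral default for the missing half."""
    cues = []
    i = 0
    while i < len(phones):
        p = phones[i]
        if p in CONSONANT_SHAPE:
            shape = CONSONANT_SHAPE[p]
            # If a vowel follows, cue the consonant-vowel pair together.
            if i + 1 < len(phones) and phones[i + 1] in VOWEL_POSITION:
                cues.append((shape, VOWEL_POSITION[phones[i + 1]]))
                i += 2
                continue
            cues.append((shape, "side"))  # default position for a lone consonant
        elif p in VOWEL_POSITION:
            cues.append((5, VOWEL_POSITION[p]))  # default shape for a lone vowel
        i += 1
    return cues

if __name__ == "__main__":
    # Hypothetical recognizer output for a short utterance.
    print(phones_to_cues(["k", "ae", "p", "s"]))
    # -> [(2, 'side'), (1, 'side'), (3, 'side')]
```

In a real-time cueing system this mapping would run on the recognizer's phone stream, so recognition errors propagate directly into wrong cues, which is why recognizer accuracy dominates the intelligibility results reported above.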