Neil P. McAngus Todd
Dept. of Psychol., Univ. of Manchester, Manchester M13 9PL, U.K.
Guy Brown
Univ. of Sheffield, Sheffield S10 2TN, U.K.
Recently, a computational model of prosody perception, based on a multi-time-scale decomposition of the output of a cochlear model, was demonstrated [Todd and Brown, ``A multi-scale auditory model of prosodic perception,'' Proceedings of the International Conference on Spoken Language Processing (1994)]. This model determines the temporal grouping and prominence of syllables from a speech signal. In this paper we present evidence that the model can carry out a complete segmentation of a speech signal, from the level of individual phonemes and phoneme clusters up to the phrase and utterance levels. Implications for speech recognition are discussed.
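The core idea of a multi-time-scale decomposition can be illustrated with a minimal sketch: smooth an amplitude envelope with Gaussian kernels of several widths and locate the peaks at each scale, so that fine scales pick out individual events and coarse scales merge them into larger groups. This is only an illustrative assumption about the approach, not the authors' implementation; the envelope, scales, and peak-picking rule below are all hypothetical choices.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Truncated Gaussian smoothing kernel, normalized to unit sum.
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def multiscale_peaks(envelope, sigmas):
    """Smooth an amplitude envelope at several time scales (sigma in
    samples) and return the local-maximum positions found at each scale."""
    peaks = {}
    for sigma in sigmas:
        smoothed = np.convolve(envelope, gaussian_kernel(sigma), mode="same")
        # Strict interior local maxima of the smoothed envelope.
        idx = np.where((smoothed[1:-1] > smoothed[:-2]) &
                       (smoothed[1:-1] > smoothed[2:]))[0] + 1
        peaks[sigma] = idx
    return peaks

# Synthetic envelope: six "syllable" bumps grouped into two "phrases"
# (all parameters are illustrative, not taken from the model).
fs = 100  # envelope sample rate in Hz (assumed)
t = np.arange(0, 4, 1 / fs)
centres = [0.3, 0.6, 0.9, 2.5, 2.8, 3.1]  # bump centres in seconds
env = sum(np.exp(-0.5 * ((t - c) / 0.05) ** 2) for c in centres)

result = multiscale_peaks(env, sigmas=[5, 50])
# The fine scale resolves the individual bumps; the coarse scale
# merges each group of bumps into a single broad peak.
```

Reading the peak positions across scales, from coarse to fine, yields the kind of hierarchical segmentation described in the abstract: coarse-scale peaks delimit phrase-sized units, and finer scales subdivide them toward the syllable and phoneme level.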