Re: phonemes (Neil Todd)


Subject: Re: phonemes
From:    Neil Todd  <TODD(at)FS4.PSY.MAN.AC.UK>
Date:    Wed, 1 Apr 1998 19:48:32 GMT

Keith R. Kluender wrote:
>
>Neil Todd's thoughtful contribution was informative. I only add a brief
>caveat to this discussion regarding the assumption of phonetic stages.
>It is sometimes useful to consider how speech perception would work if
>phonemes did not exist as an intervening stage between acoustic input
>and lexicon. Speech perception researchers, including myself, have
>perseverated on perception of phonemes as if they are the real entities
>to be explained. Much of this heritage arises from the utility of
>phonemes to efficiently describe distinctions between morphemes. As
>such, phonemes are an invention by linguists to describe language at
>a given level of detail. Phonemes may or may not exist as a separable
>level of analysis in the process of speech perception.
>
>It sometimes is useful to imagine a lexicon that is primary encoded in
>auditory dimensions. What is true is that, if one wishes to economically
>describe the variance in this lexical space (e.g., principle components),
>much or most of the variance in the space could be described in terms
>of dimensions that map fairly well on to phonetic distinctions. However,
>this simply recapitulates the linguists' descriptive claims. It does
>not necessarily afford phonemes any process role. Instead, phonemes may
>be an emergent property of a sufficiently well-populated lexical space.
>I have not scrutinized the literature Todd shares, so I'd like to maintain
>some caution before claiming that those findings can be reinterpretted
>without recourse to a phonetic stage of processing. I would be hopeful,
>however, that this is the case.

In summarising the neurological literature as I have discerned it, I did not wish to give the impression that I subscribe to the phoneme construct or that speech perception proceeds in three detached information processing black box stages.
The notion of a phonological encoding process is a convenient fiction to describe what is more likely a hierarchical process involving multiple levels of analysis (phonetic features, sub-syllabic and syllabic features, etc.), as distinct from a primitive acoustic level. At risk of taxing the patience of list members, here is another edited extract from the chapter.

*****

In the last few years there has been a realisation that some new directions are required in the field of speech perception if further progress is to be made (Nygaard and Pisoni, 1995; Greenberg, 1996). This realisation has come about due to the persistence of a number of issues.

The first issue is that of segmentation. Close examination of a speech signal shows considerable overlap of phonetic units, reflecting the continuous nature of vocal tract activity in speech. In an attempt to bridge the gap between these empirical observations and traditional linear phonologies, attempts have been made to develop a more non-linear phonology which takes into account the gestural nature of speech production (Fowler, 1996). Such developments, however, offer few insights into the process by which phonetic information is recovered.

The second issue is that of invariance. Even if it is possible to segment the signal, there is no invariant set of acoustic features or properties that correspond uniquely to particular phonemes. This variability has a number of sources. One source is coarticulation, which, like phonetic overlap, is due to the continuous nature of vocal tract activity. The second is speaker variability. Even when one allows for the often extreme differences of phonetic realisation due to dialectal variation, considerable differences still remain, due to factors such as variation in the size and shape of the individual vocal tract, age, gender, and speaking rate.

There are two main reasons why the issues remain unresolved. The first is the widespread presupposition that the proper unit of analysis is the phoneme or phonetic feature. The evidence would appear to suggest, however, that the perceptual system makes use of all levels of analysis, with no one level more important than another. The second, more fundamental reason is the underlying assumption of most theories and models of speech perception that there exists a lexicon of abstract, canonical prototypes in long-term memory (LTM), which are compared with the incoming signal. The assumption has two important consequences:

(1) Sources of variation are regarded as "noise", which must be normalised away before recognition can take place.
(2) Prosody is regarded as peripheral to the process of speech recognition, providing at best indirect cues to the recovery of morphosyntactic information.

The result is that potentially rich sources of linguistic (and paralinguistic) information are disregarded. Speaker variability, for example, provides information important not only for speaker identification but also for correct pragmatics (Nygaard and Pisoni, 1995), and there is evidence that listeners retain much relevant detail in LTM.

More serious, however, is the disregard of prosody, since there is ample evidence that it plays an absolutely central role in spoken language. According to Nygaard and Pisoni (1995), "it is apparent that prosody provides a crucial connection between segments, features, words and higher-level grammatical processes. In addition, prosody provides useful information regarding lexical identity, syntactic structure, and the semantic content of a talker's utterance" [p. 74]. Indeed, it would not be unreasonable to suggest that it is prosody that actually holds spoken language together. Without rhythm or melody, speech would be incomprehensible or meaningless sound.

Greenberg, S. (1996) Understanding speech understanding: Towards a unified theory of speech perception. In W. Ainsworth and S. Greenberg (Eds.) Proceedings of the International Workshop on The auditory basis of speech perception. Keele, July 1996. pp 1-7.

Nygaard, L. and Pisoni, D. (1995) Speech perception: New directions in research and theory. In J. Miller and P. Eimas (Eds.) Speech, language and communication. Handbook of Perception and Cognition 2nd Edition. Academic: San Diego. pp 63-96.

Fowler, C. (1996) Speaking. In Handbook of Perception and Action, Volume 2. Academic Press: San Diego. pp 503-560.

*******

I think you will agree, Keith, that my own position is actually very close to your own.


This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University