Re: voiced/unvoiced detection (Pierre Divenyi)

Subject: Re: voiced/unvoiced detection
From: Pierre Divenyi <pdivenyi(at)MARVA4.NCSC.MED.VA.GOV>
Date: Tue, 10 Nov 1998 13:37:25 -0800

At 10:11 AM 11/5/98 -0500, Keith D. Martin wrote:
>.... I subscribe to the interpretation that it is the
>alignment of these peaks across multiple channels that generates a pitch
>sensation rather than the "sharpness" of the peaks, either in individual
>channels or in the summary. This alignment is, of course, reflected in the
>summary autocorrelation, but summing across channels is only one of many
>ways of detecting it (this fact is pointed out in some of the papers from
>around 1990). And the width of the peak in the summary autocorrelation
>depends more on the strength of the various partials in a harmonic signal
>than it does on the "pitchiness" of the sound. So the degree of
>"pitchiness" might be related to the degree of across-channel structure in
>the image....

Just for the fun of making a historical argument, I would like to point out that a similar idea was expressed in 1977 by Egbert de Boer ("Pitch theories unified" in Psychophysics and Physiology of Hearing, E.F. Evans & J.P. Wilson, eds., AP, London, pp. 323-334). However, de Boer did not base his model on autocorrelation. Rather, he obtained his pitch function ("cardinal function") by considering pitch formation to be a stochastic process in which various alternative (instantaneous) pitches may coexist. The width of the pitch peak, therefore, is synonymous with variability, i.e., the function could be regarded as a density. Of course, pitch uncertainty, i.e., pitch density, could look very similar to autocorrelations, summed or not. I can't help having a personal preference for the probability density interpretation because it is broad enough to include summary autocorrelation as well as many other models.

The nicety of this model is that the problem of whispered or noisy speech finds an instantaneous solution.
The vocal tract may be excited by any good old excitation waveform: Gaussian-like noise from the bronchi, an artificial larynx vibrator, or the vocal folds in various stages of laryngitis, producing a continuum of standard deviation magnitudes. Naturally, the shape of the vocal tract does not care what the excitation waveform is and, provided the excitation is sufficiently intense, the result will always be the speech sound corresponding to the shape. That is, if I were able to whisper louder than the highway noise, I could be perfectly intelligible speaking in a car with the windows down, despite the fact that the autocorrelation of the speech I am producing would be absolutely flat.

To continue history, in 1978 de Boer also wrote a more detailed version of the above-cited paper, called "Analytic pitch theories", which he never published. Interested colleagues are encouraged to write him and request a copy. He will be very surprised...

Pierre

****************************************************************************
Pierre Divenyi
Experimental Audiology Research (151)
V.A. Medical Center, Martinez, CA 94553, USA
Phone: (925) 370-6745
Fax: (925) 228-5738
E-mail: pdivenyi(at)marva4.ebire.org
****************************************************************************
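
As a rough illustration of the remark about a flat autocorrelation, the sketch below compares the normalized autocorrelation of a periodic excitation with that of a noise excitation. Everything in it is an assumption made for illustration and none of it comes from the posting: the 16 kHz sampling rate, the 125 Hz fundamental, the ten equal-amplitude harmonics, the use of white Gaussian noise as a stand-in for a whisper source, and the function names. It also leaves out the vocal tract filter, which would colour the noise and add some structure at short lags; "flat" here means only that no peak appears at the pitch period.

```python
import numpy as np

def normalized_autocorrelation(x, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag, scaled so lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return np.zeros(max_lag + 1)
    acf = [np.dot(x[:len(x) - lag], x[lag:]) for lag in range(max_lag + 1)]
    return np.array(acf) / energy

def harmonic_complex(f0, n_harmonics, fs, duration):
    """Sum of equal-amplitude sine harmonics of f0 (illustrative periodic excitation)."""
    t = np.arange(int(fs * duration)) / fs
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harmonics + 1))

fs = 16000                        # assumed sampling rate in Hz
f0 = 125.0                        # assumed fundamental; period is 128 samples
period = int(round(fs / f0))

voiced = harmonic_complex(f0, n_harmonics=10, fs=fs, duration=0.2)
rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
whisper = rng.standard_normal(len(voiced))  # white-noise stand-in for a whisper source

max_lag = 2 * period
ac_voiced = normalized_autocorrelation(voiced, max_lag)
ac_noise = normalized_autocorrelation(whisper, max_lag)

# Skip the main lobe around lag 0 before searching for a pitch-related peak.
min_lag = 20
peak_lag_voiced = min_lag + int(np.argmax(ac_voiced[min_lag:]))
peak_noise = float(np.max(np.abs(ac_noise[min_lag:])))

print("periodic excitation: largest peak at lag", peak_lag_voiced, "samples")
print("noise excitation: largest |autocorrelation| beyond lag", min_lag, "is", round(peak_noise, 3))
```

With the periodic source the largest peak away from lag 0 falls at the period; with the noise source the values beyond the first few lags stay close to zero, so an autocorrelation-based detector has no peak to report even though the same vocal tract shape could be driven by either source.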

This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University