Re: voiced/unvoiced detection ("Beerends, J.G.")

Subject: Re: voiced/unvoiced detection
From: "Beerends, J.G." <J.G.Beerends(at)RESEARCH.KPN.COM>
Date: Wed, 11 Nov 1998 09:23:54 +0100

A comment on the discussion below.

One can calculate the probability density function for the pitch from
the probability density functions of all the individual partials by
creating a stochastic subharmonic representation and applying a
renormalization over the subharmonic representations. The exact
mathematical formulation is given in the last chapter of my PhD thesis.
A similar, more practical, algorithmic approach was given in a set of
papers (JASA) by Dik Hermes. A copy of my PhD thesis is available for
those who are interested.

An algorithmic description can also be found in the following Philips
patent applications: 8900520 The Netherlands, 9020044007 Europe,
487462 USA, 45984/90 Japan, Philips International B.V., Eindhoven,
The Netherlands.

John Beerends
KPN Research

> ----------
> From: Pierre Divenyi[SMTP:pdivenyi(at)MARVA4.NCSC.MED.VA.GOV]
> Sent: Tuesday, 10 November 1998 22:37
> To: AUDITORY(at)LISTS.MCGILL.CA
> Subject: Re: voiced/unvoiced detection
>
> At 10:11 AM 11/5/98 -0500, Keith D. Martin wrote:
>
> >.... I subscribe to the interpretation that it is the
> >alignment of these peaks across multiple channels that generates a pitch
> >sensation rather than the "sharpness" of the peaks, either in individual
> >channels or in the summary. This alignment is, of course, reflected in the
> >summary autocorrelation, but summing across channels is only one of many
> >ways of detecting it (this fact is pointed out in some of the papers from
> >around 1990). And the width of the peak in the summary autocorrelation
> >depends more on the strength of the various partials in a harmonic signal
> >than it does on the "pitchiness" of the sound. So the degree of
> >"pitchiness" might be related to the degree of across-channel structure in
> >the image....
>
> Just for the fun of making a historical argument, I would like to point out
> that a similar idea was expressed in 1977 by Egbert de Boer ("Pitch
> theories unified" in Psychophysics and Physiology of Hearing, E.F. Evans &
> J.P. Wilson, eds., AP, London, pp. 323-334). However, de Boer did not base
> his model on autocorrelation. Rather, he obtained his pitch function
> ("cardinal function") by considering pitch formation to be a stochastic
> process in which various alternative (instantaneous) pitches may coexist.
> The width of the pitch peak, therefore, is synonymous with variability,
> i.e., the function could be regarded as a density. Of course, pitch
> uncertainty, i.e., pitch density, could look very similar to
> autocorrelations, summed or not. I can't help having a personal preference
> for the probability density interpretation because it is broad enough to
> include summary autocorrelation as well as many other models.
>
> The nicety of this model is that the problem of whispered or noisy speech
> finds an instantaneous solution. The vocal tract may be excited by any good
> old excitation waveform: Gaussian-like noise from the bronchi, an artificial
> larynx vibrator, or the vocal folds in various stages of laryngitis,
> producing a continuum of standard deviation magnitudes. Naturally, the
> shape of the vocal tract does not care what the excitation waveform is and,
> provided the excitation is sufficiently intense, the result will always be
> the speech sound corresponding to the shape. That is, if I were able to
> whisper louder than the highway noise, I could be perfectly intelligible
> speaking in a car with the windows down, despite the fact that the
> autocorrelation of the speech I am producing would be absolutely flat.
>
> To continue history, in 1978 de Boer also wrote a more detailed version of
> the above cited paper, called "Analytic pitch theories", which he never
> published.
> Interested colleagues are encouraged to write him and request a
> copy. He will be very surprised...
>
> Pierre
>
> **************************************************************************
> Pierre Divenyi
> Experimental Audiology Research (151)
> V.A. Medical Center, Martinez, CA 94553, USA
> Phone: (925) 370-6745
> Fax: (925) 228-5738
> E-mail: pdivenyi(at)marva4.ebire.org
> **************************************************************************
>
> McGill is running a new version of LISTSERV (1.8d on Windows NT).
> Information is available on the WEB at http://www.mcgill.ca/cc/listserv

Email to AUDITORY should now be sent to AUDITORY(at)lists.mcgill.ca
LISTSERV commands should be sent to listserv(at)lists.mcgill.ca
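
As a rough illustration of the subharmonic idea in John Beerends's reply
(this is not the formulation from his thesis, the Hermes papers, or the
Philips patents), the sketch below makes several assumptions of its own:
each partial's frequency uncertainty is Gaussian on a log-frequency axis,
each partial votes for its subharmonics f/n with a geometrically decaying
weight, the votes are summed, and the result is renormalized to unit area.
The function name pitch_density and all parameter values (rel_sigma,
n_sub, decay, the frequency range) are hypothetical.

```python
import math


def pitch_density(partials_hz, rel_sigma=0.02, n_sub=8, decay=0.84,
                  f_min=50.0, f_max=1000.0, n_bins=2000):
    """Return (pitch grid in Hz, density with respect to log2 frequency).

    Assumption: every partial f contributes Gaussian bumps centred on its
    subharmonics f/n (n = 1..n_sub), all with the same width in octaves.
    """
    log_lo, log_hi = math.log2(f_min), math.log2(f_max)
    step = (log_hi - log_lo) / (n_bins - 1)
    grid = [log_lo + i * step for i in range(n_bins)]

    # A relative frequency spread df/f corresponds to df/(f ln 2) octaves.
    sigma_oct = rel_sigma / math.log(2)

    density = [0.0] * n_bins
    for f in partials_hz:
        for n in range(1, n_sub + 1):
            centre = math.log2(f / n)      # n-th subharmonic of this partial
            weight = decay ** (n - 1)      # assumed down-weighting of high n
            for i, x in enumerate(grid):
                z = (x - centre) / sigma_oct
                density[i] += weight * math.exp(-0.5 * z * z)

    # Renormalize so the density integrates to one over the log2 axis.
    total = sum(density) * step
    density = [d / total for d in density]
    return [2.0 ** x for x in grid], density


# Partials at 400, 600 and 800 Hz share the subharmonic 200 Hz.
grid, dens = pitch_density([400.0, 600.0, 800.0])
best = grid[max(range(len(dens)), key=dens.__getitem__)]
print(round(best, 1))
```

With these three partials the mode of the density falls at the missing
fundamental near 200 Hz, because that is where the subharmonics of all
three partials coincide with the largest combined weight. Widening
rel_sigma flattens the density, which is one way to picture the
"continuum of standard deviation magnitudes" in the quoted message.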

This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University