Subject: Re: voiced/unvoiced detection
From: "Beerends, J.G." <J.G.Beerends(at)RESEARCH.KPN.COM>
Date: Wed, 11 Nov 1998 09:23:54 +0100

A comment on the discussion below.

One can calculate the probability density function for the pitch from the
probability density functions of all the individual partials by creating a
stochastic subharmonic representation and applying a renormalization over
the subharmonic representations. The exact mathematical formulation is
given in the last chapter of my PhD thesis. A similar, more practical
algorithmic approach was given in a set of papers (JASA) by Dik Hermes.

A copy of my PhD thesis is available for those who are interested. An
algorithmic description can also be found in the following Philips patent
applications: 8900520 The Netherlands, 9020044007 Europe, 487462 USA,
45984/90 Japan, Philips International B.V., Eindhoven, The Netherlands.

John Beerends
KPN Research

> ----------
> From: Pierre Divenyi[SMTP:pdivenyi(at)MARVA4.NCSC.MED.VA.GOV]
> Sent: Tuesday, 10 November 1998 22:37
> To: AUDITORY(at)LISTS.MCGILL.CA
> Subject: Re: voiced/unvoiced detection
>
> At 10:11 AM 11/5/98 -0500, Keith D. Martin wrote:
>
> >.... I subscribe to the interpretation that it is the
> >alignment of these peaks across multiple channels that generates a pitch
> >sensation rather than the "sharpness" of the peaks, either in individual
> >channels or in the summary. This alignment is, of course, reflected in the
> >summary autocorrelation, but summing across channels is only one of many
> >ways of detecting it (this fact is pointed out in some of the papers from
> >around 1990). And the width of the peak in the summary autocorrelation
> >depends more on the strength of the various partials in a harmonic signal
> >than it does on the "pitchiness" of the sound. So the degree of
> >"pitchiness" might be related to the degree of across-channel structure in
> >the image....
>
> Just for the fun of making a historical argument, I would like to point
> out that a similar idea was expressed in 1977 by Egbert de Boer ("Pitch
> theories unified" in Psychophysics and Physiology of Hearing, E.F. Evans &
> J.P. Wilson, eds., AP, London, pp. 323-334). However, de Boer did not base
> his model on autocorrelation. Rather, he obtained his pitch function
> ("cardinal function") by considering pitch formation to be a stochastic
> process in which various alternative (instantaneous) pitches may coexist.
> The width of the pitch peak, therefore, is synonymous with variability,
> i.e., the function could be regarded as a density. Of course, pitch
> uncertainty, i.e., pitch density, could look very similar to
> autocorrelations, summed or not. I can't help having a personal preference
> for the probability density interpretation because it is broad enough to
> include summary autocorrelation as well as many other models.
>
> The nicety of this model is that the problem of whispered or noisy speech
> finds an instantaneous solution. The vocal tract may be excited by any good
> old excitation waveform: Gaussian-like noise from the bronchi, an artificial
> larynx vibrator, or the vocal folds in various stages of laryngitis,
> producing a continuum of standard deviation magnitudes. Naturally, the
> shape of the vocal tract does not care what the excitation waveform is and,
> provided the excitation is sufficiently intense, the result will always be
> the speech sound corresponding to the shape. That is, if I were able to
> whisper louder than the highway noise, I could be perfectly intelligible
> speaking in a car with the windows down, despite the fact that the
> autocorrelation of the speech I am producing would be absolutely flat.
>
> To continue history, in 1978 de Boer also wrote a more detailed version of
> the above cited paper, called "Analytic pitch theories", which he never
> published.
> Interested colleagues are encouraged to write him and request a
> copy. He will be very surprised...
>
> Pierre
>
> ****************************************************************************
> Pierre Divenyi
> Experimental Audiology Research (151)
> V.A. Medical Center, Martinez, CA 94553, USA
> Phone: (925) 370-6745
> Fax: (925) 228-5738
> E-mail: pdivenyi(at)marva4.ebire.org
> ****************************************************************************
>
> McGill is running a new version of LISTSERV (1.8d on Windows NT).
> Information is available on the WEB at http://www.mcgill.ca/cc/listserv

Email to AUDITORY should now be sent to AUDITORY(at)lists.mcgill.ca
LISTSERV commands should be sent to listserv(at)lists.mcgill.ca
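
The probability-density view of pitch discussed in this thread can be illustrated with a small sketch. This is not the formulation from the thesis, the JASA papers, or the patents mentioned above, none of which are reproduced here. It is only a toy model under stated assumptions: each partial is given a Gaussian frequency density, every candidate pitch is scored by how well its harmonics explain each partial, the scores are multiplied across partials (assuming independence), and the result is renormalized over the candidate grid. The function names, the harmonic weight `decay`, the `floor` term, and the example frequencies are all hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian probability density evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def pitch_density(partials, candidates, max_harmonic=10, decay=0.84, floor=1e-12):
    """Toy pitch density over an evenly spaced grid of candidate pitches (Hz).

    partials   : list of (frequency_hz, sigma_hz), one Gaussian per partial
    candidates : evenly spaced candidate pitches in Hz
    decay      : assumed weight favouring low harmonic numbers (arbitrary choice)
    floor      : small constant so one unexplained partial does not give exact zero
    """
    step = candidates[1] - candidates[0]
    scores = []
    for pitch in candidates:
        score = 1.0
        for freq, sigma in partials:
            # Sum over the hypotheses "this partial is harmonic n of the pitch",
            # i.e. the pitch is the n-th subharmonic of the partial.
            likelihood = floor
            for n in range(1, max_harmonic + 1):
                likelihood += decay ** (n - 1) * gaussian(freq, n * pitch, sigma)
            score *= likelihood  # independence across partials (assumption)
        scores.append(score)
    total = sum(scores) * step  # renormalize so the density integrates to 1
    return [s / total for s in scores]


candidates = [float(f) for f in range(50, 501)]  # 1 Hz grid, 50 to 500 Hz
clear = pitch_density([(400.0, 5.0), (600.0, 5.0), (800.0, 5.0)], candidates)
noisy = pitch_density([(400.0, 30.0), (600.0, 30.0), (800.0, 30.0)], candidates)
mode = candidates[clear.index(max(clear))]
print("most probable pitch (Hz):", mode)
print("peak density, clear vs noisy:", max(clear), max(noisy))
```

In this toy model, widening the densities of the partials spreads out the resulting pitch density and lowers its peak, which is one way to picture the idea above that the width of the pitch peak reflects variability.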