Re: Robust method of fundamental frequency estimation. (Arturo Camacho)


Subject: Re: Robust method of fundamental frequency estimation.
From:    Arturo Camacho  <acamacho@xxxxxxxx>
Date:    Tue, 27 Feb 2007 09:27:05 -0500
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Eckard,

I do not see why you say I am wrong. I think the argument you use to
support that claim is the one in which you apply the cosine transform
three times. However, that scenario does not correspond to the one I
described. In the scenario I described, autocorrelation is applied to
TIME-domain signals (i.e., the outputs of the filterbank), not to
spectra.

Let me describe my reasoning again in more detail. To facilitate the
explanation, let's assume we have infinite-length signals and
infinitely narrow filters. Applying the filterbank to the signal leaves
us with a decomposition of the signal into its sinusoidal components.
Since there is only one sinusoid per channel, the spectrum of each
channel consists of a single pulse (possibly of zero magnitude) at the
central frequency of the channel. Computing the autocorrelation of each
channel corresponds to squaring the magnitude of that channel's
spectrum (a single pulse) and synthesizing a cosine at that frequency
(by the Wiener–Khinchin theorem). The summary autocorrelation just adds
those cosines over channels. Since the linearity of the cosine
transform allows us to change the order of synthesis and addition, we
can first add the squared-magnitude spectra, which leaves us with the
squared magnitude of the original spectrum, and then perform the cosine
transform; but that is just the autocorrelation of the original signal
(again by the Wiener–Khinchin theorem).

This argument extends easily to wider non-overlapping rectangular
filters. In the case of non-rectangular gammatone ERB filters things
may change a little, but I do not see how that change could help
improve the estimation of pitch.
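For concreteness, here is a minimal numerical sketch of this
equivalence, with the infinitely narrow filters widened to
non-overlapping rectangular bands (the band edges, the test signal, and
all names below are illustrative assumptions, not taken from Slaney &
Lyon's model):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1024
    x = rng.standard_normal(n)               # any real test signal

    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), 9, dtype=int)  # 8 contiguous bands

    # Per-channel ACF via Wiener-Khinchin: inverse transform of the
    # squared magnitude of that channel's spectrum.
    sacf = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        Xk = np.zeros_like(X)
        Xk[lo:hi] = X[lo:hi]                 # ideal rectangular filter
        sacf += np.fft.irfft(np.abs(Xk) ** 2, n)

    # ACF of the original signal, by the same theorem.
    acf = np.fft.irfft(np.abs(X) ** 2, n)

    # The bands are disjoint and cover the whole spectrum, so the
    # squared magnitudes add up to |X|^2; linearity of the inverse
    # transform does the rest.
    print(np.allclose(sacf, acf))            # True

With overlapping gammatone magnitude responses H_k(omega), and ignoring
the models' nonlinear stages, the summed squared magnitudes become
|X(omega)|^2 * sum_k |H_k(omega)|^2 instead, i.e., the summary ACF
turns into the ACF of a spectrally reweighted signal; that is the small
change mentioned above.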
Arturo

> Arturo Camacho <acamacho@xxxxxxxx> wrote:
>
>>> autocorrelation-based pitch models that can NOT be expressed in
>>> terms of the spectrum. For example, the Meddis & Hewitt or Meddis &
>>> O'Mard models, or Slaney & Lyon models, derived from Licklider's
>>> duplex theory, which do the ACF after what the cochlea model does,
>>> which is a separation into filter channels and a
>>
>> If I am not wrong, what Slaney & Lyon's model does is to apply a
>> summary autocorrelation to the output of a gammatone filterbank (it
>> does some extra steps, but the main idea is that one). Since this can
>> be shown to be equivalent to applying autocorrelation to the original
>> signal (use the Wiener–Khinchin theorem and the linearity property of
>> the Fourier transform),
>
> Roberto,
>
> You are wrong in your guess that applying a summary autocorrelation to
> the output of a filterbank is equivalent to applying autocorrelation
> to the original signal. According to the theorem you mentioned but
> perhaps did not understand, autocorrelation corresponds to performing
> the cosine transform twice, i.e. back and forth: a first cosine
> transform of a signal f_0(t) from the time domain yields F_0(omega) in
> the frequency domain. A subsequent second cosine transform of
> F_0(omega) yields f_1(tau) in the time domain again. These two steps
> together correspond to the autocorrelation function ACF of the
> o r i g i n a l signal: f_0(t) --> f_1(tau). Remember: the ACF
> corresponds to the cosine transform applied twice, a first one and an
> inverting second one.
>
> Bogert and Tukey called that inverted spec_trum a ceps_trum, inverting
> the order of letters in the syllable spec into ceps.
>
> This f_1(tau) is perhaps what comes close to a major part of auditory
> function, even if it is hard to abandon the learned notion that we
> hear frequencies and to admit that autocorrelation lag is largely
> equivalent to frequency.
>
> The ACF of the spectrum F_0(omega) would correspond not to just two
> but to three cosine transforms in series, and would eventually result
> in a function F_1 of omega:
> f_0(t) --> F_0(omega) --> f_1(tau) --> F_1(omega).
>
> The brain cannot directly process functions of omega. In the cat,
> there are about 33,000 T-multipolar chopper neurons in the ventral
> cochlear nucleus (VCN). T means that they project immediately to the
> IC via the trapezoid body (TB). They might translate the place code
> into downsampled frequencies while preserving tonotopy at the same
> time. At least they show very regular responses, with a highly
> reproducible pattern of spike trains in which the interspike intervals
> are all of about the same length. Chopping frequencies of chopper
> neurons are on average about three times lower than the average firing
> frequencies within single auditory nerve fibers, which themselves tend
> to be considerably lower than the corresponding characteristic
> frequency (CF) for CFs in excess of 500 Hz.
>
> Regards,
> Eckard Blumschein

--
__________________________________________________
Arturo Camacho
PhD candidate
Computer and Information Science and Engineering
University of Florida

E-mail: acamacho@xxxxxxxx
Web page: www.cise.ufl.edu/~acamacho
__________________________________________________
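The quoted transform chain, sketched numerically under the same assumed
numpy conventions (the test signal and sample rate are illustrative).
One step the quoted prose leaves implicit: the second transform yields
the ACF when applied to the squared magnitude |F_0(omega)|^2, which is
the Wiener–Khinchin theorem again; Bogert and Tukey's cepstrum applies
it to log |F_0(omega)|^2 instead:

    import numpy as np

    fs = 8000                             # assumed sample rate (Hz)
    n = 2000
    t = np.arange(n) / fs
    # Periodic test signal: 100 Hz fundamental plus two harmonics.
    f0_t = sum(np.cos(2 * np.pi * k * 100 * t) for k in (1, 2, 3))

    F0 = np.fft.rfft(f0_t)                 # f_0(t)   --> F_0(omega)
    f1 = np.fft.irfft(np.abs(F0) ** 2, n)  # |F_0|^2  --> f_1(tau): the ACF
    F1 = np.fft.rfft(f1)                   # f_1(tau) --> F_1(omega)

    # Cepstrum variant: log magnitude instead of squared magnitude.
    ceps = np.fft.irfft(np.log(np.abs(F0) ** 2 + 1e-12), n)

    # The ACF peaks at the fundamental period:
    # 80 samples = 10 ms = 1 / (100 Hz).
    print(np.argmax(f1[1:100]) + 1)        # 80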

