Dear Peter,
You are completely correct in pointing out my faulty memory. You described
dominance regions for pitch, but not unresolved harmonics, in the second of your
1996 papers. I was thinking of a talk Bertrand gave out here in Berkeley late
last year. The conclusion that I claimed he stated was correct, though it was
not in the context of autocorrelation models. The move toward using Shamma's
ideas is also correct and was stated in the Berkeley talk and a talk given near
Paris last year.
The main point, of course, was to request a longer article on the
state-space embedding technique described by Dmitry. It is an interesting idea
that deserves a fuller treatment, especially the question of how it could be
implemented in a neural system.
Thanks also for your detailed analysis of autocorrelation results in your
mailing.
-Craig
---------------------
Craig Atencio
Department of Bioengineering, UCSF/UCB
W.M. Keck Center for Integrative Neuroscience
UCSF, 513 Parnassus Ave. HSE 834, Box 0732
San Francisco, CA, 94143-0732, USA
http://www.keck.ucsf.edu/~craig
office: 415-476-1762 (UCSF)
cell: 510-708-6346
----- Original Message -----
Sent: Monday, January 19, 2004 9:37 PM
Subject: Re: Is correlation any good for pitch perception?
Dear Craig and Eckhard
Craig, which of our 1996 results
are you thinking of "that overestimate pitch salience of unresolved
harmonics"? I can't think of any offhand -- do you have some of our
results confused with those generated by computer models?
P. A. Cariani and B. Delgutte,
"Neural correlates of the pitch of complex tones. I. Pitch and pitch salience.
II. Pitch shift, pitch ambiguity, phase-invariance, pitch circularity, and the
dominance region for pitch," J. Neurophysiology, vol. 76, pp.
1698-1734, 1996.
A difficulty with estimating pitch
saliences has been the relative dearth of data on salience per se (Fastl's
papers notwithstanding). We stated that the peak-to-mean ratio in the
population-wide all-order interval distribution qualitatively corresponded
to pitch salience. The more general concept is that the salience of the
pitch is related to the fraction of intervals related to a particular
periodicity (n/F0, where n = 1, 2, 3, ...) amongst all the others.
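To make the fraction-of-intervals idea concrete, here is a minimal numerical
sketch in Python (my own quick illustration, not our 1996 analysis code; the
tolerance and the interval counts are arbitrary choices):

import numpy as np

def interval_salience(intervals, f0, tol=5e-5, n_max=8):
    # Fraction of all-order interspike intervals lying near integer
    # multiples n/F0 of the candidate period (n = 1..n_max).
    period = 1.0 / f0
    near_periodic = np.zeros(len(intervals), dtype=bool)
    for n in range(1, n_max + 1):
        near_periodic |= np.abs(intervals - n * period) < tol
    return near_periodic.mean()

# toy test: intervals clustered at multiples of 1/160 s, plus a
# uniform background of intervals unrelated to that periodicity
rng = np.random.default_rng(0)
periodic = rng.integers(1, 5, 4000) / 160.0 + rng.normal(0, 2e-5, 4000)
background = rng.uniform(0.0, 0.03, 1000)
print(interval_salience(np.concatenate([periodic, background]), 160.0))  # ~0.8

The peak-to-mean ratio of a binned interval histogram would serve equally
well as a crude index; both track how strongly one periodicity dominates.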
We found that the salience of high-Fc AM tones just outside the pitch
existence region was near 1 and the salience of AM noise was near 1.4. The
ordering of the saliences did correspond well to the ordering in Fastl's
paper and in the rest of the literature that was then available (pitches of
harmonics in the dominance region have higher saliences than those of higher
harmonics; pitches of resolved harmonics 3-5 had higher saliences than those
of unresolved ones 6-12).
Population-interval models (I speak of the results of my own
simulations) predict lower saliences as carrier frequencies
increase (because of weaker phase locking to the carrier, a smaller
fraction of ANFs being driven due to the asymmetry of tuning curves, and
the more dispersed nature of intervals associated with envelopes rather
than individual partials). They also predict lower saliences with increasing
harmonic number: if you have a logarithmic distribution of CFs, the higher the
harmonic number, the fewer the fibers that are excited (proportionally) by
a single harmonic bracketed by others.
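The proportionality is easy to check with a toy count (a hypothetical fiber
array, purely illustrative): with CFs uniform in log frequency, the number of
CFs falling between harmonics n and n+1 shrinks as log2(1 + 1/n), i.e.
roughly as 1/n.

import numpy as np

# hypothetical CF array, uniform on a log axis at 100 fibers per octave
cfs = 2.0 ** np.arange(np.log2(100.0), np.log2(10000.0), 0.01)

f0 = 200.0
for n in (1, 2, 4, 8):
    in_band = (cfs >= n * f0) & (cfs < (n + 1) * f0)
    print(n, int(np.sum(in_band)))   # ~100, 58, 32, 17 fibers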
If you're talking about low F0's
(the salience of F0=80 Hz harmonic complexes in our paper), subjectively,
the 80 Hz pitch of those stimuli is strong (these had components at the
fundamental, in contrast to Carlyon's stimuli). Certainly if you go over
to the piano and play a melody in the register near 80 Hz, the pitches aren't
qualitatively weaker than an octave or two above it. This is a subjective
observation.
It is true, though, that the early models would not have
accounted well for saliences of very low pitches (< 50-60 Hz), because
the early models did not discount longer intervals. One of the subsequent
evolutions of pitch models over the last 10 years has been the
realization that the lower limit of pitch has implications for interval-based
models, e.g.
D. Pressnitzer, R. D. Patterson, and K. Krumbholz, "The
lower limit of melodic pitch," J. Acoust. Soc. Am., vol. 109, pp. 2074-2084,
2001.
This was driven, I think, in part by Carlyon & Shackleton's
(1994) work on low-F0 periodicities in high-Fc channels (high harmonic
numbers), which has led some of us to hypothesize that high-CF channels may
have shorter interval analysis windows than lower-CF ones. It may well be
the case that there are some differences in the processing of intervals
according to CF, but this does not change the core hypothesis that there is a
global temporal representation of periodicities below around 4 kHz. (I believe
that this strong, precise, level-invariant representation coexists with a much
weaker and coarser rate-place representation that covers the range of
cochlear resonances, 50-20,000 Hz; i.e. a duplex model much like
Licklider's.)
You have to realize that the earlier studies were simply
trying to predict pitch on the basis of interval patterns. We didn't deal with
questions around the fringes of the pitch existence region. To deal with these
questions, one needs to grapple with the length of the interval analysis
windows (what is the longest interval that is analyzed?). An autocorrelation
that encompasses indefinitely long lags has indefinitely precise frequency
resolution (like a vernier principle) -- we know the frequency resolution of
the auditory system is quite fine, but nevertheless limited. Goldstein and
Srulowicz assumed first-order intervals, which in effect produces an
exponential window, but there are many problems with first-order intervals
(they are rate-dependent -- pitch representations would shift as SPLs and
firing rates increased; the other major problem comes from interference when
one has two harmonic complexes (say n = 1-6) of different F0's -- if they are
20% apart in frequency, their pitches do not obliterate each other in the
manner that would be expected if the representations were based on
first-order intervals). "Higher-order peaks" are necessary to account for
hearing multiple concurrent F0's. This interference is a big problem for
first-order intervals and for central representations of pitch that rely on
bandpass MTFs.
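The two-F0 point can be illustrated with a toy simulation (my own sketch, not
data): interleave two jittered periodic spike trains whose periods are 20%
apart, and compare first-order with all-order interval histograms.

import numpy as np

def jittered_train(period, t_end, jitter, rng):
    times = np.arange(period, t_end, period)
    return times + rng.normal(0.0, jitter, len(times))

def all_order_intervals(spikes, max_lag):
    # every positive spike-time difference up to max_lag
    d = spikes[None, :] - spikes[:, None]
    return d[(d > 0) & (d <= max_lag)]

rng = np.random.default_rng(1)
t1, t2 = 1 / 100.0, 1 / 120.0               # two F0's 20% apart
train = np.sort(np.concatenate([jittered_train(t1, 2.0, 1e-4, rng),
                                jittered_train(t2, 2.0, 1e-4, rng)]))

fo_hist, edges = np.histogram(np.diff(train), bins=300, range=(0, 0.03))
ao_hist, _ = np.histogram(all_order_intervals(train, 0.03),
                          bins=300, range=(0, 0.03))
# fo_hist is dominated by the short gaps between the interleaved trains;
# ao_hist keeps clear peaks near 10.0 ms and 8.33 ms (and their multiples)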
Krumbholz, Patterson, Nobbe, & Fastl have recently
probed the form of the (putative) interval analysis windows:
K. Krumbholz, R. D. Patterson, A. Nobbe, and H. Fastl, "Microsecond temporal
resolution in monaural hearing without spectral cues?," J. Acoust. Soc. Am.,
vol. 113, pp. 2790-2800, 2003.
These refinements of interval models appear to
be capable of handling the decline of salience of resolved and unresolved
harmonics at both ends of the spectrum. For low frequencies, salience is
limited by the length/shape of the interval analysis window; for high
frequencies, it is limited by the factors outlined above.
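One way to picture the window's role, under my reading of the argument (the
exponential shape and time constants below are assumptions, not a fitted
model): weight each interval by a window w(tau) before counting, so that long
periods lose support as the window shortens.

import numpy as np

def windowed_salience(intervals, f0, tau_w, tol=5e-5, n_max=8):
    # period-related fraction of intervals, with each interval
    # down-weighted by an exponential analysis window exp(-tau/tau_w)
    w = np.exp(-intervals / tau_w)
    period = 1.0 / f0
    hits = np.zeros(len(intervals), dtype=bool)
    for n in range(1, n_max + 1):
        hits |= np.abs(intervals - n * period) < tol
    return w[hits].sum() / w.sum()

rng = np.random.default_rng(2)
for f0 in (40.0, 160.0):
    ivs = np.concatenate([rng.integers(1, 5, 4000) / f0
                          + rng.normal(0, 2e-5, 4000),
                          rng.uniform(0.0, 0.1, 2000)])
    print(f0, [round(windowed_salience(ivs, f0, tw), 2)
               for tw in (0.1, 0.02, 0.005)])
# as tau_w shrinks, the salience of the 40 Hz complex collapses long
# before that of the 160 Hz one, mimicking a lower limit of pitch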
AUTOCORRELATION: REPRESENTATIONS & COMPUTATIONS
=======================================================================
"Are you really sure that our auditory system uses autocorrelation at all?"
(Terez)
"There are indeed at best scant indications for autocorrelation merely inside
brain." (Eckhard)
We have to be clear about the distinction between neural representations and
analyses ("computations").
It is abundantly clear that the all-order interspike
interval distributions have forms that resemble stimulus autocorrelation
functions in essential ways, up to the frequency limits of phase-locking. A
temporal, interval-based representation of periodicity and spectrum exists at
early stages of auditory processing, at least up to the level of the
midbrain. There are tens if not hundreds of papers in the auditory
literature that support this.
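To see this resemblance in miniature, here is a crude caricature in Python
(my own sketch, not a model fitted to ANF data): spikes drawn with a
probability that follows the half-wave rectified stimulus yield an all-order
interval histogram whose peaks line up with the maxima of the stimulus
autocorrelation.

import numpy as np

rng = np.random.default_rng(3)
fs = 16000
t = np.arange(int(0.25 * fs)) / fs
x = np.sin(2 * np.pi * 300.0 * t)           # 300 Hz pure tone

# caricature of phase-locking: per-sample spike probability follows
# the half-wave rectified stimulus
spikes = t[rng.random(len(t)) < 0.1 * np.clip(x, 0.0, None)]

d = spikes[None, :] - spikes[:, None]       # all-order intervals < 15 ms
iv = d[(d > 0) & (d < 0.015)]
iv_hist, _ = np.histogram(iv, bins=240, range=(0, 0.015))

acf = np.correlate(x, x, mode='full')[len(x) - 1:][:240]
# iv_hist and acf both peak at multiples of 1/300 s (~3.33 ms)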
The mechanism by which this information
is utilized/analyzed by the auditory system is unknown, and I agree that
the evidence for neural autocorrelators per se (a la Licklider) is quite
scant. However, the evidence for central rate-place representations of pitch
(F0-pitch and low-frequency pure-tone pitches as well) is also very weak --
I see no convincing physiological evidence for harmonic templates (BFs
<< 5 kHz) or for robust physiological resolution of harmonics higher
than the second. We want to see a representation of lower-frequency sounds
(< 4kHz) that is precise, largely level-invariant, capable of supporting
multiple objects (two musical instruments playing different notes), and
that accounts for the pitches of pure and complex tones.
A central
rate-place coding of pitch is not out of the question -- I can imagine ways
that we might have missed some kind of covert, distributed representation
-- but I can also imagine similar (but more elegant) possibilities for
central temporal representations, and there is no compelling reason to
think that the central codes must be rate-based. Cortical neurons don't
behave anything like rate integrators -- it's evident from the nature of
single unit responses that some other functional information-processing
principle must be operant; we desperately need fresh ideas -- alternative
notions of information processing.
It's premature to rule anything out
yet. We understand the cortical correlates of very few auditory percepts
with enough confidence to say that it is this or that
kind of code. Even in vision, where there are orders of magnitude more
workers on both cortical and theoretical fronts, as far as I can see this
is also the case -- they do not have a coherent theory of how the cortex
works -- how the information is represented and analyzed -- they cannot
explain how we (and most animals) distinguish triangles from circles or why
images disappear when they are stabilized on the retina.
I think the
central problem in auditory neurophysiology is to determine what becomes of
this superabundant, high quality, invariant, interval information. Laurel
Carney and Shihab Shamma have proposed models for utilizing phase
information, but it remains to be seen whether there is compelling
physiological evidence for these models. It's also not yet clear to me how
these models behave in terms of pitch. I have been working on time-domain
strategies that bypass the need to compute autocorrelations explicitly
(pitch is notoriously relative, which is not consistent with explicit
absolute estimation mechanisms), but I do not yet see any strong positive
physiological evidence for these either.
Amidst all this equivocation
about mechanisms, let us not forget what we know about early representations,
i.e. that the interval-based "autocorrelation-like" representation does exist
at the levels of the auditory nerve, cochlear nuclei, and midbrain. Even at
the midbrain, I think it is likely that the interval-based representation
is of higher quality and greater reliability than those based on bandpass
MTFs or on rate-place profiles.
I think it's likely that, whatever the
mechanism turns out to be, it will involve this interval information and it
will also be tied in closely with scene analysis mechanisms (harmonicity
grouping).
If one is fairly certain that the information is
temporally-coded to begin with, then one looks first for temporal processing
mechanisms.
--Peter Cariani

Peter Cariani, PhD
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St., Boston, MA 02114 USA
On Friday, January 16, 2004, at 05:34 PM, Craig Atencio wrote:
Dear Dmitry,
My understanding recently was that
autocorrelation may not be the best measure of periodicity pitch because
it performs too well. Papers in 1996 by Cariani and Delgutte showed that
pitch salience was overestimated for unresolved harmonics. For most
everything else the model worked quite well. Note that these studies were
a neurophysiological test of Licklider's original autocorrelation
idea, where he gave a basic schematic of how autocorrelation might
be implemented in a neural system. I recently heard a talk where
Delgutte said that he was moving more in the direction of Shamma's
earlier work based on spatio-temporal representations of auditory
signals.
I did read your earlier ICASSP paper and took a look at the
Matlab files on the website. Unfortunately, the code is not available
for viewing and the ICASSP paper lacks some detail. I, for one, was
really intrigued by your idea, so I kindly suggest that you write a
longer paper and submit it to JASA. The editors there are always
interested in pitch, as are the reviewers and readers. (I believe that
last year Pierre Divenyi proposed this as well.)
You also
mentioned that a basic analog system could implement your idea. Including
that in the JASA paper would be great too. It would be helpful to see
that so we could determine which neural center, if any, might be able to
implement your idea.
Best wishes,
Craig
---------------------
Craig Atencio
Department of Bioengineering, UCSF/UCB
W.M. Keck Center for Integrative Neuroscience
UCSF, 513 Parnassus Ave. HSE 834, Box 0732
San Francisco, CA, 94143-0732, USA
http://www.keck.ucsf.edu/~craig
office: 415-476-1762 (UCSF)
cell: 510-708-6346
----- Original Message -----
From: "Dmitry Terez" <terez@SOUNDMATHTECH.COM>
To: <AUDITORY@LISTS.MCGILL.CA>
Sent: Friday, January 16, 2004 1:21 PM
Subject: Is correlation any good for pitch perception?
Dear Auditory List Members,
I would like to convey some thoughts on the much-discussed subject of
how the human auditory system can use autocorrelation analysis for
pitch perception.
Are you really sure that our auditory system
uses autocorrelation at all? Has anybody seen it really happening in
the brain? As far as I understand (forgive me if I am wrong), beyond
the cochlea not much is really known about the real mechanism behind the
exceptionally robust human pitch perception. I am not an auditory
scientist, but it just looks to me that the correct answer on your part
is "We do not know".
I do think that the correlation function has two fatal drawbacks, as far as
pitch detection is concerned (I am talking about a classical auto- or
cross-correlation function of a signal, that is, a multiply-and-add type of
operation, as defined in any textbook on signal processing).
The first fatal drawback of correlation is the abundance of secondary
peaks due to the complex harmonic structure of a signal. For some
real signals we are dealing with every day, such as speech, the
secondary peaks in the correlation function due to speech formants (vocal
tract resonances) are sometimes about the same height as the main peaks
due to signal periodicity (pitch).
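This is easy to reproduce in a few lines (a sketch; the source and filter
parameters are arbitrary choices, not taken from any paper under discussion):
excite a single formant-like resonator with a 100 Hz pulse train and inspect
the autocorrelation.

import numpy as np
from scipy.signal import lfilter

fs = 16000
f0, formant, bw = 100.0, 500.0, 80.0        # pitch, formant, bandwidth (Hz)

# impulse-train "glottal" source through one two-pole resonator
x = np.zeros(int(0.25 * fs))
x[::int(fs / f0)] = 1.0
r = np.exp(-np.pi * bw / fs)
y = lfilter([1.0], [1.0, -2.0 * r * np.cos(2 * np.pi * formant / fs), r * r], x)

acf = np.correlate(y, y, mode='full')[len(y) - 1:]
acf /= acf[0]
# besides the pitch peak at lag fs/f0 (160 samples), acf carries a damped
# oscillation at the formant period (fs/formant = 32 samples) whose first
# peaks can approach the pitch peak in height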
The second fatal drawback of correlation is its pitch strength (salience)
property for simple and complex tones. In other words, the main peaks in the
correlation function computed for, e.g., a simple sine wave are too wide.
Meanwhile, I would expect a simple tone to cause the same or even stronger
pitch sensation than a complex tone with the same fundamental frequency.
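The width claim is also easy to check (again a sketch; the widths depend on
duration and harmonic count): compare the half-height width of the
pitch-period peak in the normalized autocorrelation of a sine and of a
six-harmonic complex at the same F0.

import numpy as np

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
f0 = 200.0
lag = int(fs / f0)                           # pitch-period lag, 80 samples

def norm_acf(x):
    c = np.correlate(x, x, mode='full')[len(x) - 1:]
    return c / c[0]

sine = np.sin(2 * np.pi * f0 * t)
complex_tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 7))

for name, x in (('sine', sine), ('complex', complex_tone)):
    c = norm_acf(x)
    width = np.sum(c[lag - lag // 2: lag + lag // 2] > 0.5)
    print(name, int(width))   # the sine peak is several times wider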
I think that it would
be strange if evolution resulted in such a suboptimal mechanism of
perceiving sound periodicity.
As some of you may know, we recently introduced a new, revolutionary concept
of pitch detection. It has nothing to do with correlation (although one can
see some similarity) or with the spectrum of a signal. It is based on
"unfolding" a scalar signal in several dimensions -- a concept of signal
"embedding", as it is called in nonlinear and chaotic signal processing. The
ICASSP paper and the Matlab demo are available from
http://www.soundmathtech.com/pitch
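I cannot post the actual code here, but a generic delay-embedding sketch in
Python (illustrative parameters only; this is not our patented algorithm)
conveys the flavor: unfold the signal into delay coordinates, then read the
period off the lag at which the trajectory returns closest to its starting
point.

import numpy as np

def embed(x, dim, delay):
    # rows are delay vectors [x[i], x[i+delay], ..., x[i+(dim-1)*delay]]
    n = len(x) - (dim - 1) * delay
    return np.stack([x[k * delay: k * delay + n] for k in range(dim)], axis=1)

def embedding_pitch(x, fs, dim=3, delay=4, fmin=60.0, fmax=500.0):
    # recurrence-style period estimate: lag of the nearest return of the
    # embedded trajectory to its initial point (averaging over many
    # reference points would make this more robust to noise)
    pts = embed(x, dim, delay)
    dist = np.linalg.norm(pts - pts[0], axis=1)
    lo, hi = int(fs / fmax), int(fs / fmin)
    return fs / (lo + np.argmin(dist[lo:hi]))

fs = 16000
t = np.arange(4000) / fs
x = sum(np.sin(2 * np.pi * k * 150.0 * t) for k in (1, 2, 3))
print(embedding_pitch(x, fs))   # close to 150 Hz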
You can also read our US patent application, publication No. 20030088401, at
http://www.uspto.gov/patft
Although a purely digital implementation
is described, I can build a simple analog electro-mechanical device
(basically a mechanical part followed by a two-dimensional grid of
“neurons” for projecting an output) that is based on the same principle
and is exceptionally robust at detecting pitch.
My question is:
Can our auditory system use this type of processing for pitch
perception?
Is it possible to find some mechanism that can perform
this kind of processing, perhaps between the cochlea and the
brain?
I do not expect a quick answer. Please, take your time, maybe the next 10
years …
Also, I would like to add that although words like "chaos theory", "phase
space" or "signal embedding" might not seem relevant to your research on
pitch perception, they now are, in fact. This is an entirely new game…
Best Regards,

Dmitry E. Terez, Ph.D.
SoundMath Technologies, LLC
P.O. Box 846
Cherry Hill, New Jersey, 08003 USA
e-mail: dterez AT soundmathtech.com