Re: Is correlation any good for pitch perception?
Peter, Bertrand, Craig, Dmitry, Shihab, and all,
I've been out of auditory work for quite a few years now, but
still lurk on this list.
This discussion of pitch and auto-correlation sounds so much like
what we heard one and two decades ago that it's nostalgic, in a funny
kind of way.
Peter, thanks for carrying the torch.
Dick
ps. the first time I talked about auditory correlograms at a Navy
workshop on AI and bionics, J.C.R. Licklider fell asleep against the
back wall of the room. When I mentioned his name and his theory,
his wife elbowed him in the ribs to wake him up. I think his
interest was more on the AI side in those days (c. 1984).
At 10:27 PM -0800 01/19/2004, Craig Atencio wrote:
Dear Peter,
You are completely correct in pointing
out my faulty memory. You described dominance regions for pitch but not
unresolved harmonics in the second of your 1996 papers. I was thinking
of a talk Bertrand gave out here in Berkeley late last year. The
conclusion that I claimed he stated was correct, though it was not in
the context of autocorrelation models. The move toward using Shamma's
ideas is also correct and was stated in the Berkeley talk and a talk
given near Paris last year.
The main point, of course, was to request
a longer article on the state-space embedding technique described by
Dmitry. It is an interesting idea that needs more detail made
available, especially on how it could be implemented in a
neural system.
Thanks also for your detailed analysis of
autocorrelation results in your mailing.
-Craig
---------------------
Craig Atencio
Department of Bioengineering UCSF/UCB
W.M. Keck Center for Integrative Neuroscience UCSF
513 Parnassus Ave.
HSE 834, Box 0732
San Francisco, CA, 94143-0732, USA
http://www.keck.ucsf.edu/~craig
office: 415-476-1762 (UCSF)
cell: 510-708-6346
----- Original Message -----
From: Peter Cariani
To: AUDITORY@LISTS.MCGILL.CA
Cc: Craig Atencio
Sent: Monday, January 19, 2004 9:37 PM
Subject: Re: Is correlation any good for pitch perception?
Dear Craig and Eckhard
Craig, which of our 1996 results are you thinking of
"that overestimate pitch salience of unresolved
harmonics"?
I can't think of any offhand -- do you have some of our results
confused with those generated by computer models?
P. A. Cariani and B. Delgutte, "Neural correlates of the pitch of
complex tones. I. Pitch and pitch salience. II. Pitch shift, pitch
ambiguity, phase-invariance, pitch circularity, and the dominance
region for pitch," J. Neurophysiology, vol. 76, pp. 1698-1734, 1996.
A difficulty with estimating pitch saliences has been the relative
dearth of data on salience per se (Fastl's papers
notwithstanding).
We stated that the peak-to-mean ratio in the population-wide all-order
interval distribution qualitatively corresponded to pitch salience. The
more general concept is that the salience of the pitch is related to the
fraction of intervals related to a particular periodicity (n/F0, where
n = 1, 2, 3, ...) amongst all others.
We found that the salience of high-Fc AM tones just outside the pitch
existence region was near 1 and the salience of AM noise was near 1.4.
The ordering of the saliences did correspond well to the ordering in
Fastl's paper and in the rest of the literature that was then available
(pitches of harmonics in the dominance region have higher saliences than
those of higher harmonics, and pitches of resolved harmonics 3-5 had
higher saliences than those of unresolved harmonics 6-12).
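As a rough illustration of the peak-to-mean idea, here is a minimal Python sketch. It is not the analysis code from the 1996 papers: it uses a single synthetic spike train instead of a population of auditory nerve fibers, and the function names, the 1 ms bin width, and the 30 ms maximum interval are all assumptions chosen for the example. A flat interval distribution gives a ratio near 1, and a distribution concentrated at multiples of the period gives a much larger ratio.

```python
import numpy as np

def all_order_intervals(spike_times, max_interval):
    """Intervals between every pair of spikes (not only adjacent ones)."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    diffs = t[None, :] - t[:, None]          # diffs[i, j] = t[j] - t[i]
    return diffs[(diffs > 0) & (diffs <= max_interval)]

def peak_to_mean(spike_times, period, max_interval, bin_width=1.0):
    """Count in the histogram bin holding `period`, divided by the mean bin count."""
    intervals = all_order_intervals(spike_times, max_interval + bin_width / 2)
    # Bin edges are offset by half a bin so that whole-millisecond
    # intervals sit at the center of a bin rather than on an edge.
    edges = np.arange(bin_width / 2, max_interval + bin_width, bin_width)
    hist, _ = np.histogram(intervals, bins=edges)
    peak_bin = np.searchsorted(edges, period, side="right") - 1
    return hist[peak_bin] / hist.mean()

# Times are in milliseconds. One train fires once per 10 ms period
# (F0 = 100 Hz); the other has the same number of spikes at random times.
periodic = 10.0 * np.arange(100)
rng = np.random.default_rng(0)
aperiodic = rng.uniform(0.0, 1000.0, size=100)

salience_periodic = peak_to_mean(periodic, period=10.0, max_interval=30.0)
salience_aperiodic = peak_to_mean(aperiodic, period=10.0, max_interval=30.0)
print(salience_periodic, salience_aperiodic)
```

The periodic train should give a ratio far above 1, while the random train should stay close to 1, which is the qualitative ordering the salience measure relies on.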
Population-interval models (I speak of the results of my own
simulations) predict lower saliences as carrier frequencies increase
(because of weaker phase locking to the carrier, a smaller fraction of
ANFs being driven due to the asymmetry of tuning curves, and the more
dispersed nature of intervals associated with envelopes rather than
individual partials).
They also predict lower saliences with increasing harmonic number. If
you have a logarithmic distribution of CFs, the higher the harmonic
number, the fewer the fibers that are excited (proportionally) by a
single harmonic bracketed by others.
If you're talking about low F0's (the salience of F0=80 Hz harmonic
complexes in our paper), subjectively, the 80 Hz pitch of those stimuli
is strong (these had components at the fundamental, in contrast to
Carlyon's stimuli). Certainly if you go over to the piano and play a
melody in the register near 80 Hz, the pitches aren't qualitatively
weaker than an octave or two above it.
This is a subjective observation.
It is true, though, that the early models would not have accounted well
for saliences of very low pitches (< 50-60 Hz), because the early models
did not discount longer intervals.
One of the subsequent evolutions of pitch models in the last 10 years
has been the realization that the lower limit of pitch has implications
for interval-based models, e.g.
D. Pressnitzer, R. D. Patterson, and K. Krumbholz, "The lower
limit of melodic pitch," J. Acoust. Soc. Am., vol. 109, pp.
2074-2084, 2001.
This was driven, I think, in part by Carlyon & Shackleton's (1994) work
on low-F0 periodicities in high-Fc channels (high harmonic numbers),
which has led some of us to hypothesize that high-CF channels may have
shorter interval analysis windows than lower-CF ones.
It may well be the case that there are some differences in processing
of intervals according to CF, but this does not change the core
hypothesis that there is a global temporal representation of
periodicities below around 4 kHz. (I believe that this strong, precise,
level-invariant representation coexists with a much weaker and coarser
(rate) place representation that covers the range of cochlear
resonances (50-20,000 Hz), i.e. a duplex model much like
Licklider's.)
You have to realize that the earlier studies were simply trying to
predict pitch on the basis of interval patterns. We didn't deal with
questions around the fringes of the pitch existence region. To deal
with these questions, one needs to grapple with the length of the
interval analysis windows (what is the longest interval that is
analyzed?). An autocorrelation that encompasses indefinitely long lags
has indefinitely precise frequency resolution (like a vernier
principle) -- we know the frequency resolution of the auditory system
is quite fine, but nevertheless limited. Goldstein and Srulowicz
assumed first order intervals, which in effect produces an exponential
window, but there are many problems with first-order intervals (they
are rate-dependent -- pitch representations would shift as SPLs and
firing rates increased; the other major problem comes from
interference when one has 2 harmonic complexes (say n= 1-6) of
different F0's -- if they are 20% apart in frequency, their pitches do
not obliterate each other in the manner that would be expected if the
representations were based on first-order intervals).
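The rate dependence of first-order intervals can be seen in a small Python sketch. This is a deliberately simplified toy model and not a simulation from any of the cited papers: it assumes perfect phase locking, at most one spike per stimulus cycle, and an independent firing probability per cycle standing in for firing rate. The function name and all parameter values are assumptions made for the illustration.

```python
import numpy as np

def interval_fractions(fired, max_order):
    """fired: boolean array, one entry per stimulus cycle (True = spike).

    Returns two arrays indexed by interval length (1..max_order cycles):
    the fraction of first-order intervals of each length, and the
    fraction of all-order intervals of each length.
    """
    idx = np.flatnonzero(fired)
    first = np.diff(idx)                      # adjacent spikes only
    orders = range(1, max_order + 1)
    first_counts = np.array([(first == k).sum() for k in orders])
    # All-order: every pair of spikes separated by exactly k cycles.
    all_counts = np.array([(fired[:-k] & fired[k:]).sum() for k in orders])
    return first_counts / first.size, all_counts / all_counts.sum()

rng = np.random.default_rng(0)
n_cycles = 20000
high_rate = rng.random(n_cycles) < 0.9        # fires on about 90% of cycles
low_rate = rng.random(n_cycles) < 0.2         # fires on about 20% of cycles

first_hi, all_hi = interval_fractions(high_rate, max_order=5)
first_lo, all_lo = interval_fractions(low_rate, max_order=5)

print("first-order fraction at one period:", first_hi[0], first_lo[0])
print("all-order fraction at one period:  ", all_hi[0], all_lo[0])
```

In this toy model the share of first-order intervals at one period tracks the firing probability, so the distribution changes shape as the rate changes, whereas the all-order distribution keeps roughly the same relative shape across the two rates.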
"Higher-order peaks" are necessary to account for hearing
multiple concurrent F0's. This interference is a big problem for
first-order intervals and for central representations of pitch that
rely on bandpass MTF's.
Krumbholz, Patterson, Nobbe, & Fastl have recently probed
the form of the (putative) interval analysis windows:
K. Krumbholz, R. D. Patterson, A. Nobbe, and H. Fastl,
"Microsecond temporal resolution in monaural hearing without
spectral cues?," J. Acoust. Soc. Am., vol. 113, pp. 2790-2800,
2003.
These refinements of interval models appear to be capable of handling
the decline of salience of resolved and unresolved harmonics at both
ends of the spectrum. For low frequencies, salience is limited by the
length/shape of the interval analysis window; for high frequencies, it
is limited by the factors outlined above.
AUTOCORRELATION: REPRESENTATIONS & COMPUTATIONS
=======================================================================
Are you really sure that our auditory system uses
autocorrelation at all? (Terez)
"There are indeed at best scant indications for autocorrelation
merely inside brain." (Eckhard)
We have to be clear about neural representations and analyses
("computations").
It is abundantly clear that the all-order interspike interval
distributions have forms that resemble stimulus autocorrelation
functions in essential ways, up to the frequency limits of
phase-locking.
A temporal, interval-based representation of periodicity and spectrum
exists at early stages of auditory processing, at least up to the level
of the midbrain. There are tens if not hundreds of papers in the
auditory literature that support this.
The mechanism by which this information is utilized/analyzed by the
auditory system is unknown, and I agree that the evidence for neural
autocorrelators per se (a la Licklider) is quite scant. However, the
evidence for central rate-place representation of pitch (F0-pitch and
low-frequency pure tone pitches as well) is also very weak -- I see no
convincing physiological evidence for harmonic templates (BF's << 5 kHz)
or of robust physiological resolution of harmonics higher than the
second. We want to see a representation of lower-frequency sounds
(< 4 kHz) that is precise, largely level-invariant, capable of
supporting multiple objects (two musical instruments playing different
notes), and that accounts for the pitches of pure and complex tones.
A central rate-place coding of pitch is not out of the question -- I
can imagine ways that we might have missed some kind of covert,
distributed representation -- but I can also imagine similar (but more
elegant) possibilities for central temporal representations, and there
is no compelling reason to think that the central codes must be
rate-based.
Cortical neurons don't behave anything like rate integrators --
it's evident from the nature of single unit responses that some other
functional information-processing principle must be operant; we
desperately need fresh ideas -- alternative notions of information
processing.
It's premature to rule anything out yet.
We understand the cortical correlates of very few auditory percepts with
any degree of confidence, such that we can say that it is this or that
kind of code. Even in vision, where there are orders of magnitude more
workers on both cortical and theoretical fronts, as far as I can see
this is also the case -- they do not have a coherent theory of how the
cortex works -- how the information is represented and analyzed -- they
cannot explain how we (and most animals) distinguish triangles from
circles or why images disappear when they are stabilized on the retina.
I think the central problem in auditory neurophysiology is to determine
what becomes of this superabundant, high quality, invariant, interval
information. Laurel Carney and Shihab Shamma have proposed models for
utilizing phase information, but it remains to be seen whether there is
compelling physiological evidence for these models. It's also not yet
clear to me how these models behave in terms of pitch.
I have been working on time-domain strategies that bypass the need to
compute autocorrelations explicitly (pitch is notoriously relative,
which is not consistent with explicit absolute estimation mechanisms),
but I do not yet see any strong positive physiological evidence for
these either.
Amidst all this equivocation about mechanisms, let us not forget what
we know about early representations, i.e. that the interval-based
"autocorrelation-like" representation does exist at the
levels of the auditory nerve, cochlear nuclei, and midbrain. Even at
the midbrain, I think it is likely that the interval-based
representation
is of higher quality and greater reliability than those based on
bandpass MTFs or on rate-place profiles.
I think it's likely that, whatever the mechanism turns out to be, it
will involve this interval information and it will also be tied in
closely with scene analysis mechanisms (harmonicity grouping).
If one is fairly certain that the information is temporally-coded to
begin with, then one looks first for temporal processing
mechanisms.
--Peter Cariani
Peter Cariani, PhD
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St., Boston, MA 02114 USA
On Friday, January 16, 2004, at 05:34 PM, Craig Atencio wrote:
Dear Dmitry,
My understanding recently was that autocorrelation may not be the best
measure of periodicity pitch because it performs too well. Papers in
1996 by Cariani and Delgutte showed that pitch salience was
overestimated for unresolved harmonics. For most everything else the
model worked quite well. Note that these studies were a
neurophysiological test of Licklider's original autocorrelation idea,
where he gave a basic schematic of how autocorrelation might be
implemented in a neural system. I recently heard a talk where Delgutte
said that he was moving more in the direction of Shamma's earlier work
based on spatio-temporal representations of auditory signals.
I did read your earlier ICASSP paper and took a look at the Matlab
files on the website. Unfortunately, the code is not available for
viewing and the ICASSP paper lacks some detail. I, for one, was really
intrigued by your idea, so I kindly suggest that you write a longer
paper and submit it to JASA. The editors there are always interested
in pitch, as are the reviewers and readers. (I believe that last year
Pierre Divenyi proposed this as well.)
You also mentioned that a basic analog system could implement your
idea. Including that in the JASA paper would be great too. It would be
helpful to see that so we could determine which neural center, if any,
might be able to implement your idea.
Best wishes,
Craig
----- Original Message -----
From: "Dmitry Terez" <terez@SOUNDMATHTECH.COM>
To: <AUDITORY@LISTS.MCGILL.CA>
Sent: Friday, January 16, 2004 1:21 PM
Subject: Is correlation any good for pitch perception?
Dear Auditory List Members,
I would like to convey some thoughts on the much-discussed subject of
how the human auditory system can use autocorrelation analysis for pitch
perception.
Are you really sure that our auditory system uses autocorrelation at
all?
Has anybody seen it really happening in the brain? As far as I
understand (forgive me if I am wrong), beyond the cochlea not much is
really known about the real mechanism behind the exceptionally robust
human pitch perception. I am not an auditory scientist, but it just
looks to me that the correct answer on your part is "We do not know".
I do think that the correlation function has two fatal drawbacks, as far
as pitch detection is concerned (I am talking about a classical auto-
or cross-correlation function of a signal, that is, a multiply-and-add
type of operation, as defined in any textbook on signal processing).
The first fatal drawback of correlation is the abundance of secondary
peaks due to the complex harmonic structure of a signal. For some real
signals we are dealing with every day, such as speech, the secondary
peaks in the correlation function due to speech formants (vocal tract
resonances) are sometimes about the same height as the main peaks due
to signal periodicity (pitch).
The second fatal drawback of correlation is its pitch strength
(salience) property for simple and complex tones. In other words, the
main peaks in the correlation function computed for, e.g., a simple
sine wave are too wide. Meanwhile, I would expect a simple tone to
cause the same or even stronger pitch sensation than a complex tone
with the same fundamental frequency.
I think that it would be strange if evolution resulted in such a
suboptimal mechanism of perceiving sound periodicity.
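The point about peak width can be checked numerically with a short Python sketch. It assumes a textbook multiply-and-add autocorrelation, a 200 Hz fundamental, a 16 kHz sampling rate, six equal-amplitude harmonics for the complex tone, and a width measured at 90% of the peak height; none of these values come from the discussion above. The sketch only shows the shape of the correlation function and says nothing about how strong either pitch sounds.

```python
import numpy as np

def autocorr(x):
    """Unbiased autocorrelation for lags 0..len(x)-1, normalized to 1 at lag 0."""
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:]   # keep non-negative lags
    r = r / (n - np.arange(n))                    # divide by number of products
    return r / r[0]

def peak_width(r, lag, frac=0.9):
    """Number of contiguous lags around `lag` where r stays above frac * r[lag]."""
    lo = hi = lag
    while lo > 0 and r[lo - 1] >= frac * r[lag]:
        lo -= 1
    while hi < len(r) - 1 and r[hi + 1] >= frac * r[lag]:
        hi += 1
    return hi - lo + 1

fs, f0 = 16000, 200
t = np.arange(1600) / fs                          # 0.1 s = 20 periods
sine = np.sin(2 * np.pi * f0 * t)
complex_tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 7))

period = fs // f0                                 # 80 samples
w_sine = peak_width(autocorr(sine), period)
w_complex = peak_width(autocorr(complex_tone), period)
print("peak width in samples: sine =", w_sine, ", complex =", w_complex)
```

The peak at the period lag is several times wider for the sine than for the harmonic complex, because the complex tone's autocorrelation near that lag is dominated by its highest harmonics.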
As some of you may know, recently we introduced a new revolutionary
concept of pitch detection. It has nothing to do with correlation
(although one can see some similarity) or the spectrum of a signal. It
is based on "unfolding" a scalar signal in several dimensions -
a concept of signal "embedding", as it is called in nonlinear and
chaotic signal processing.
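For readers unfamiliar with the term, a generic time-delay embedding can be sketched in a few lines of Python. This is not the algorithm of the ICASSP paper or the patent application, whose details are not given in this message. It is only the standard construction from nonlinear signal processing, paired with a simple recurrence measure chosen for illustration. The embedding dimension, the delay, the test signal, and the function names are all assumptions.

```python
import numpy as np

def delay_embed(x, dim, delay):
    """Rows are points (x[n], x[n + delay], ..., x[n + (dim - 1) * delay])."""
    n_points = len(x) - (dim - 1) * delay
    return np.stack(
        [x[i * delay : i * delay + n_points] for i in range(dim)], axis=1
    )

def recurrence_profile(points, max_lag):
    """Mean distance between each embedded point and the point `lag` samples later."""
    return np.array([
        np.linalg.norm(points[lag:] - points[:-lag], axis=1).mean()
        for lag in range(1, max_lag + 1)
    ])

fs, f0 = 16000, 200
t = np.arange(1600) / fs
# Test signal: harmonics 1-6 of a 200 Hz fundamental, equal amplitudes.
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 7))

points = delay_embed(x, dim=3, delay=10)      # trajectory in 3 dimensions
profile = recurrence_profile(points, max_lag=120)

# A periodic signal traces a closed orbit, so the trajectory returns to
# (nearly) the same point after one period and the distance drops to ~0.
best_lag = int(np.argmin(profile)) + 1
print("estimated period:", best_lag, "samples; F0 =", fs / best_lag, "Hz")
```

With these assumed settings the distance is smallest at a lag of one period. The similarity to correlation mentioned above is visible here: this measure compares the signal with a shifted copy of itself, but through distances between points rather than products of samples.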
The ICASSP paper and the Matlab demo are available from
http://www.soundmathtech.com/pitch
You can also read our US patent application publication No. 20030088401
at http://www.uspto.gov/patft
Although a purely digital implementation is described, I can build a
simple analog electro-mechanical device (basically a mechanical part
followed by a two-dimensional grid of "neurons" for projecting an
output) that is based on the same principle and is exceptionally
robust at detecting pitch.
My question is: Can our auditory system use this type of processing
for pitch perception?
Is it possible to find some mechanism that can perform this kind of
processing, perhaps between the cochlea and the brain?
I do not expect a quick answer. Please, take your time, maybe next 10
years ...
Also, I would like to add that although words like "chaos
theory", "phase space" or "signal embedding" might seem not relevant
to your research on pitch perception, they are now, in fact. This is
an entirely new game...
Best Regards,
Dmitry E. Terez, Ph.D.
SoundMath Technologies, LLC
P.O. Box 846
Cherry Hill, New Jersey, 08003
USA
e-mail: dterez AT soundmathtech.com