Re: An Auditory Illusion
Dear List
The "oscillatory framework" is by no means universally accepted
by those of us in the ball game of computational modelling.
Further, it is by no means universally accepted that the work
of Singer et al constitutes "proof" for the existence of
"coherent oscillations" as the substrate of feature binding.
For example, it has not been sufficiently ruled out that the
"coherence" is caused by two cortical cells having a common
sub-cortical input. If a bar, say, is moved over two
non-overlapping receptive fields at the same time, the temporal
structure of their input will be identical.
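The common-input objection can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of the recordings in question: the 40 Hz drive, the 1 ms time step, the Gaussian noise and the names `simulate_pair` and `correlation` are all invented for illustration. Two "cells" that never interact still show correlated activity as soon as they share a driving input.

```python
import numpy as np

def simulate_pair(common_gain, n=5000, seed=0):
    """Return the activity of two uncoupled cells sharing one input.

    Assumptions (hypothetical): 1 ms time steps, a 40 Hz shared drive,
    and independent unit-variance Gaussian noise in each cell.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n) / 1000.0
    common = np.sin(2 * np.pi * 40.0 * t)      # shared sub-cortical drive
    cell_a = common_gain * common + rng.standard_normal(n)
    cell_b = common_gain * common + rng.standard_normal(n)
    return cell_a, cell_b

def correlation(a, b):
    """Pearson correlation between two activity traces."""
    return float(np.corrcoef(a, b)[0, 1])

# With a shared drive the traces are strongly correlated ...
r_shared = correlation(*simulate_pair(common_gain=2.0))
# ... and without it the correlation is close to zero.
r_none = correlation(*simulate_pair(common_gain=0.0))
```

Neither cell has any influence on the other here, so correlation between the two traces says nothing by itself about coupling between them.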
An alternative view is that binding is mediated by "channels"
which are tuned to detect modulation, so-called AM channels.
The existence of cells which have a well-defined temporal
modulation transfer function (tMTF) has been demonstrated in
both auditory and visual cortex. See for example,
Schreiner, C.E. and Urbas, J.V. (1988) Representation of AM in
the auditory cortex of the cat II. Hear. Res. 21, 227-241.
Hawken, M.J. et al (1996) Temporal-frequency selectivity in
monkey visual cortex. Vis. Neurosci. 13, 477-492.
What is particularly fascinating when one compares visual
(mostly motion sensitive cells, but including colour) and
auditory cortex is that the range of best modulation
frequencies (BMFs) is practically identical in each case, i.e.
approx. 0.5 Hz - 20 Hz. This should come as no surprise when we
consider that auditory rhythm and visual motion are just two
sides of the same coin. A very concrete example of this is
audio-visual binding when watching and listening to someone
speak.
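As a rough illustration of what an analysis restricted to the 0.5 Hz - 20 Hz modulation range looks like, here is a minimal sketch. It is not the model described in the papers below: the envelope is obtained by simple full-wave rectification, the "channels" are just FFT bins, and the test tone (1 kHz carrier, 4 Hz modulation), sample rate and function names are assumptions made for the example.

```python
import numpy as np

def modulation_spectrum(signal, fs, f_lo=0.5, f_hi=20.0):
    """Magnitude spectrum of the signal envelope within [f_lo, f_hi] Hz."""
    envelope = np.abs(signal)                   # crude envelope: rectification
    envelope = envelope - envelope.mean()       # remove the DC component
    magnitude = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band], magnitude[band]

def dominant_modulation_frequency(signal, fs):
    """Modulation frequency with the largest envelope energy in the band."""
    freqs, magnitude = modulation_spectrum(signal, fs)
    return float(freqs[np.argmax(magnitude)])

# Hypothetical test signal: 1 kHz carrier, amplitude-modulated at 4 Hz.
fs = 8000
t = np.arange(4 * fs) / fs
tone = (1.0 + 0.8 * np.cos(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
fm_estimate = dominant_modulation_frequency(tone, fs)
```

The same function could in principle be applied to any slowly varying signal, for instance the luminance at a point in a moving image, which is the sense in which rhythm and motion share a modulation range.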
The view that binding is mediated by some kind of AM transform
is strengthened by the fact that it is possible to construct a
working computational model of both sequential and simultaneous
grouping on this basis. See for example,
Todd, N.P.McAngus (1996) An auditory cortical theory of
auditory stream segregation. Network: Computation in Neural
Systems. 7, 349-356.
Todd, N.P.McAngus (1996). Towards a theory of the principal
monaural pathway: pitch, time and auditory grouping. In W.
Ainsworth and S. Greenberg (Eds). Proceedings of the
International Workshop on The auditory basis of speech
perception. Keele, July, 1996. pp 216-221.
The idea that temporal information is represented in the form
of a kind of frequency space transform is given further impetus
when we consider that it may account for a whole range of other
phenomena, particularly to do with time perception. See for
example,
Todd, N.P.McAngus (1996) Time discrimination and AM detection.
J. Acoust. Soc. Am. 100(4), Pt. 2, 2752.
Todd, N.P.McAngus and Brown, G.J. (1996) Visualization of
rhythm, time and metre. Artificial Intelligence Review 10,
253-273.
Todd, N.P.McAngus (1996). Towards a theory of the central
auditory system I: Architecture. In B. Pennycook and E.
Costa-Giomi (Eds.) Proceedings of the Fourth International
Conference on Music Perception and Cognition. Montreal, August,
1996. pp 173-178.
Todd, N.P.McAngus (1996). Towards a theory of the central
auditory system II: Pitch. In B. Pennycook and E. Costa-Giomi
(Eds.) Proceedings of the Fourth International Conference on
Music Perception and Cognition. Montreal, August, 1996. pp
179-184.
Todd, N.P.McAngus (1996). Towards a theory of the central
auditory system III: Time. In B. Pennycook and E. Costa-Giomi
(Eds.) Proceedings of the Fourth International Conference on
Music Perception and Cognition. Montreal, August, 1996. pp
185-190.
Todd, N.P.McAngus (1996). Towards a theory of the central
auditory system IV: Grouping. In B. Pennycook and E.
Costa-Giomi (Eds.) Proceedings of the Fourth International
Conference on Music Perception and Cognition. Montreal, August,
1996. pp 191-196.
However, to answer Al's original question, such a model may also
cope with the situation of recognising two, or more, versions
of the same object, so long as they are separated by time,
space, frequency, etc. The model is able to do this by applying a
central pattern recognition mechanism to the modulation
transform, but where there are a number of different versions
of the same *type* template. E.g. in the case of pitch, if one
includes in the training set a range of complex tones, then
there is no problem if two harmonic complexes with different F0s
are present in the signal. Similarly in a more
advanced version of this scheme, one may imagine a training set
to include male and female voices, different accents etc.
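The pitch case can be sketched in a few lines. This is a toy under stated assumptions, not the model from the papers above: it matches templates against an ordinary magnitude spectrum rather than a modulation transform, the templates are written down by hand rather than learned from a training set, and the sample rate, harmonic count, threshold and candidate F0 values are all invented for the example. The point is only that several templates of the same *type* can match one signal at once.

```python
import numpy as np

FS = 8000          # sample rate in Hz (assumed)
N_HARMONICS = 5    # harmonics per complex and per template (assumed)

def harmonic_complex(f0, duration=1.0):
    """Sum of the first N_HARMONICS equal-amplitude harmonics of f0."""
    t = np.arange(int(FS * duration)) / FS
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, N_HARMONICS + 1))

def template_scores(signal, template_f0s):
    """Mean spectral magnitude at each template's harmonic positions."""
    magnitude = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    scores = {}
    for f0 in template_f0s:
        bins = [int(np.argmin(np.abs(freqs - f0 * k)))
                for k in range(1, N_HARMONICS + 1)]
        scores[f0] = float(np.mean(magnitude[bins]))
    return scores

def detect_f0s(signal, template_f0s, threshold=0.8):
    """Return every template F0 whose score reaches the threshold."""
    scores = template_scores(signal, template_f0s)
    return sorted(f0 for f0, score in scores.items() if score >= threshold)

# Two simultaneous complexes with different F0s, and a small template set.
mixture = harmonic_complex(200) + harmonic_complex(310)
found = detect_f0s(mixture, [150, 200, 250, 310, 400])   # [200, 310]
```

Templates at 150, 250 and 400 Hz pick up only the odd coinciding harmonic and fall below the threshold, while the 200 Hz and 310 Hz templates both match fully.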
Best wishes
Neil Todd