Subject: Re: An Auditory Illusion
From: Neil Todd <todd(at)HERA.PSY.MAN.AC.UK>
Date: Tue, 20 May 1997 21:43:46 +0100

Dear List

The "oscillatory framework" is by no means universally accepted by those of us in the ball game of computational modelling. Further, it is by no means universally accepted that the work of Singer et al constitutes "proof" of the existence of "coherent oscillations" as the substrate of feature binding. For example, it has not been sufficiently ruled out that the "coherence" is caused by two cortical cells having a common sub-cortical input. If a bar, say, is moved over two non-overlapping receptive fields at the same time, the temporal structure of their input will be identical.

An alternative view is that binding is mediated by "channels" which are tuned to detect modulation, so-called AM channels. The existence of cells which have a well-defined temporal modulation transfer function (tMTF) has been demonstrated in both auditory and visual cortex. See for example,

Schreiner, C.E. and Urbas, J.V. (1988) Representation of AM in the auditory cortex of the cat II. Hear. Res. 21, 227-241.

Hawken, M.J. et al (1996) Temporal-frequency selectivity in monkey visual cortex. Vis. Neurosci. 13, 477-492.

What is particularly fascinating when one compares visual cortex (mostly motion-sensitive cells, but including colour) with auditory cortex is that the range of best modulation frequencies (BMFs) is practically identical in each case, i.e. approx. 0.5 Hz - 20 Hz. This should come as no surprise when we consider that auditory rhythm and visual motion are just two sides of the same coin. A very concrete example of this is audio-visual binding when watching and listening to someone speak.

The view that binding is mediated by some kind of AM transform is strengthened by the fact that it is possible to construct a working computational model of both sequential and simultaneous grouping on this basis.
See for example,

Todd, N.P.McAngus (1996) An auditory cortical theory of auditory stream segregation. Network: Computation in Neural Systems. 7, 349-356.

Todd, N.P.McAngus (1996). Towards a theory of the principal monaural pathway: pitch, time and auditory grouping. In W. Ainsworth and S. Greenberg (Eds). Proceedings of the International Workshop on The auditory basis of speech perception. Keele, July, 1996. pp 216-221.

The idea that temporal information is represented in the form of a kind of frequency space transform is given further impetus when we consider that it may account for a whole range of other phenomena, particularly to do with time perception. See for example,

Todd, N.P.McAngus (1996) Time discrimination and AM detection. J. Acoust. Soc. Am. 100(4), Pt. 2, 2752.

Todd, N.P.McAngus and Brown, G.J. (1996) Visualization of rhythm, time and metre. Artificial Intelligence Review 10, 253-273.

Todd, N.P.McAngus (1996). Towards a theory of the central auditory system I: Architecture. In B. Pennycook and E. Costa-Giomi (Eds.) Proceedings of the Fourth International Conference on Music Perception and Cognition. Montreal, August, 1996. pp 173-178.

Todd, N.P.McAngus (1996). Towards a theory of the central auditory system II: Pitch. In B. Pennycook and E. Costa-Giomi (Eds.) Proceedings of the Fourth International Conference on Music Perception and Cognition. Montreal, August, 1996. pp 179-184.

Todd, N.P.McAngus (1996). Towards a theory of the central auditory system III: Time. In B. Pennycook and E. Costa-Giomi (Eds.) Proceedings of the Fourth International Conference on Music Perception and Cognition. Montreal, August, 1996. pp 185-190.

Todd, N.P.McAngus (1996). Towards a theory of the central auditory system IV: Grouping. In B. Pennycook and E. Costa-Giomi (Eds.) Proceedings of the Fourth International Conference on Music Perception and Cognition. Montreal, August, 1996. pp 191-196.
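For readers who want something concrete to play with, a modulation ("AM") transform can be caricatured in a few lines of Python. This is only a minimal sketch and not the model described in the papers listed above: the envelope extraction (rectification plus a 20 ms moving average), the single 1 kHz carrier, the 4 Hz modulation rate and the name `modulation_spectrum` are all assumptions chosen for illustration. It simply looks for the strongest envelope component inside the 0.5 Hz - 20 Hz range of BMFs mentioned earlier.

```python
import numpy as np

def modulation_spectrum(signal, sample_rate):
    """Return (frequencies, magnitudes) of the spectrum of a crude envelope."""
    # Crude envelope: full-wave rectification, then a moving-average low-pass.
    rectified = np.abs(signal)
    window = int(sample_rate / 50)            # 20 ms window (assumed value)
    kernel = np.ones(window) / window
    envelope = np.convolve(rectified, kernel, mode="same")
    envelope = envelope - envelope.mean()     # remove the DC component
    magnitudes = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / sample_rate)
    return freqs, magnitudes

# Hypothetical test signal: a 1 kHz tone amplitude-modulated at 4 Hz.
sample_rate = 8000
n_samples = 2 * sample_rate                   # two seconds of signal
t = np.arange(n_samples) / sample_rate
carrier_hz = 1000.0
mod_hz = 4.0
signal = (1.0 + 0.8 * np.sin(2 * np.pi * mod_hz * t)) * np.sin(2 * np.pi * carrier_hz * t)

freqs, mags = modulation_spectrum(signal, sample_rate)

# Restrict the search to the 0.5 Hz - 20 Hz range of best modulation frequencies.
in_band = (freqs >= 0.5) & (freqs <= 20.0)
best = freqs[in_band][np.argmax(mags[in_band])]
print("strongest modulation component (Hz):", best)
```

With this test signal the strongest in-band component should sit at or very near the 4 Hz modulation rate. A fuller treatment would first split the signal into frequency channels and compute a modulation spectrum per channel; that step is omitted here.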
However, to answer Al's original question, such a model may also cope with the situation of recognising two, or more, versions of the same object, so long as they are separated in time, space, frequency, etc. The model is able to do this by applying a central pattern recognition mechanism to the modulation transform, where there are a number of different versions of the same *type* of template. E.g. in the case of pitch, if one includes in the training set a range of complex tones, then there is no problem if two harmonic complexes with different F0s are present in the signal. Similarly, in a more advanced version of this scheme, one may imagine a training set that includes male and female voices, different accents, etc.

Best wishes

Neil Todd