
Re: Some limitations of the pitch-based grouping



On Sunday, January 18, 2004, at 08:25 AM, chen-gia tsai wrote:
It is widely accepted that the pitch sensation is important in auditory grouping. The fact that there is no similar principle of the pitch-based grouping in visual sensation allows me to think about the mechanical principle underlying the pitch-based grouping in audition.

I'm wary of evolutionary psychology when it invokes what seem to be adaptationist just-so stories.
The ecological arguments are often narrowly construed; they miss the necessity for flexible, general-purpose sensory capabilities (because of the high variability of appearances). Yes, biological generators are often periodic, but periodicity detection is simply a special case of recognizing a pattern when it recurs in your inputs, and this is the key to learning. Neural systems self-organize to detect invariances and regularities in their sensory inputs that allow successful prediction and control of the external world.

One problem in making (dis)analogies between vision and audition is that inevitably there are tacit
neural coding assumptions about how both systems work, but on the whole neither system is particularly well understood yet. The first question is: what is the visual analogue of pitch?

One might argue for spatial frequency, since there is a missing-fundamental phenomenon in vision:
S. T. Hammett and A. T. Smith, "Temporal beats in the human visual system," Vision Res, vol. 34, pp. 2833-2840, 1994. There are bandpass MTF visual and auditory units with comparable tunings in the cortex, and
phase-locking to low modulation frequencies (0-30 Hz and above). I would think that if you drifted two sets of harmonically related gratings at different speeds, you would see two patterns moving against each other.
Does anyone know whether there is grouping by common flicker frequency?


There are also (spatial) autocorrelation theories of visual form that parallel (temporal) autocorrelation models of pitch: W. R. Uttal, An Autocorrelation Theory of Form Detection. New York: Wiley, 1975.
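
To make the temporal-autocorrelation idea concrete, here is a minimal sketch of how a lag-domain analysis can recover a missing fundamental. It works on a raw waveform rather than on spike trains, so it is only a toy stand-in for the neural models; the function names, the 8 kHz sampling rate, the 50-500 Hz search range, and the choice of harmonics 3-5 of 200 Hz are all assumptions made for illustration.

```python
import math

def harmonic_complex(f0, harmonics, fs, n_samples):
    """Sum of equal-amplitude sine harmonics of f0 (hypothetical helper)."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / fs) for k in harmonics)
        for n in range(n_samples)
    ]

def estimate_f0_autocorr(signal, fs, f_min=50.0, f_max=500.0):
    """Return fs / lag for the lag with the largest autocorrelation,
    searching only lags that correspond to the range [f_min, f_max]."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation: product of the signal with a
        # copy of itself delayed by `lag` samples, summed over the overlap.
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag

fs = 8000  # assumed sampling rate in Hz
# Components at 600, 800 and 1000 Hz only; there is no energy at 200 Hz.
signal = harmonic_complex(200.0, (3, 4, 5), fs, 400)
print(estimate_f0_autocorr(signal, fs))  # prints 200.0
```

The largest peak falls at a lag of 40 samples (5 ms), the common period of the three components, even though the 200 Hz fundamental itself is absent from the signal. The same computation over a spatial coordinate instead of time is the sense in which a spatial autocorrelation of form would parallel it.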

One might argue that formation and segregation of forms through common/relative movement is very similar to
formation and separation of auditory objects (voices) by F0, since both processes involve building up
invariant patterns: P. Cariani, "Neural timing nets," Neural Networks, vol. 14, pp. 737-753, 2001. If you make a movie of two vowels with different F0s, the patterns move relative to each other. In vision the patterns may be spatial patterns of temporal correlations (S.-H. Lee and R. Blake, "Visual form created solely from temporal structure," Science, vol. 284, pp. 1165-1168, 1999); in audition they are temporal patterns of spikes.

In both audition and vision, the same neural channels (retinotopic and cochleotopic) can participate in representing multiple concurrent objects (two visual transparencies in relative motion; two harmonic complexes with different F0s). In each case, it seems that there must be sorting (of which spikes go with which object) by the temporal patterns of spikes in the channels.

Despite their apparent differences, we should not too quickly dismiss the possibility of deep underlying commonalities between the two modalities.

--Peter Cariani