Subject: responses to relative phase in audition and vision posting
From: Timothy Justus <tjustus(at)SOCRATES.BERKELEY.EDU>
Date: Mon, 11 Nov 2002 14:54:53 -0800

Thank you to everyone who responded to my email concerning relative phase in audition and vision. Here are the responses.

My original post:

I've been working on some studies examining possible parallels between the processing of spatial frequency and auditory frequency (e.g., Ivry and Robertson, 1998, The Two Sides of Perception), and the issue of phase information was raised to me recently. A colleague pointed out that while relative phase information seems to be unimportant when processing multiple auditory frequencies (e.g., computing pitch), it is very important in vision: if you look at an image that contains all of the same spatial-frequency components at the same positions, but with their relative phases altered, the image is not coherent. My question is whether anyone has ideas about why this is the case. I'm starting to wonder whether this results from the kind of information that phase provides about the environment; i.e., perhaps phase information is important for interpreting the spatial scene correctly in vision, but is not as critical in audition. Does the indifference of pitch-computation mechanisms to phase reflect the fact that this information says little about objects in the environment? Also, are other elements of audition besides pitch perception (e.g., timbre) more dependent on relative phase?
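[The image manipulation mentioned above is easy to reproduce numerically. The sketch below is only illustrative and assumes numpy and a 2-D grayscale array called "image"; it keeps the amplitude of every spatial-frequency component and borrows the phase spectrum of a white-noise image, which is enough to destroy the coherence of the picture while leaving its amplitude spectrum untouched.]

import numpy as np

def phase_scramble(image, rng=np.random.default_rng(0)):
    """Keep the amplitude spectrum of 'image' but replace its phase spectrum
    with that of a white-noise image (so the result is guaranteed real)."""
    spectrum = np.fft.fft2(image)
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    scrambled = np.abs(spectrum) * np.exp(1j * noise_phase)
    return np.real(np.fft.ifft2(scrambled))

[Displaying the original and the scrambled version side by side shows two images with identical spatial-frequency amplitudes, only one of which is recognizable.]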
*****

Bob Carlyon <bob.carlyon(at)mrc-cbu.cam.ac.uk>

Funny you should mention that.... The auditory system IS sensitive to phase between harmonics that interact within a single auditory filter (because phase alters the shape of the waveform at the output of that filter). It is also sensitive to phase differences between envelopes: e.g., if you apply AM at the same rate to two different carriers (well separated in frequency) and delay the phase of the modulator applied to one of them, subjects can detect this. It is NOT sensitive to (carrier) phase differences between individual frequency components that are well separated, so that they do not interact in a single filter output. Shihab Shamma and I have submitted a paper to JASA which argues that there are three reasons for this: (i) resolved partials produce peaks in the travelling wave, around which there are dramatic phase transitions, so the different neurons responding to the same partial do so at lots of different phases; to compare the responses to two partials, you'd need to know which neurons to compare with which. (ii) Two different partials have, by definition, different frequencies. Even when harmonically related (say the 1st and 4th harmonics of a given F0), the peaks in the filter output to the lower component will coincide (when in phase) with only a proportion of those to the higher component (1/4 in the above example). (iii) When the resolved harmonics have a moderately high frequency, the response to them will be temporally smoothed by the hair cell and later stages of the auditory system.

[Dr. Carlyon was also kind enough to pass on a preprint to me on this topic; interested parties may request one from him.]

*****

Eckard Blumschein <Eckard.Blumschein(at)e-technik.uni-magdeburg.de>

While the input signal of vision is already structured in parallel, audition has to cope with the need to transform its input from serial to parallel. Ohm's law of acoustics reflects this function, even if Seebeck's objection was justified and the Fourier transform is merely a poor approximation of the physiology. Incidentally, physicists and mathematicians have publicly discussed with me the Fourier transform vs. the Fourier cosine transform with respect to causality. I was correct: only the latter is causal. There is no justification for the ambivalence and redundancy of the FT; there is no reason for negative values of elapsed time, frequency, radius, and wavenumber. On one hand it is true that, theoretically, the amplitude and phase of a complex quantity are equally important, as are its real and imaginary parts. On the other hand, one can look at the amplitude spectrum separately, and this can even be useful for measuring a time span; radar systems do so when applying the standard IFFT approach to stepped-frequency chirps. Animal hearing is mainly designed for estimating temporal distances and recognizing a coherent source. Cochlear tonotopy is perhaps the simplest solution to this, in that it simply omits phase. However, since place on the partition is just an additional dimension, a diversified field of temporal structure still remains. First, it allows for some archaic functions of the auditory midbrain (unique to hearing) which are obviously not subject to this phase deafness. It is also the basis for cortical auditory analysis, which seems to be quite similar to the corresponding analysis in vision. Simplifying corollary: the cochlea acts like a serial-to-parallel interface that provides an additional amplitude spectrum before cortical analysis. Measurements with tones must not be generalized to hearing as a whole. Paradoxical effects of phase, and also of frequency components outside the audible range, reveal that the idea of hearing as a frequency analyzer neglects fast interaural comparison as well as the sluggish cortical counterpart of visual perception.

*****

Amanda Lauer <alauer(at)psyc.umd.edu>

Masking of sounds can be affected by relative phase. For instance, the threshold for tones embedded in harmonic-complex maskers with identical long-term frequency spectra but different phase spectra is strongly affected by the starting phases of the components. A few recent papers:

Lentz & Leek (2001). Psychophysical estimates of cochlear phase response: Masking by harmonic complexes. JARO, 2, 408-422.

Oxenham & Dau (2001). Towards a measure of auditory-filter phase response. JASA, 110, 3169-3178.
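[To make this point concrete: the minimal numpy sketch below (the fundamental frequency, number of harmonics, and sampling rate are arbitrary illustrative choices) builds two harmonic complexes whose long-term amplitude spectra are identical but whose component starting phases differ. The waveforms, and hence what happens within a single auditory filter, are very different.]

import numpy as np

fs = 16000                      # sampling rate (Hz)
f0 = 100                        # fundamental frequency (Hz)
n_harm = 30                     # number of harmonics
t = np.arange(fs) / fs          # one second of time samples
rng = np.random.default_rng(0)

def complex_tone(phases):
    """Sum of equal-amplitude harmonics of f0 with given starting phases."""
    return np.sum([np.cos(2 * np.pi * f0 * (k + 1) * t + phases[k])
                   for k in range(n_harm)], axis=0)

cosine_phase = complex_tone(np.zeros(n_harm))
random_phase = complex_tone(rng.uniform(-np.pi, np.pi, n_harm))

# Identical long-term amplitude spectra...
assert np.allclose(np.abs(np.fft.rfft(cosine_phase)),
                   np.abs(np.fft.rfft(random_phase)), atol=1e-6)
# ...but very different waveforms: in the cosine-phase complex all components
# align once per period, so it is far "peakier" than the random-phase version.
print(np.max(np.abs(cosine_phase)), np.max(np.abs(random_phase)))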
*****

Brad Libbey <gt1556a(at)mail.gatech.edu>

I've had similar questions myself. For my thesis I created reverberation-like noise by randomizing the phase of reverberation. I did this by windowing 93-ms segments of reverberant speech, taking a fast Fourier transform, randomizing the phase, converting back to the time domain, and overlapping and adding the segments. The anechoic speech signal is then added to the reverberation-like noise. Subjects in reverberation identified 76% of the words correctly, and subjects in reverberation-like noise identified about 66% correctly, when the speech-to-reverberation ratio matches the speech-to-reverberation-like-noise ratio. Another way of looking at these data is that the speech-to-noise ratio has to be roughly 5 dB greater for the reverberation-like noise to match the intelligibility scores. One possible reason for this is the temporal smearing that occurs within the time window due to the 93-ms window length. The other possible reason is related to your question: does the auditory system have trouble dealing with the random phase?

I asked around at a conference recently and the following was suggested: relative phase is significant within a critical band but not across critical bands. This is partially backed up by Traunmuller, H. (1987), "Phase vowels," in The Psychophysics of Speech Perception, Martinus Nijhoff, Hingham, MA, USA, pp. 377-384. You might also try Patterson, R. (1987), "A pulse ribbon model of monaural phase perception," JASA, 82, 1560-1586.
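[The procedure Brad Libbey describes is essentially a short-time Fourier transform with randomized phases followed by overlap-add resynthesis. The following is a minimal sketch, assuming numpy, a 1-D array "reverberant", and a sampling rate "fs"; the Hann window and 50% overlap are my own illustrative choices, not necessarily the ones used in his thesis.]

import numpy as np

def phase_randomize(reverberant, fs, win_ms=93, rng=np.random.default_rng(0)):
    """Turn a reverberant signal into 'reverberation-like noise' by
    randomizing the phase of successive windowed segments (overlap-add)."""
    n = int(round(fs * win_ms / 1000.0))   # samples per 93-ms window
    hop = n // 2                           # 50% overlap (assumed)
    window = np.hanning(n)
    out = np.zeros(len(reverberant))
    for start in range(0, len(reverberant) - n, hop):
        segment = window * reverberant[start:start + n]
        spectrum = np.fft.rfft(segment)
        # keep the magnitude spectrum, replace the phases with random values
        random_phase = rng.uniform(-np.pi, np.pi, len(spectrum))
        scrambled = np.abs(spectrum) * np.exp(1j * random_phase)
        # irfft returns a real segment, which is overlap-added back in place
        out[start:start + n] += np.fft.irfft(scrambled, n)
    return out

[The anechoic speech can then be added to the output at a matched speech-to-noise ratio, as described above.]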
*****

Houtsma, A.J.M. <A.J.M.Houtsma(at)tue.nl>

Part of your problem is that your statement about the unimportance of relative phase information in auditory processing is much too broad and actually incorrect. See, for instance, Julius Goldstein's 1967 study in JASA on this topic. In a nutshell, it boils down to the fact that the auditory system is insensitive to relative phase as long as tone components are in different critical bands (i.e., are about 15% apart in frequency). However, when frequencies are closer together and tones fall in the same critical band, your ear can easily detect relative phase changes. With respect to pitch perception, there appear to be two mechanisms: a strong one based on resolved components (this one is phase-insensitive) and another, much weaker one based on unresolved components (this one IS sensitive to phase); see Houtsma and Smurzynski, JASA, 1990. One reason for the difference in phase sensitivity between the visual and auditory systems may be that the eye does not have a clear analogue of the ear's critical band.

*****

John Hershey <jhershey(at)cogsci.ucsd.edu>

In vision we talk about the phases of an image -- a whole array of signals -- in the spatial domain, whereas the phase insensitivities of audition refer to the phases of an acoustic signal in the time domain. So in some sense it's apples and oranges, but you could compare phase sensitivities of vision and audition within either the spatial or the temporal domain. You could also ask why it's apples and oranges, given that both phenomena (light and sound) behave like waves. The difference has something to do with wavelength.

Regarding spatial phase: the relative phases of the spatial-frequency components of a focused image will be important no matter what the modality, because we are interested in locating objects in the world. If sound had a shorter wavelength -- and we lived in an acoustic world where things behaved in a light-like way -- we could perhaps have some sort of acoustic lens and a high-resolution spatial sound retina of sorts. On the other hand, even with long wavelengths, reverberation, and only two sensors, we still manage to use the time differences between the two ears (among other things) to form a spatial image in the brain, albeit at lower resolution than in vision. So if you were to look at the spatial-frequency components of this image, you would find that their relative phases are important for locating sounds, even if the relative phases in the time domain are not.

Regarding temporal phase: the usual explanation of why the relative phase of resolved components in the audio time domain is relatively unimportant for many tasks is that reverberation/refraction scrambles the relative phases between components that are far apart in frequency, due to the long wavelength of sound, among other things. They are still important for transients and for components that are nearby in frequency.

As for vision, the visual system has such a different response to temporal signals that it would be difficult to compare to audition -- for instance, we pick up different frequencies as color. At the level of the electromagnetic waves/photons, vision is probably insensitive to phase relationships between different frequencies. However, if we think of the brightness (amplitude envelope) over time as the signal of interest, then the relative phase of the frequency components of this signal is likely important in the visual time domain. The spatio-temporal signature of a moving edge tends to be coherent -- if it weren't, we would see edges blur in a strange way. That said, if we think about the loudness envelope of an acoustic signal instead of the signal itself, then the phase relationships are again important, even across different temporal frequency bands.
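[A recurring point in the responses above (Libbey, Houtsma) is that phase sensitivity depends on whether components fall within the same critical band. The minimal Python sketch below makes that bookkeeping explicit, using the Glasberg & Moore (1990) ERB approximation; treating "less than one ERB apart" as "same critical band" is only a rough rule of thumb of my own, comparable to the roughly 15% criterion mentioned by Dr. Houtsma.]

def erb(frequency_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz."""
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)

def same_critical_band(f1, f2):
    """Rough rule of thumb: two components interact (and relative phase
    matters) when their separation is less than one ERB at the mean."""
    center = (f1 + f2) / 2.0
    return abs(f2 - f1) < erb(center)

# Adjacent harmonics of a 100-Hz fundamental: the low ones land in different
# critical bands (resolved, phase-insensitive), the high ones do not.
for k in (2, 5, 10, 20):
    print(k, same_critical_band(100 * k, 100 * (k + 1)))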