Dear Valeriy and List,
(Sorry if you get this twice. Apparently the AUDITORY listserv
didn't like my new email address and may or may not have sent
this message to the list.)
I don't think that the question about how many sound sources
people can perceive in a mixture has a simple answer.
First of all, it depends on what you mean by "perceive". If you mean how many sound sources they can report, then it depends on how much time they are given. The longer they listen, the more they can report. This could mean that they can pay close attention to only one sound at a time, but can shift their attention from one source to another. If you mean, "How many could they report if given a sample of each one in turn and asked whether they could detect it in the mixture, given all the time they needed?", the question becomes one about blending and segregation at a very basic level. (Let's call this last method "matching a standard".) Often I haven't been able to hear a sound in a mixture until I knew what it was.
Assuming that one is using the "matching a standard" method, then one's success will depend on what the sounds are and how intense they are. Obviously a weak sound may be hard to detect -- it may be masked (psychoacoustically or informationally) by the others. Even when no masking occurs, the similarity of the component sound sources plays an important role. Those researchers who have claimed that only a small number of sources (3 or 4) can be detected are all referring to sets of sounds that resemble one another, such as multiple talkers or singers, or rhythmically playing instruments.
Consider the following set of sounds:
- a person talking
- randomly spaced hits on a bass drum (greatly attenuated)
- an ambulance siren
- jangling of a set of house keys
- a pure tone playing Morse code
- a person typing on an electric typewriter
- the sound of a motorcycle whizzing by (greatly attenuated)
As long as the intensities were balanced appropriately, I think you would eventually detect all of them by the method of matching a standard.
As many of the list members know, I believe that there is an early perceptual stage that assigns links among the parts of the incoming mixture prior to further processing by the mechanisms that we call attention. If what we are asking about is whether this pre-attentive process has some limit concerning the number of discrete subsets (potential streams) it can form, we would have to observe its operation without any contribution from attention. I believe that this is impossible using the standard methods of psychoacoustics. Rather, it has to be addressed using a physiological approach. Some beginnings toward doing this have been carried out by Elyse Sussman and by Claude Alain (working independently), and there may be others whom I don't know about.
By the way, I referred to the output of the pre-attentive mechanism as "potential" streams because there is good reason to believe that top-down processes play a big role in determining the actually heard streams.
Sorry the answer couldn't have been simpler.
Al
---------------------------------------------
Albert S. Bregman,
Emeritus Professor
Psychology Dept., McGill University
1205 Docteur Penfield Avenue
Montreal, Quebec
Canada H3A 1B1
Office:
Voice: +1 (514) 398-6103
Fax: +1 (514) 398-4896
---------------------------------------------
----- Original Message -----
From: "Valeriy Shafiro" <Valeriy_Shafiro@RUSH.EDU>
To: <AUDITORY@LISTS.MCGILL.CA>
Sent: Friday, April 30, 2004 3:26 PM
Subject: Re: Computational ASA -- how many sources can humans perceive?
I would like to ask a further question: Do we, in fact, know how many independent sound sources in a mixture humans can perceive? Thus far I know of only one research report where human listeners were asked to identify sound sources in a recorded "real-world" sound mixture (Ellis, D. P. (1996). Prediction-driven computational auditory scene analysis). We have been talking about this issue with Brian Gygi, and from the few related reports that Brian found, it appears that humans may not be that good at simultaneously perceiving independent sound sources. For instance, Jennifer Tufts and Tom Frank (J. Acoust. Soc. Am. 101, 3107 (1997)) found that the accuracy of judging the number of talkers in a multitalker mixture drops considerably when there are more than 3 talkers. There is also a report by David Huron (Music Perception, Vol. 19, No. 1 (2001), pp. 1-64, or on-line at
http://www.music-cog.ohio-state.edu/Huron/Publications/huron.voice.leading.html
) that the accuracy of estimating the number of musical lines in polyphonic music worsens considerably after 3. Some anecdotal evidence for this limit also comes from movie sound effect designers. This is a quotation from Walter Murch, a renowned sound effect artist: "There is a rule of thumb I use which is never to give the audience more than two-and-a-half things to think about aurally at any one moment. Now, those moments can shift very quickly, but if you take a five-second section of sound and feed the audience more than two-and-a-half conceptual lines at the same time, they can't really separate them out. There's just no way to do it, and everything becomes self-canceling." (cited from
http://www.filmsound.org/murch/waltermurch.htm)
Any thoughts, comments, and references relevant to this issue are appreciated.
-------------------------------------------------------------
Valeriy Shafiro
Communication Disorders and Sciences
Rush University Medical Center
Chicago, IL
office (312) 942 - 3298
lab (312) 942 - 3316
email: valeriy_shafiro@rush.edu