Re: Sources & terms



Dear list members,

The issue of auditory streaming is very interesting to me and is playing an
increasingly large role in my research.  However, the three examples below
and Dr. Bregman's excellent points about methods demonstrate a fundamental
problem with the concept.  The best way to describe the problem is by
analogy to the visual world, where the concepts of objects and features
have (I believe) more agreed-upon meanings.  In both vision and audition,
the physical signals are the result of events or objects that the perceiver
is trying to reconstruct.  The "stream" or the "object" is a mental
construct of the perceiver.  I will use the examples given to extend my point:


(1) An orchestral recording played through one loudspeaker. There is
one "sound" source, but the signal is complex. This is possibly the
most difficult to deal with.
This is a "stream" in the way that a photograph is a visual object.  Based
on the listener's attentional state and voluntary allocations of processing
power it can stay a single object or be broken down into components - some
of which can become smaller objects.  The poor resolution of the
presentation mechanism increases the difficulty of interpreting the picture
as multiple objects reflecting light in space or the recording as multiple
objects creating sound across time.

(2) An orchestral recording in stereo (X/Y mic technique). There are
two sound sources and the "ear" (?) / "mind" (?) is assisted in
decoding by (apparent) spatialization of (apparent) sources. Complex.
The analogy here is to a picture (or perhaps a movie) rendered in
simulated 3D.  The entire recording is an object being presented to the
perceiver, but the increased resolution allows the perceiver to overcome
informational and energetic masking (assuming stereo playback) to a greater
degree.

(3) An orchestral recording recorded onto a 60 channel medium and
played back through 60 loudspeakers. At the 'individual level', the
least complex in some ways.
This is like the amusement ride where a movie is back-projected onto the
walls of a circular room.  By separating the sources in space,
informational and energetic masking are reduced still
further.  Nonetheless, the 'streams' coming out of the sixty speakers are
still nothing more than source material for the listener to use to recreate
an idea of what the actual physical events and objects were that created
the sounds.

The original question was "How many concurrent streams can be perceived
simultaneously?".  I think the visual analogy is again useful.  How many
visual objects can be perceived?  Obviously, the answer depends on how you
test it and whether or not memory counts as part of simultaneous
perception.  For a sound (or light) input created by the combination of
multiple sources, the ability of a perceiver to process it will depend
heavily on the amount of time allocated and the task they must perform.  An
excellent paper on visual attention (Shiffrin et al., 1976) showed that
observers could detect a brief visual target in any one of 49 spatial
positions as well when they were not cued which position to watch as when
they were told which one it might appear in.  A letter identification task
reduced the number of positions to 9.  Is this what we mean by perceiving a
stream or an object?

In general, I think that stream segregation is an excellent description of
how humans make sense of sound input, but for answering scientific
questions we must be very precise about exactly what we mean.

Erick Gallun
Postdoctoral Researcher
Hearing Research Center
Boston University