
Re: speaker phones and listening in reverberation

Dear Auditory List,

I would like to point people to an article from some years ago that
addressed this question with a signal-processing method:
@article{Allen77,
   author  = {Allen, J. B. and Berkley, D. A. and Blauert, J.},
   title   = {A multimicrophone signal-processing technique to remove room
              reverberation from speech signals},
   journal = {J. Acoust. Soc. Am.},
   volume  = {62},
   pages   = {912--915},
   month   = oct,
   year    = {1977}
}

If you want to listen to the output of the process, you can download the
file talk.wav. Sorry, I don't have a name for this web site (yet).

It has three stereo samples, played as follows:
O = original   P = processed
L = left channel   R = right channel   M = monaural

 sample  |  Right   Left
    1    |   OR      OL
    2    |   OR      OR
    3    |   PM      PM

The speech was recorded with microphones in a listener's ears.
It was then processed to reduce the perception of reverberation
using a two-channel Wiener filter that attempted to remove sounds
that were not correlated between the two channels, as a function of
time and frequency. This processing is described in the paper.
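For readers curious how such a filter works, here is a rough sketch (my own simplification, not the exact algorithm of the paper) of a coherence-weighted, cophase-and-add two-channel filter: time-frequency bins where the two ears agree (the direct sound) are passed, and bins where they do not (diffuse reverberation) are attenuated. The function name and parameters are illustrative.

```python
import numpy as np

def dereverb_two_channel(left, right, nfft=512, hop=256, alpha=0.9):
    """Coherence-weighted cophase-and-add, in the spirit of Allen,
    Berkley & Blauert (1977); a simplified sketch, not their exact method."""
    win = np.hanning(nfft)
    out = np.zeros(len(left))
    Pll = Prr = Plr = None          # smoothed auto-/cross-spectral estimates
    for start in range(0, len(left) - nfft + 1, hop):
        L = np.fft.rfft(win * left[start:start + nfft])
        R = np.fft.rfft(win * right[start:start + nfft])
        if Pll is None:             # initialize on the first frame
            Pll, Prr, Plr = np.abs(L)**2, np.abs(R)**2, L * np.conj(R)
        else:                       # first-order recursive smoothing
            Pll = alpha * Pll + (1 - alpha) * np.abs(L)**2
            Prr = alpha * Prr + (1 - alpha) * np.abs(R)**2
            Plr = alpha * Plr + (1 - alpha) * L * np.conj(R)
        # magnitude-squared coherence per bin: near 1 for the correlated
        # direct sound, near 0 for diffuse (uncorrelated) reverberation
        coh = np.abs(Plr)**2 / (Pll * Prr + 1e-12)
        # cophase: rotate the right channel into phase with the left,
        # add, and attenuate the incoherent bins
        Y = coh * 0.5 * (L + R * np.exp(1j * np.angle(Plr)))
        out[start:start + nfft] += win * np.fft.irfft(Y, nfft)
    return out
```

Time-frequency gain methods like this tend to trade reverberation for some "musical noise" distortion, so the processed output is not artifact-free.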

The first should sound natural. The second, as described by
Koenig (1950), will sound "rain-barrel like." The third is
the mono processed sound.

You MUST listen in stereo mode, with earphones, or else
you will not hear the effect. If you listen to only the left
or only the right channel, the first two samples sound identical.

Bradley Wood Libbey wrote:

> James and list,
>         My dissertation research started with the question of speaker
> phones and how to improve the quality of speech on such telecommunication
> devices.  I suspect three reasons for the decrease in intelligibility:
> reverberation, bandwidth of phone lines, and single channel as opposed to
> binaural; see Nabelek (82) and Eisenberg (00).
>         I suspect the tube analogy is fairly accurate, so why aren't there
> better speaker phones?  Most telephone lines are band-limited and single
> channel; this is simply the status quo.  As for the reverberation, signal
> processing techniques have only recently become capable of removing echoes
> when the original signal is known (the chips are very expensive), but not
> when the speech is unknown.  In order to remove reverberation I first
> looked at traditional signal processing techniques to remove echoes:
> microphone arrays are costly, and cepstral processing, Bogert (63), looks
> promising but fails with a larger number of echoes.  I also looked into
> some other techniques in the literature that had limited results.  (I have references
> if interested.)  What I finally came to was based on the ideas that I have
> been hearing in this thread of e-mails, that humans don't notice echoes
> when listening binaurally in a reverberant room.  Perhaps binaural
> neurological processes were responsible for dereverberation.  Could these
> processes be modeled? Or are we simply capable of picking up enough
> information to understand the speech and ignore the reverb?
>         To investigate this possibility I moved away from speech quality
> and studied intelligibility in a way similar to how you described using
> your Walkman headphones, except I did use good microphones and tried to
> eliminate the frequency response of the measurement equipment.  I looked
> at only reverberation, no additive noise, no competing speakers, and full
> bandwidth (60 Hz-22 kHz).  Work has already been done in this area, Nabelek
> (82), but to gather more knowledge I have done some studies that consider
> some differences of monaural and binaural listening to real and simulated
> reverberation considering interaural time difference, interaural level
> differences, and spectral weighting.  What I've found so far is in
> agreement with Nabelek's findings, that the binaural speech
> intelligibility advantage without competing noise sources in reverberation
> is relatively small, < 5 % difference in intelligibility for normal
> hearing listeners.  (I plan on presenting my results at the next ASA
> conference, and am in the process of writing up these results for
> publication, sorry they aren't done yet.  I also intend to do some quality
> testing in the future.)
>         At this point, for no competing noise sources I have NOT shown
> that binaural listening makes great improvements in intelligibility.
> Furthermore, my experimental conditions have not shown that the pinna and
> interaural level differences have an effect on intelligibility. (see
> Bronkhorst (00) for some disagreement with my last statement.)
>         Now, back to the speaker phones: why do they sound so bad, and why
> are they less intelligible?  I suspect that the decrease has a lot to do
> with reverberation (direct-to-reverberant ratios and all that), a little
> to do with binaural listening (when only one sound source exists), and a
> lot to do with the bandwidth.  The quality will be affected differently,
> Eisenberg (98).  If speaker phones were full bandwidth, then in the
> absence of competing sound sources I don't see that a binaural phone would
> offer great improvements in intelligibility over a single channel phone,
> little more than a summation of an appropriately delayed version of each
> signal.  (see Koenig (50) for disagreement on this point.)  However, the
> argument might not hold for reduced bandwidth.
>         Is there any research that directly links single-echo suppression
> to reverberation suppression?  Nabelek (89) and Bronkhorst (00) both have
> tested and suggested that the reverberation acts as a masker, an area ripe
> for research. I graduate in a year, anyone have a post-doc position? :)
> Brad Libbey
> Graduate Student, Georgia Institute of Technology
> Bogert, Bruce P., M. J. R. Healy, and John W. Tukey. (1963). "The
> Quefrency Analysis of Time Series for Echoes: Cepstrum,
> Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking." Proceedings of
> the Symposium on Time Series Analysis. Ed. Murray Rosenblatt. New York, NY:
> John Wiley and Sons.
> Bronkhorst, Adelbert W.  (2000).  "The cocktail party phenomenon: A review
> of research on speech intelligibility in multiple-talker conditions."
> Acustica, 86: 117-128.
> Eisenberg, Laurie S., D. D. Dirks, S. Takayanagi, and A. S. Martinez.
> (1998).  "Subjective judgments of clarity and intelligibility for filtered
> stimuli with equivalent speech intelligibility index predictions."  J.
> Speech, Language, Hearing Res., 41: 327-339.
> Eisenberg, Laurie S., Robert V. Shannon, Amy Schaefer Martinez, John
> Wygonski, and Arthur Boothroyd.  (2000).  "Speech recognition with reduced
> spectral cues as a function of age." J. Acoust. Soc. Am. 107: 2704-10.
> Koenig, W.  (1950).  "Subjective effects in binaural hearing."  J.
> Acoust. Soc. Am., 22: 61-62.
> Nabelek, Anna K., Tomasz R. Letowski, and Frances Tucker. (1989).
> "Reverberant overlap- and self-masking in consonant identification."  J.
> Acoust. Soc. Am., 86: 1259-1265.
> Nabelek, Anna K., and Pauline K. Robinson. (1982). "Monaural and Binaural
> Speech Perception in Reverberation for Listeners of Various Ages." J.
> Acoust Soc. Am. 71: 1242-1248.
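
A footnote on the cepstral processing Brad mentions: Bogert et al.'s (1963) observation is that a single echo multiplies the power spectrum by a cosine ripple whose period encodes the delay, so the delay appears as a peak on the "quefrency" axis of the cepstrum. A minimal sketch (mine, purely illustrative; the function name and the 2 ms cutoff are my choices):

```python
import numpy as np

def echo_delay_cepstrum(x, fs):
    """Estimate a single echo delay (seconds) via the real cepstrum,
    after Bogert, Healy & Tukey (1963). Illustrative sketch only."""
    spec = np.abs(np.fft.rfft(x))
    # real cepstrum: inverse transform of the log magnitude spectrum;
    # an echo's cosine ripple becomes a peak at its delay (quefrency)
    ceps = np.fft.irfft(np.log(spec + 1e-12))
    # skip low quefrencies dominated by the signal's own spectral envelope
    lo = int(0.002 * fs)                      # first 2 ms
    q = lo + np.argmax(ceps[lo:len(ceps) // 2])
    return q / fs
```

With many overlapping echoes, as in a real room, the quefrency axis fills with competing peaks and the method breaks down, which is the failure Brad describes.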

Jont B. Allen
AT&T Labs-Research, Shannon Laboratory, E161
180 Park Ave., Florham Park NJ, 07932-0971
973/360-8545 voice, x7111 fax, http://www.research.att.com/~jba