
Re: speaker phones and listening in reverberation



James and list,

        My dissertation research started with the question of speaker
phones and how to improve the quality of speech on such telecommunication
devices.  I suspect three reasons for the decrease in intelligibility:
reverberation, the limited bandwidth of phone lines, and single-channel
rather than binaural listening; see Nabelek (82) and Eisenberg (00).

        I suspect the tube analogy is fairly accurate.  So why aren't there
better speaker phones?  Most telephone lines are band-limited and single
channel; this is simply the status quo.  As for the reverberation, signal
processing techniques have only recently become capable of removing echoes
when the original signal is known (the chips are very expensive), but not
when the speech is unknown.  To remove reverberation I first looked at
traditional signal processing techniques for removing echoes.  Microphone
arrays are costly.  Cepstral processing, Bogert (63), looks promising but
fails with a larger number of echoes.  I also looked into some other
techniques in the literature that had limited results.  (I have references
if interested.)  What I finally came to was based on the ideas that I have
been hearing in this thread of e-mails, that humans don't notice echoes
when listening binaurally in a reverberant room.  Perhaps binaural
neurological processes were responsible for dereverberation.  Could these
processes be modeled? Or are we simply capable of picking up enough
information to understand the speech and ignore the reverb?
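
For readers unfamiliar with the cepstral approach mentioned above, here is a
minimal sketch of how a single echo shows up as a peak in the real cepstrum.
It is an illustration under stated assumptions, not the method used in the
dissertation work: the source is a toy decaying pulse rather than speech, the
names (dft, real_cepstrum, estimate_echo_delay) and the parameters N, DELAY,
and GAIN are hypothetical, and a naive DFT is used only to keep the example
free of external libraries.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (fine for a short toy signal)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(signal):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(signal)]
    return [c.real for c in dft(log_mag, inverse=True)]


def estimate_echo_delay(signal, min_quefrency):
    """Return the quefrency (in samples) of the largest cepstral peak.

    Low quefrencies are skipped because they are dominated by the
    spectral envelope of the source itself.
    """
    ceps = real_cepstrum(signal)
    half = len(ceps) // 2
    return max(range(min_quefrency, half), key=lambda q: ceps[q])


# Toy source: a short decaying pulse (hypothetical stand-in for speech).
N, DELAY, GAIN = 256, 40, 0.5
source = [0.8 ** n for n in range(N)]

# Add one attenuated, delayed copy of the source to simulate a single echo.
echoed = [source[n] + (GAIN * source[n - DELAY] if n >= DELAY else 0.0)
          for n in range(N)]

print(estimate_echo_delay(echoed, min_quefrency=10))  # prints 40
```

With one echo the peak at the echo delay stands out clearly, and twice its
height is close to the echo gain.  The sketch says nothing about the
many-echo case, which is where I noted above that the technique fails.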

        To investigate this possibility I moved away from speech quality
and studied intelligibility in a way similar to how you described using
your Walkman headphones, except I did use good microphones and tried to
eliminate the frequency response of the measurement equipment.  I looked
at only reverberation, no additive noise, no competing speakers, and full
bandwidth (60 Hz-22 kHz).  Work has already been done in this area, Nabelek
(82), but to gather more knowledge I have run some studies comparing
monaural and binaural listening in real and simulated reverberation,
considering interaural time differences, interaural level differences,
and spectral weighting.  What I've found so far is in
agreement with Nabelek's findings, that the binaural speech
intelligibility advantage without competing noise sources in reverberation
is relatively small, < 5 % difference in intelligibility for normal
hearing listeners.  (I plan on presenting my results at the next ASA
conference, and am in the process of writing up these results for
publication, sorry they aren't done yet.  I also intend to do some quality
testing in the future.)

        At this point, with no competing noise sources, I have NOT shown
that binaural listening makes great improvements in intelligibility.
Furthermore, my experimental conditions have not shown that the pinna and
interaural level differences have an effect on intelligibility.  (See
Bronkhorst (00) for some disagreement with my last statement.)

        Now back to the speaker phones: why do they sound so bad, and why
are they less intelligible?  I suspect that the decrease has a lot to do
with reverberation (direct-to-reverberant ratios and all that), a little
to do with binaural listening (when only one sound source exists), and a
lot to do with the bandwidth.  The quality will be affected differently,
Eisenberg (98).  If speaker phones were full bandwidth, then in the
absence of competing sound sources I don't see that a binaural phone would
offer great improvements in intelligibility over a single channel phone,
little more than a summation of an appropriately delayed version of each
signal.  (see Koenig (50) for disagreement on this point.)  However the
argument might not hold for reduced bandwidth.
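
To make "a summation of an appropriately delayed version of each signal"
concrete, here is a rough sketch of aligning two channels and averaging them.
Picking the delay from the cross-correlation peak is an assumption made for
this illustration, as are the function names (best_lag, delay_and_sum) and
the toy test signal; it is not a model of binaural processing.

```python
import math


def best_lag(left, right, max_lag):
    """Lag (in samples) at which `right` best matches `left` by cross-correlation."""
    def xcorr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def delay_and_sum(left, right, max_lag=20):
    """Align `right` to `left` using the estimated lag, then average the two."""
    lag = best_lag(left, right, max_lag)
    aligned = [right[n + lag] if 0 <= n + lag < len(right) else 0.0
               for n in range(len(left))]
    combined = [(l + r) / 2 for l, r in zip(left, aligned)]
    return combined, lag


# Toy source: a damped sinusoid burst followed by silence (hypothetical signal).
N, TRUE_DELAY = 64, 5
source = [math.sin(2 * math.pi * n / 16) * 0.9 ** n if n < 32 else 0.0
          for n in range(N)]

# The second channel receives the same source TRUE_DELAY samples later.
left = source
right = [0.0] * TRUE_DELAY + source[:N - TRUE_DELAY]

combined, lag = delay_and_sum(left, right)
print(lag)  # prints 5
```

In this idealized case the two channels differ only by a delay, so the
aligned average simply reproduces the source.  With real reverberation each
channel would carry different reflections, and how much the summation helps
is exactly the open question.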

        Is there any research that directly links single echo suppression
to reverberation suppression?  Nabelek (89) and Bronkhorst (00) have both
tested and suggested that the reverberation acts as a masker, an area ripe
for research.  I graduate in a year; anyone have a post-doc position? :)

Brad Libbey
Graduate Student, Georgia Institute of Technology



REFERENCES

Bogert, Bruce P., M. J. R. Healy, and John W. Tukey. (1963). "The
Quefrency Alanysis of Time Series for Echoes: Cepstrum,
Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking." Proceedings of
the Symposium on Time Series Analysis. Ed. Murray Rosenblatt. New York, NY:
John Wiley and Sons.

Bronkhorst, Adelbert W.  (2000).  "The cocktail party phenomenon: A review
of research on speech intelligibility in multiple-talker conditions."
Acustica, 86: 117-128.

Eisenberg, Laurie S., D. D. Dirks, S. Takayanagi, and A. S. Martinez.
(1998).  "Subjective judgments of clarity and intelligibility for filtered
stimuli with equivalent speech intelligibility index predictions."  J.
Speech, Language, Hearing Res., 41: 327-339.

Eisenberg, Laurie S., Robert V. Shannon, Amy Schaefer Martinez, John
Wygonski, and Arthur Boothroyd.  (2000).  "Speech recognition with reduced
spectral cues as a function of age." J. Acoust. Soc. Am., 107: 2704-2710.

Koenig, W.  (1950).  "Subjective effects in binaural hearing."  J.
Acoust. Soc. Am., 22: 61-62.

Nabelek, Anna K., Tomasz R. Letowski, and Frances Tucker. (1989).
"Reverberant overlap- and self-masking in consonant identification."  J.
Acoust. Soc. Am., 86: 1259-1265.

Nabelek, Anna K., and Pauline K. Robinson. (1982). "Monaural and Binaural
Speech Perception in Reverberation for Listeners of Various Ages." J.
Acoust. Soc. Am., 71: 1242-1248.