masked speech responses

Dear Auditory List members

Many of you wrote to me regarding my question about masking a
speaker's own speech. I will try to summarize some of the main points
of the responses, but first I should explain more about what we were
trying to do, as that might clarify certain issues; and anyway many of
you were wondering why in the world we would want to do such a strange
thing in the first place.

My colleagues and I are interested in the brain mechanisms underlying
speech, among other things. The technique we're using is positron
emission tomography (PET), which measures changes in local cerebral
blood flow (CBF) in normal volunteers while they perform a given task.
In this particular experiment, we wondered whether we could find
evidence for interactions between speech output mechanisms and
cortical regions devoted to auditory analysis.
        The hypothesis behind this is too complex to explain in detail
here, but based on certain other observations we had reason to believe
that there might be some physiological feedback mechanisms, related to
corollary discharge, for example. Anyway, our aim was to scan subjects
while they produced speech at different rates, and to look for any
changes in CBF in auditory cortex that might correlate with the rate
of output. But of course if they're speaking more quickly they'll get
more auditory input per unit time, so a change in CBF in the auditory
cortex would be a trivial observation. So, we reasoned, if we mask the
subject's speech via noise, such that the actual sound reaching the
cochlea is constant, any variation in CBF would have to be a
consequence of internally driven feedback mechanisms. Hence our desire to
mask speech.
        To make a long story short, we think the experiment worked. In
order to be able to mask speech effectively without introducing a huge
masking signal, subjects were trained to speak in a whisper, with no
phonation (which they seemed to learn easily). We then adjusted the
noise until they told us they could no longer hear themselves. People
were surprisingly consistent in setting the noise to about 60 dB SPL,
as measured directly from the foam insert earphone using a specially
adapted acoustic coupler.
        We have not yet finished analyzing the data, but one result
that we are pleased with is that there is indeed a region in the left
temporal cortex whose blood flow covaries with rate of speech output.
This region probably contains neurons that are specialized for
analysis of acoustic features relevant to speech, so the fact that its
hemodynamics are systematically related to rate of speech output could
be evidence for a feedback network of the sort we were hypothesizing.

Now for the answers to my original query:

Several people pointed out the fairly obvious fact that feedback from
one's own voice would consist of both air and bone conduction, so any
masking would have to affect both components. On a related point, by
using insert earphones we make bone conduction all that much more
efficient, particularly in the low frequencies, according to several
responses. Using whispered speech seems to overcome this problem,
though, since there is very little low-frequency energy.

Ed Burns reminded me that the tendency to speak loudly when your voice
is masked is called the Lombard effect. This we took care of via our
training procedure, we think. We did not notice any increase in the
intensity of the subjects' voices over the course of the study, once
they had been trained.

A number of people suggested that speech output would be disrupted in
other ways during masking, including speech errors, and problems in
intonation (at least if they were singing; Ward and Burns did some
work on this 20 years ago). This is not much of an issue in my
particular study, since the output was simply two meaningless speech
syllables ("ba" and "lu"), so there was not much room for speech errors
to show up. Plus subjects had to practice doing this a lot before we
stuck them inside the scanner. Under more naturalistic conditions,
though, it is likely that speech errors would be observed.

One of the most useful comments came from B. Repp: "You could do a
small control experiment in which you present the subject with DELAYED
auditory feedback of his/her voice, using the same level of masking
noise. If there is no evidence of interference, then you have
objective evidence that the speech was inaudible to the speaker."
Sounded like a clever idea to me.
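
For anyone who wants to try the control Repp describes, here is a
minimal offline sketch of the signal path: delay the recorded voice,
then add it to the masking noise before it goes to the earphone. The
function names and the 200 ms delay are my own assumptions for
illustration, not part of his suggestion, and a real setup would need
low-latency audio hardware rather than lists of samples.

```python
def delay_samples(samples, sample_rate_hz, delay_ms):
    """Return the signal delayed by delay_ms, padded with silence.

    The output has the same length as the input, so the tail of the
    original signal is dropped.
    """
    n_delay = round(sample_rate_hz * delay_ms / 1000.0)
    if n_delay <= 0:
        return list(samples)
    padded = [0.0] * n_delay + list(samples)
    return padded[:len(samples)]


def mix_with_noise(feedback, noise):
    """Add the delayed voice to the masking noise, sample by sample."""
    if len(feedback) != len(noise):
        raise ValueError("feedback and noise must have the same length")
    return [f + n for f, n in zip(feedback, noise)]


# Hypothetical usage: 200 ms is an assumed delay, chosen only as an example.
voice = [1.0, 2.0, 3.0, 4.0]       # placeholder microphone samples
masker = [0.5, 0.5, 0.5, 0.5]      # placeholder masking-noise samples
delayed = delay_samples(voice, sample_rate_hz=10, delay_ms=200)
earphone_signal = mix_with_noise(delayed, masker)
```

If the subject's speech shows no disruption with this signal in the
earphone, that would support the claim that the masker makes the voice
inaudible.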

Also useful was the suggestion by A. Houtsma that noise with a
spectral shape similar to speech would be more effective than just
white noise.
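
As a rough illustration of the idea, one could start from white noise
and filter it so that its spectrum falls off at higher frequencies.
The sketch below uses a one-pole low-pass filter with an assumed 500 Hz
corner and 16 kHz sampling rate; these values and the function names
are my own placeholders. A proper masker would be shaped to match the
measured long-term spectrum of the speech in question (whispered
speech, in our case, which has little low-frequency energy).

```python
import math
import random


def white_noise(n_samples, seed=0):
    """Gaussian white noise with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def one_pole_lowpass(samples, sample_rate_hz, cutoff_hz):
    """Filter with y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out = []
    prev = 0.0
    for x in samples:
        prev = (1.0 - a) * x + a * prev
        out.append(prev)
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_shaped_noise(n_samples, sample_rate_hz=16000,
                        cutoff_hz=500.0, seed=0):
    """Low-pass filtered white noise, rescaled to unit RMS.

    The corner frequency is an assumption, not a measured speech spectrum.
    """
    shaped = one_pole_lowpass(white_noise(n_samples, seed),
                              sample_rate_hz, cutoff_hz)
    scale = 1.0 / rms(shaped)
    return [s * scale for s in shaped]


masker = speech_shaped_noise(16000)  # one second at the assumed rate
```

Rescaling to unit RMS makes it straightforward to set the playback
level afterwards with a single gain.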

That's about it for the comments, except that many people seemed to be
interested in the same problem for a variety of different reasons. So
I hope this has been useful to at least some of you.

I appreciated the responses and thank all those who took the time. If
anybody has any further ideas, feel free to communicate them to me.

Best wishes,
Robert Zatorre
Montreal Neurological Institute