Re: Modifying speech



Dear Fatima and List,

I can think of two control sounds that might 'be generated from normal
speech, retain more or less of the spectral features of the normal
speech, yet is not pseudo-word like'.

One is the amplitude-modulated noise that has been used for a while now
in Robert Zatorre's lab (and others): you simply modulate white noise by
the amplitude envelope of the speech signal. You end up with something
that has an amplitude waveform very similar to that of the original
signal, but not its spectral content. This is a very 'low-level' control,
and it might not do the job of keeping some of the spectral features.
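
As a rough illustration of the amplitude-modulated noise idea, here is a
minimal Python/NumPy sketch. The function name `am_noise`, the envelope
method (full-wave rectification followed by a moving average) and the
10 ms smoothing window are all assumptions made for this example; the
post does not say how the envelope was extracted in practice.

```python
import numpy as np

def am_noise(speech, sample_rate, smooth_ms=10.0, seed=0):
    """Return white noise modulated by the amplitude envelope of `speech`.

    Assumed envelope method: rectify, then smooth with a moving average.
    """
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng(seed)

    # Envelope: absolute value smoothed over `smooth_ms` milliseconds.
    win = max(1, int(round(sample_rate * smooth_ms / 1000.0)))
    kernel = np.ones(win) / win
    envelope = np.convolve(np.abs(speech), kernel, mode="same")

    # White noise carrier, shaped by the speech envelope.
    noise = rng.uniform(-1.0, 1.0, size=speech.shape)
    modulated = envelope * noise

    # Match the overall RMS level of the original (an extra, assumed step).
    rms_in = np.sqrt(np.mean(speech ** 2))
    rms_out = np.sqrt(np.mean(modulated ** 2))
    if rms_out > 0:
        modulated *= rms_in / rms_out
    return modulated

# Toy input: half a second of tone followed by half a second of silence.
sr = 16000
t = np.arange(sr) / sr
toy = np.sin(2 * np.pi * 220 * t)
toy[sr // 2:] = 0.0
control = am_noise(toy, sr)
```

The output follows the loudness contour of the input (silent where the
input is silent) but carries a flat noise spectrum, which is why this
control keeps little of the spectral structure.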

So another possibility, which we used recently in a neuroimaging study of
voice perception, is to use 'scrambled speech'. Here, the signal is
transformed into Fourier space; then, for each FFT window, the phase and
amplitude components are randomized (phase with phase and amplitude with
amplitude), and an inverse FFT is performed. You end up with a signal that
has the same energy as the original one, and a very similar waveform
(depending on the size of the FFT window, a very important parameter). It
is very similar to the scrambling used in the object recognition
literature, and in fact the spectrogram of these scrambled stimuli looks
like a visual scramble of the original spectrogram. Yet the spectral
structure is also dramatically modified, though perhaps less than for the
AM noise.
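
One possible reading of the scrambling procedure is sketched below in
Python/NumPy. It is not the code used in the study. It assumes
non-overlapping rectangular windows, takes "randomized" to mean a random
permutation of amplitudes among frequency bins and, separately, of phases
among bins, and leaves the DC and Nyquist bins untouched so that each
window stays real-valued and keeps its energy. The name `scramble_speech`
and the default window of 1024 samples are hypothetical.

```python
import numpy as np

def scramble_speech(signal, window_size=1024, seed=0):
    """Shuffle FFT amplitudes and phases within each window of `signal`."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.zeros_like(signal)

    for start in range(0, len(signal), window_size):
        frame = signal[start:start + window_size]
        n = len(frame)
        spectrum = np.fft.rfft(frame)

        # Interior bins only: skip DC, and skip Nyquist when n is even.
        last = len(spectrum) - 1 if n % 2 == 0 else len(spectrum)
        mag = np.abs(spectrum[1:last])
        phase = np.angle(spectrum[1:last])

        # Amplitude with amplitude, phase with phase (independent shuffles).
        spectrum[1:last] = rng.permutation(mag) * np.exp(1j * rng.permutation(phase))

        out[start:start + n] = np.fft.irfft(spectrum, n=n)
    return out

# Toy input: noise whose length is not a multiple of the window size.
toy_rng = np.random.default_rng(1)
toy_signal = toy_rng.standard_normal(4096 + 300)
scrambled = scramble_speech(toy_signal, window_size=512)
```

Because only the order of the amplitudes changes within a window, each
window keeps its energy (by Parseval's theorem). Shorter windows preserve
more of the original temporal envelope, while longer windows preserve more
of the long-term spectrum but smear the timing, which is one way to see
why the window size matters so much.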

Attached are a speech sample and its scrambled version.

Hope this helps.


Pascal BELIN, PhD
Neuropsychology/Cognitive Neuroscience Unit
Montreal Neurological Institute
McGill University, 3801 University Street
Montreal, Quebec, Canada H3A2B4
phone:  (514) 398-8519 (8504)
fax:  (514) 398-1338
http://www.zlab.mcgill.ca/


On Wed, 28 Feb 2001, Fatima Husain wrote:

> Dear List,
>
> Sorry to barge into an interesting discussion, but -
> My lab wants to image subjects listening to normal and modified speech.
> We are trying to investigate semantic memory.
> The modified speech is generated from normal speech, retains more or less
> of the spectral features of the normal speech, yet is not pseudo-word
> like. The reason some of us don't like pseudo words is that, many subjects
> treat words and pseudo-words in the same manner, activating a mental
> lexicon and semantic memory (what we are interested in).
>
> We looked at reversed speech and low-pass filtered speech. But these were
> either too foreign-language-like (according to naive listeners) or too
> much like humming.
>
> I would be grateful if someone would suggest suitable methods of
> modifying speech to fulfill our (somewhat vague) requirements.
>
> Thanks,
> --fatima
>
> fhusain@cns.bu.edu
> fthusain@helix.nih.gov
>

Attachment: scrambled.wav
Description: Binary data

Attachment: original.wav
Description: Binary data