FW: modifying speech
Dear List,
Due to requests, I am posting a summary of the responses I received.
I am not attaching the sound samples P. Belin sent, but if anyone
needs them, email me.
Thanks to everyone who responded, incl. Jont Allen (No, I wasn't
aware of the abstract but will go read it).
--fatima
> 1]
> Date: Wed, 28 Feb 2001 08:07:39 -0800
> From: Aniruddh Patel <apatel@nsi.edu>
> To: Fatima Husain <fhusain@cns.bu.edu>
> Subject: Re: Modifying speech
>
> Hi Fatima,
>
> You could try spectrally rotated speech. See:
>
> Scott, S.K., Blank, C.C., Rosen, S., Wise, R.J.S. Identification of a
> pathway for intelligible speech in the left temporal lobe. Brain 123,
> 2400-2406 (2000).
>
> Regards,
> Ani
>
> 2]
> Date: Wed, 28 Feb 2001 16:35:05 -0000
> From: Franck Ramus <f.ramus@ucl.ac.uk>
> To: Fatima Husain <fhusain@cns.bu.edu>
> Subject: Re: Modifying speech
>
>
> dear Fatima,
> perhaps the resynthesis method I have used will be what you need.
> listen to the stimuli on this page:
> http://www.ehess.fr/centres/lscp/persons/ramus/resynth/ecoute.htm
> here, my point was to delexicalise the sentences and make them
> unintelligible, but of course you can do what you want (scramble some
> words, not others, etc.). the drawback of this method is that it
> requires precise phonetic labelling of the sentences, but this can be
> done more or less automatically with mbrola utilities (ask me if you
> want more details about this).
> the accompanying paper is:
> Ramus, F., & Mehler, J. (1999). Language identification with
> suprasegmental cues: A study based on speech resynthesis. Journal of
> the Acoustical Society of America, 105(1), 512-521.
> and you can download it from my publications page:
> http://www.ehess.fr/centres/lscp/persons/ramus/pub.htm
>
> all the best,
>
> Franck Ramus
> Institute of Cognitive Neuroscience
> 17 Queen Square
> London WC1N 3AR
> GB
> tel: (+44) 20 7679 1138
> fax: (+44) 20 7813 2835
> f.ramus@ucl.ac.uk
>
> 3]
> Date: Wed, 28 Feb 2001 11:44:08 -0500
> From: Marc Joanisse <marcj@uwo.ca>
> To: Fatima Husain <fhusain@cns.bu.edu>, AUDITORY@lists.mcgill.ca
> Subject: Re: Modifying speech
>
> Fatima,
>
> Sophie Scott and colleagues have used spectrally rotated speech and
> vocoding for this purpose: Scott, S.K., Blank, C.C., Rosen, S. & Wise,
> R.J.S. (2000) Brain 123, 2400-2406. The rotation technique is described
> in a paper by B. Blesser (1972) J. Speech and Hearing Research 15,
> 5-41. It involves amplitude modulating a speech waveform so that high
> and low frequencies are swapped. After lowpass filtering, the result is
> not recognizable as speech and is incomprehensible. The vocoding
> technique is described by Shannon et al. (1995) Science 270, 303-304.
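
[A minimal sketch of the rotation technique described above, assuming
numpy/scipy; the function name, pivot frequency, and filter choices are
my own illustration, not taken from Blesser's paper. The idea: band-limit
the speech to twice the pivot, ring-modulate with a carrier at twice the
pivot so every frequency f maps to 2*pivot - f, then lowpass again to
discard the sum-frequency images.]

```python
import numpy as np
from scipy.signal import butter, filtfilt

def spectrally_rotate(signal, fs, pivot_hz=2000.0):
    """Mirror the spectrum about pivot_hz: band-limit to [0, 2*pivot_hz],
    ring-modulate with cos(2*pi*2*pivot_hz*t) so f -> 2*pivot_hz - f,
    then lowpass to remove the sum-frequency images."""
    b, a = butter(6, 2 * pivot_hz / (fs / 2), btype="low")
    band = filtfilt(b, a, signal)          # band-limit the input
    t = np.arange(len(band)) / fs
    rotated = band * np.cos(2 * np.pi * 2 * pivot_hz * t)
    return filtfilt(b, a, rotated)         # drop images above 2*pivot_hz
```

[For a 16 kHz recording, a pivot of 2 kHz mirrors the 0-4 kHz band onto
itself, which is roughly the telephone-bandwidth case.]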
>
> One caveat is that rotated speech might not be ideal for tasks
> involving isolated words. When I tried using reversed speech as a
> control for a speech discrimination task, subjects reported using
> speech-like cues while doing the task. I'd think it would work much
> better when applied to longer passages, which is what Scott et al.
> used it for.
>
> A second thing to try is modified sinewave speech. Removing the first
> or second sinewave 'formant' from a 3-sinewave speech stimulus pretty
> much removes its intelligibility while simulating some - but not all -
> of the spectral and temporal characteristics of the original. I have
> been using this in my own imaging studies with some success. It's not
> perfect, since sinewave stimuli lack the full spectral characteristics
> of actual speech. My instinct is that the importance of this depends
> on what areas of cortex you are interested in imaging. The idea comes
> from a paper by Mody et al. (1997) J. Exp. Child Psych. 64, 199-231,
> where they compared speech and nonspeech discrimination in children
> with dyslexia.
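
[To make the sinewave manipulation concrete: given formant-frequency and
amplitude tracks (obtained separately, e.g. from a formant tracker), the
replica is just a sum of time-varying sinusoids, and omitting one index
removes that 'formant'. A minimal sketch assuming numpy; the function
name and `keep` parameter are mine, not from Mody et al.]

```python
import numpy as np

def sinewave_speech(freq_tracks, amp_tracks, fs, keep=(0, 1, 2)):
    """Sum one time-varying sinusoid per formant track; indices absent
    from `keep` are omitted (e.g. keep=(1, 2) drops the F1 analogue)."""
    out = np.zeros(len(freq_tracks[0]))
    for i in keep:
        # Integrate instantaneous frequency (Hz) to obtain phase (rad).
        phase = 2 * np.pi * np.cumsum(freq_tracks[i]) / fs
        out += amp_tracks[i] * np.sin(phase)
    return out
```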
>
> Good luck,
>
> -Marc-
>
> 4]
> Date: Wed, 28 Feb 2001 12:36:59 -0500
> From: Pascal BELIN <pascal@BIC.MNI.MCGILL.CA>
> To: AUDITORY@lists.mcgill.ca
> Subject: Re: Modifying speech
>
> Dear Fatima and List,
>
> I can think of two control sounds that might 'be generated from normal
> speech, retain more or less of the spectral features of the normal
> speech, yet is not pseudo-word like'.
>
> One is the amplitude-modulated noise that has been used for a while
> now in Robert Zatorre's lab (and others): you simply modulate white
> noise by the amplitude of the speech signal. You come up with
> something that has a very similar amplitude waveform to the original
> signal, but not the spectral content. This is a very 'low level'
> control, and it might not do the job of keeping some of the spectral
> features.
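
[A minimal sketch of the amplitude-modulated noise control, assuming
numpy/scipy; extracting the envelope via the Hilbert transform plus a
lowpass smoother is my choice here, and the function name and cutoff are
assumptions, not a description of the Zatorre lab's exact procedure.]

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def am_noise(speech, fs, env_cutoff_hz=30.0, seed=0):
    """White noise modulated by the smoothed amplitude envelope of the
    speech: keeps the temporal profile, discards the spectral content."""
    env = np.abs(hilbert(speech))                 # instantaneous amplitude
    b, a = butter(4, env_cutoff_hz / (fs / 2), btype="low")
    env = np.maximum(filtfilt(b, a, env), 0.0)    # smooth, clip negatives
    noise = np.random.default_rng(seed).standard_normal(len(speech))
    return env * noise
```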
>
> So another possibility, which we used recently in a neuroimaging study
> of voice perception, is to use 'scrambled speech'. Here, the signal is
> transformed in Fourier space; then, for each window of the FFT, the
> phase and amplitude components are randomized (phase with phase and
> amplitude with amplitude), and an inverse FFT is performed. You end up
> with a signal which has the same energy as the original one, and a
> very similar waveform (depending on the size of the FFT window, a very
> important parameter). It is very similar to the scrambling used in the
> object recognition literature, and in fact the spectrogram of these
> scrambled stimuli looks like a visual scramble of the original
> spectrogram. Yet the spectral structure is also dramatically modified,
> perhaps less so than for the AM-noise.
> Attached are a sample of speech and of its scrambled version.
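
[The windowed-FFT scrambling described above can be sketched as follows;
this is my reconstruction, assuming numpy, non-overlapping rectangular
windows, and independent shuffles of the magnitude and phase bins. The
window length matters a lot, as noted, and energy is preserved only
approximately here (the DC and Nyquist bins lose any imaginary part on
resynthesis); any tail shorter than one window is left silent.]

```python
import numpy as np

def scramble_speech(signal, fs, win_ms=50.0, seed=0):
    """Per non-overlapping window: take the FFT, shuffle magnitudes among
    bins and phases among bins (independently), then inverse-FFT."""
    rng = np.random.default_rng(seed)
    win = int(fs * win_ms / 1000)
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - win + 1, win):
        spec = np.fft.rfft(signal[start:start + win])
        mag, phase = np.abs(spec), np.angle(spec)
        rng.shuffle(mag)      # amplitude with amplitude
        rng.shuffle(phase)    # phase with phase
        out[start:start + win] = np.fft.irfft(mag * np.exp(1j * phase), n=win)
    return out
```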
>
> Hope this helps.
>
>
> Pascal BELIN, PhD
> Neuropsychology/Cognitive Neuroscience Unit
> Montreal Neurological Institute
> McGill University, 3801 University Street
> Montreal, Quebec, Canada H3A2B4
> phone: (514) 398-8519 (8504)
> fax: (514) 398-1338
> http://www.zlab.mcgill.ca/
>
> <<scrambled.wav>> <<original.wav>>
>
> 5]
> Date: Wed, 28 Feb 2001 15:44:44 -0500
> From: Jont Allen <jba@research.att.com>
> To: Fatima Husain <fhusain@cns.bu.edu>,
> AUDITORY mailing list <AUDITORY@lists.mcgill.ca>
> Subject: Re: Modifying speech
>
> Fatima Husain wrote:
>
> > Dear List,
> >
> > Sorry to barge into an interesting discussion, but -
> > My lab wants to image subjects listening to normal and modified speech.
> > We are trying to investigate semantic memory.
>
> If you accept that the phoneme is the lowest order of semantics, then
> that would suggest one might test using nonsense CV and CVC sounds.
> They can be identified as parts of words (subjects can think of words that
> start with a given CV for example).
>
> Finally, have you seen the very interesting work of Cyma Van Petten?
> Look at her abstract, JASA page 2643, Vol 108, #5, pt. 2, Nov 2000
> Abstract 5aSC3 (Newport Beach CA meeting Friday Dec 8, 2000)
>
>
> --
> Jont B. Allen
> AT&T Labs-Research, Shannon Laboratory, E161
> 180 Park Ave., Florham Park NJ, 07932-0971
> 973/360-8545 voice, x7111 fax, http://www.research.att.com/~jba
>