

Subject: Re: Formants
From:    Kevin Austin  <kevin.austin@xxxxxxxx>
Date:    Tue, 6 Sep 2011 21:33:25 -0400
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi

I've read your three postings a number of times, and Jim's reply, and am not quite able to pin down the problem / situation you are asking about. You have used the terms transposition, morphing, change the pitch, pitch shift, "change the components" . . . most of which, to me, have somewhat different meanings and different outcomes. Your question related to how your pilot subjects were not able to differentiate between the various speakers.

Depending on the software, "transposition" may or may not change the duration of the original. "Classic" transposition is the equivalent of speeding up a tape recorder: the entire signal is simply played faster. This changes many things: the duration, the fundamental, the vowel formants, the spectral components of unvoiced consonants, the rate of change in diphthongs, the sense of the size of the mouth and tongue, etc. All of these (and others) I take to be some of the parameters that make a voice "recognizable" -- that is, that provide it with a unique identity.

There are two different issues, therefore. What kinds of changes would your pilot subjects accept as being "the same" speaker? And do I read you correctly that, to the pilot subjects, all of the transformed versions sounded the same [since they were unable to differentiate between them]?

If you are using some form of 'pitch-shifting', then the partials should remain in the same frequency ratios under 'transposition'. In frequency shifting, the partials instead remain at the same frequency difference, producing inharmonic spectra.
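To make the difference concrete, a quick numeric sketch (in Python; the 120 Hz fundamental and the 100 Hz offset are just illustrative values, not from your postings). A pitch shift multiplies every partial by one ratio, so the harmonic ratios survive; a frequency shift adds one offset, and the ratios break:

import numpy as np

f0 = 120.0                                # illustrative male fundamental, Hz
partials = f0 * np.arange(1, 6)           # first five harmonics

pitch_shifted = partials * 2 ** (4 / 12)  # +4 semitones: one common ratio
freq_shifted = partials + 100.0           # constant 100 Hz offset

print(pitch_shifted / pitch_shifted[0])   # still 1, 2, 3, 4, 5 -> harmonic
print(freq_shifted / freq_shifted[0])     # no longer integers -> inharmonic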
One way to see how this works without a "morphing" algorithm would be to do a spectral analysis of a speaker repeating the same phrase at five different pitch levels. It is my understanding that when singers sing notes that obliterate the lower formants, the upper formants are slightly shifted [upwards] to accommodate this change.

Vowel formants are 'normalized' at certain frequency ranges, but any individual will produce slightly (or wildly) different formant structures. Consider the word "coffee" as spoken by someone from Bahston and someone from South Carlinah: the "o" [in IPA] would be two different vowels. It could be that the bandpass filters in the important first and second formant regions are simply too far apart, and the vowels get 'normalized'. [See also http://en.wikipedia.org/wiki/Formant]

If the first two formants of the vowel "o" [not IPA] are 500 Hz and 1,000 Hz in one speaker, and 480 Hz and 1,100 Hz in another, and both are processed by the same filter with a center frequency of [say] 1,050 Hz, then the second formant would come out the same for both speakers, at 1,050 Hz. This could result in the two original speakers sounding very much the same under transposition. It is a well-known issue in vocoders that have a limited number of bands: they produce "robotic voices". This has been somewhat repaired [and commercialized by Autotune]:

http://www.youtube.com/watch?v=bduQaCRkgg4
http://www.youtube.com/watch?v=tBb4cjjj1gI&NR=1
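A sketch of that band-quantization idea (the 300 Hz band spacing is invented for illustration; only the 500/1,000 and 480/1,100 Hz formant pairs come from the example above):

import numpy as np

band_centers = np.arange(150.0, 4000.0, 300.0)  # coarse, hypothetical bank
                                                # (..., 750, 1050, 1350, ...)
def nearest_band(f):
    return band_centers[np.argmin(np.abs(band_centers - f))]

speaker_a = [500.0, 1000.0]   # F1, F2 of "o" for one speaker
speaker_b = [480.0, 1100.0]   # F1, F2 for the other

print([nearest_band(f) for f in speaker_a])   # [450.0, 1050.0]
print([nearest_band(f) for f in speaker_b])   # [450.0, 1050.0] -> identical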
Someone who is trained and highly skilled in voice identification may, however, pick up on other elements of the voice production that are identity markers. A quick review of vocal impersonations on YouTube [http://www.youtube.com/results?search_query=impersonations&aq=f] will reveal everything from cartoon-like impressions to some quite fine acts of mimicry. A listener who applies concepts of articulatory phonetics to their listening to voices will also be matching tongue and lip placement, rates of change in speech, spectral glides and resonances, etc, not only the 'acoustical' signal. Such a person will hear the parameters that the impersonator is able to do well, and those that slip through the cracks.
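On the question in the latest posting below, of a program to check whether formant energy and placement survive the shift: Praat will display formant tracks directly, and a rough scripted check can be done with LPC. A minimal sketch, assuming Python with numpy and librosa available (the file name, sample rate, and LPC order are illustrative, not a definitive recipe):

import numpy as np
import librosa

# resampling to 10 kHz keeps the analysis below 5 kHz, where F1-F4 live
y, sr = librosa.load("speaker.wav", sr=10000)   # hypothetical file name

a = librosa.lpc(y, order=12)           # rule of thumb: order ~ 2 + sr/1000
roots = np.roots(a)
roots = roots[np.imag(roots) > 0]      # one of each complex-conjugate pair
freqs = np.sort(np.angle(roots) * sr / (2.0 * np.pi))
print(freqs)   # candidate formant frequencies in Hz; compare the lowest few
               # between the original and the shifted version of a vowel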
IMV

Kevin

On 2011, Sep 6, at 2:13 PM, Brittany Guidone wrote:

> Dear List,
>
> Thanks to previous responders I have narrowed my questions down to this:
>
> We are pitch shifting voices with a particular algorithm. We are curious to know whether, from a qualitative standpoint, we can assume that the size/energy of the formants is being preserved and only where they lie is being altered. And is there any program that would allow us to check this?
>
> Thank you
> -Brittany
> Sent from my Verizon Wireless BlackBerry
>
>> Dear List,
>>
>> To explain more about my original question:
>>
>> I mainly want to know if "morphing" or "changing" the pitch of a male voice by positive 4 semitones will change the components of the original male voice (before it was morphed) in a way that will make the two voices have different components or "make up" in comparison to one another, besides the fact that they will have different pitches.
>>
>> In other words, when the pitch of a voice is changed in Audacity or GarageBand, what other components of the voice are changed (besides the pitch shift)?
>>
>> -Brittany
>> Sent from my Verizon Wireless BlackBerry
>>
>>> Dear List,
>>>
>>> I am designing a study in which I planned to "morph" female voices into male voices
>>> and vice versa. The method I tried was simply to change the pitch (using GarageBand,
>>> although I also tried Audacity). At first I tried small transpositions, just 3-4 semitones up
>>> or down.
>>>
>>> Our task is to identify individual speakers. The problem is that the speakers were easily
>>> discernible before the transformation; however, after the transposition, none of our pilot
>>> subjects were able to differentiate between the various speakers.
>>>
>>> Can anyone help me understand what is going on? What is known about the acoustic information
>>> people use in identifying individual speakers? Can anyone suggest a better transformation?
>>>
>>> Thanks in advance,
>>> Brittany