
Re: PSOLA



That’s interesting. I thought I’d detected something similar in adverts that have had some temporal ‘trimming’: I find them intensely irritating, as though I’m being ranted at, and the timing and semantic content seem at odds.

 

Dr. Peter Lennox

Senior Lecturer in Perception

College of Arts

University of Derby, UK

e: p.lennox@xxxxxxxxxxx

t: 01332 593155

https://derby.academia.edu/peterlennox

https://www.researchgate.net/profile/Peter_Lennox

 

From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Kevin Austin
Sent: 23 June 2016 19:25
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: PSOLA

 

My experiences with time compression/expansion, especially with voice, have been ‘unsatisfactory’. When I recently compressed 1 minute of speech down to 52 seconds, I used a manual method: going through the audio file and reducing the duration of silences. I started with the pauses between sentences (the longest), then those between phrases, then those between words. Most of the semantic information remained intact, but the speaker sounded a bit breathless.
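Kevin’s first pass could be approximated automatically. The following is a minimal sketch, not his procedure: it assumes mono samples in a Python list scaled to [-1, 1], labels fixed-length frames as silent when their RMS falls below a threshold, and then trims whichever pause is currently longest, one frame at a time, until the target length is reached. The function names and the `frame`, `threshold` and `min_pause_frames` values are hypothetical.

```python
import math

def silent_frames(x, frame, threshold):
    """Label each non-overlapping frame of `frame` samples: True if its RMS < threshold."""
    labels = []
    for start in range(0, len(x), frame):
        chunk = x[start:start + frame]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        labels.append(rms < threshold)
    return labels

def silent_runs(labels):
    """Return {first_frame: n_frames} for each maximal run of silent frames."""
    runs, start = {}, None
    for i, silent in enumerate(labels + [False]):  # sentinel closes a trailing run
        if silent and start is None:
            start = i
        elif not silent and start is not None:
            runs[start] = i - start
            start = None
    return runs

def shorten_pauses(x, target_len, frame=160, threshold=0.01, min_pause_frames=2):
    """Drop silent frames, longest pause first, until len(x) is about target_len.

    Hypothetical defaults: 160 samples is 10 ms at 16 kHz; min_pause_frames
    keeps every pause from disappearing entirely.
    """
    runs = silent_runs(silent_frames(x, frame, threshold))
    keep = dict(runs)                # frames to keep in each pause
    excess = len(x) - target_len     # samples still to remove
    while excess > 0 and keep:
        longest = max(keep, key=keep.get)  # the pause that is currently longest
        if keep[longest] <= min_pause_frames:
            break                    # every pause is already at its minimum
        keep[longest] -= 1
        excess -= frame
    drop = set()
    for start, length in runs.items():
        drop.update(range(start + keep[start], start + length))  # trim each pause from its end
    out = []
    for f in range((len(x) + frame - 1) // frame):
        if f not in drop:
            out.extend(x[f * frame:(f + 1) * frame])
    return out
```

For a 60 s to 52 s compression, `target_len` would be `round(len(x) * 52 / 60)`. Because the longest pause is always trimmed first, sentence-length pauses shrink toward phrase-length ones before shorter pauses are touched, which loosely mirrors the order described above.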

 

My next step, which took considerably longer, was to close up syllables and shorten plosives and many unvoiced consonants. It was a long process, but important aspects of the vowels, and especially of vowel groups, were mostly left in place. The next step was to shorten vowels and vowel clusters by removing ‘duplicate’ waveforms.
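Removing ‘duplicate’ cycles from a vowel can be sketched as skipping every n-th pitch period. This is a hypothetical automated analogue of the manual edit, under strong assumptions: a steady vowel whose pitch period is known, constant, and given in samples, with cuts placed exactly at period boundaries. `drop_periods` and its arguments are illustrative names only.

```python
def drop_periods(vowel, period, every):
    """Remove every `every`-th whole pitch period from a voiced segment.

    Assumes a constant period (in samples) and that index 0 is a period boundary.
    """
    out = []
    n_periods = len(vowel) // period
    for p in range(n_periods):
        if (p + 1) % every == 0:
            continue  # skip this cycle as a near-duplicate of its neighbours
        out.extend(vowel[p * period:(p + 1) * period])
    out.extend(vowel[n_periods * period:])  # keep any partial period at the end
    return out
```

With `every=4`, a quarter of the whole cycles are removed. On real speech the period drifts, so cut points would have to follow the actual cycle boundaries; splicing at mismatched points tends to produce audible clicks.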

 

In listening to the original and shortened versions back-to-back, I felt a loss of ‘intimacy’ in the meaning. It was more like being spoken at, than being spoken to. For me, PSOLA processing turned the speech into a computer voice very quickly. This is one of the problems I have with pitch-correction software used throughout the pop recording industry.
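PSOLA (pitch-synchronous overlap-add) changes duration without changing pitch by taking windowed grains, each about two pitch periods long and centred on a pitch mark, and overlap-adding them at output marks spaced one period apart; grains are repeated to lengthen and skipped to shorten. Below is a minimal time-domain (TD-PSOLA) sketch under a strong simplifying assumption: a constant, known pitch period in samples, so pitch marks fall every `period` samples. Real speech needs a pitch-mark detector, which is omitted. `td_psola_time_scale` is a hypothetical name, and this is not presented as the implementation used by Versfeld and Dreschler or by any commercial tool.

```python
import math

def td_psola_time_scale(x, period, alpha):
    """Time-scale x by alpha (alpha < 1 shortens) while keeping the pitch.

    Assumes a constant pitch period `period` in samples and len(x) >= 2 * period.
    """
    n_in = len(x)
    n_out = int(round(n_in * alpha))
    # Periodic Hann window over two periods; copies shifted by `period` sum to 1.
    win = [0.5 - 0.5 * math.cos(math.pi * k / period) for k in range(2 * period)]
    # Analysis pitch marks, far enough from the edges to extract full grains.
    marks = list(range(period, n_in - period + 1, period))
    y = [0.0] * n_out
    # Synthesis marks one period apart, so the output pitch is unchanged.
    for t_s in range(period, n_out, period):
        # Map the output time back to input time and use the nearest analysis mark.
        m = min(marks, key=lambda t_a: abs(t_a - t_s / alpha))
        for k in range(2 * period):
            src, dst = m - period + k, t_s - period + k
            if dst < n_out:
                y[dst] += win[k] * x[src]
    return y
```

With `alpha=0.8`, 1000 input samples become 800, and for a steady periodic input the interior of the output keeps the original period. On real speech, each skipped grain removes roughly one pitch period, which is close in spirit to removing ‘duplicate’ waveforms by hand.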

 

And here is Tom Lehrer with ‘The Elements’: https://www.youtube.com/watch?v=AcS3NOQnsQM

 

Kevin

 



On 2016, Jun 23, at 2:43 AM, Versfeld, Niek <n.versfeld@xxxxxxx> wrote:

With respect to your second citation:
We measured the threshold (50% correct) of intelligibility for time-compressed sentences. It appeared to be about 12.5 syll/s, i.e. 80 ms per syllable, or 750 syll/min. Note that the time compression was imposed artificially by means of PSOLA: there is no way speakers could utter these sentences at such a fast tempo.
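The rate conversions quoted for the 50%-correct threshold follow directly from 12.5 syllables per second:

```latex
\frac{1}{12.5\,\mathrm{syll/s}} = 0.08\,\mathrm{s/syll} = 80\,\mathrm{ms\ per\ syllable},
\qquad
12.5\,\mathrm{syll/s} \times 60\,\mathrm{s/min} = 750\,\mathrm{syll/min}
```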

Niek

The relationship between the intelligibility of time-compressed speech and speech in noise in young and elderly listeners.
Versfeld NJ, Dreschler WA. J Acoust Soc Am. 2002 Jan;111(1 Pt 1):401-8.

 


