PSOLA

To: AUDITORY@xxxxxxxxxxxxxxx

Subject: PSOLA

From: Kevin Austin <kevin.austin@xxxxxxxxxxxx>

Date: Thu, 23 Jun 2016 14:25:20 -0400

Approved-by: kevin.austin@xxxxxxxxxxxx

Comments: To: "Versfeld, Niek" <n.versfeld@xxxxxxx>

Reply-to: Kevin Austin <kevin.austin@xxxxxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

My experiences with time-compression / expansion, especially with voice, have been ‘unsatisfactory’. When I recently compressed 1 minute of speech down to 52 seconds, I used a manual method of going through the audio file and reducing the duration of silences. I started with the pauses between sentences (the longest), then those between phrases, then those between words. Most of the semantic information remained intact, but the speaker sounded a bit breathless.
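The pause-shortening pass described above can be sketched in code. This is only a rough illustration, not the procedure actually used (that was done by hand and by ear): the function name shorten_pauses, the amplitude threshold, and the pause lengths are all assumptions chosen for the example, and a real tool would measure short-term energy rather than single samples.

```python
def shorten_pauses(samples, sample_rate, threshold=0.02,
                   min_pause_s=0.2, keep_s=0.1):
    """Trim every quiet run of at least min_pause_s down to keep_s seconds.

    All parameter values are illustrative assumptions, not measured settings.
    """
    min_len = round(min_pause_s * sample_rate)
    keep_len = round(keep_s * sample_rate)
    out = []
    run = []  # current run of consecutive quiet samples
    for x in samples:
        if abs(x) < threshold:
            run.append(x)
            continue
        # A loud sample ends the quiet run: keep short runs whole, trim long ones.
        out.extend(run if len(run) < min_len else run[:keep_len])
        run = []
        out.append(x)
    # Handle a quiet run at the very end of the signal.
    out.extend(run if len(run) < min_len else run[:keep_len])
    return out


# Toy signal at an assumed 1000 Hz sample rate: sound, a 0.5 s pause, sound.
demo = [0.5] * 100 + [0.0] * 500 + [0.5] * 100
shorter = shorten_pauses(demo, sample_rate=1000)
print(len(demo), len(shorter))  # 700 300
```

Quiet runs shorter than min_pause_s are left alone, so zero crossings inside voiced sound are not touched; only the longer silences shrink.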

My next step, considerably longer, was to work on closing up syllables, and shortening plosives and many unvoiced consonants. It was a long process, but important aspects of vowels, and especially of vowel groups, were left mostly in place. The next step was to shorten vowels and vowel clusters by removing ‘duplicate’ waveforms.
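Removing ‘duplicate’ waveforms from a vowel can be sketched as dropping whole pitch periods. The sketch below is a simplification and not PSOLA proper, which places pitch marks and overlap-adds windowed segments; here the period is assumed to be known and constant, cuts are made at period boundaries with no crossfade, and the name drop_periods and its drop_every parameter are hypothetical.

```python
import math


def drop_periods(samples, period, drop_every=4):
    """Remove every drop_every-th pitch period from a steadily voiced segment.

    Assumes a constant, known period (in samples); real speech needs
    pitch tracking, which this sketch does not attempt.
    """
    kept = []
    n_periods = len(samples) // period
    for i in range(n_periods):
        if (i + 1) % drop_every == 0:
            continue  # skip this cycle; its neighbours are near-duplicates
        kept.extend(samples[i * period:(i + 1) * period])
    kept.extend(samples[n_periods * period:])  # leftover partial cycle
    return kept


period = 80  # samples per cycle: 100 Hz at an assumed 8000 Hz sample rate
vowel = [math.sin(2 * math.pi * n / period) for n in range(40 * period)]
shorter_vowel = drop_periods(vowel, period, drop_every=4)
print(len(vowel), len(shorter_vowel))  # 3200 2400
```

Because only complete cycles are removed, the pitch of the remaining sound is unchanged while its duration drops by a quarter.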

In listening to the original and shortened versions back-to-back, I felt a loss of ‘intimacy’ in the meaning. It was more like being spoken at, than being spoken to. For me, PSOLA processing turned the speech into a computer voice very quickly. This is one of the problems I have with pitch-correction software used throughout the pop recording industry.

And Tom Lehrer with The Elements: https://www.youtube.com/watch?v=AcS3NOQnsQM

Kevin

On 2016, Jun 23, at 2:43 AM, Versfeld, Niek <n.versfeld@xxxxxxx> wrote:

With respect to your second citation:
We measured the threshold (50% correct) of intelligibility for time-compressed sentences. It appeared to be about 12.5 syll/s, i.e. 80 ms per syllable, or 750 syll/minute. Note that the time compression was artificially imposed by means of PSOLA. No way speakers could utter these sentences at such a fast tempo.
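As a quick check of the unit conversions quoted above, the three figures follow from one another. The variable names are only for illustration.

```python
rate_syll_per_s = 12.5  # threshold rate quoted above

ms_per_syllable = 1000 / rate_syll_per_s  # milliseconds per syllable
syll_per_minute = rate_syll_per_s * 60    # syllables per minute

print(ms_per_syllable, syll_per_minute)  # 80.0 750.0
```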

Niek

The relationship between the intelligibility of time-compressed speech and speech in noise in young and elderly listeners.
Versfeld NJ, Dreschler WA. J Acoust Soc Am. 2002 Jan;111(1 Pt 1):401-8.