PSOLA (Kevin Austin)


Subject: PSOLA
From:    Kevin Austin  <kevin.austin@xxxxxxxx>
Date:    Thu, 23 Jun 2016 14:25:20 -0400
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

My experiences with time-compression / expansion, especially with voice, have been ‘unsatisfactory’. When I recently compressed 1 minute of speech down to 52 seconds, I used a manual method of going through the audio file and reducing the duration of silences. I started with the pauses between sentences (the longest), then those between phrases, then those between words. Most of the semantic information remained intact, but the speaker sounded a bit breathless.

My next step, considerably longer, was to work on closing up syllables, and shortening plosives and many unvoiced consonants. A long process, but important aspects of vowels, and especially vowel groups, were left mostly in place. The next step was to shorten vowels and vowel clusters by removing ‘duplicate’ waveforms.

In listening to the original and shortened versions back-to-back, I felt a loss of ‘intimacy’ in the meaning. It was more like being spoken at than being spoken to. For me, PSOLA processing turned the speech into a computer voice very quickly. This is one of the problems I have with pitch-correction software used throughout the pop recording industry.

And Tom Lehrer with The Elements: https://www.youtube.com/watch?v=AcS3NOQnsQM

Kevin

> On 2016, Jun 23, at 2:43 AM, Versfeld, Niek <n.versfeld@xxxxxxxx> wrote:
>
> With respect to your second citation:
> We measured the threshold (50% correct) of intelligibility for time-compressed sentences. It appeared to be about 12.5 syll/s, i.e. 80 ms per syllable, or 750 syll/minute. Note that the time compression was artificially imposed by means of PSOLA. No way speakers could utter these sentences at such a fast tempo.
>
> Niek
>
> The relationship between the intelligibility of time-compressed speech and speech in noise in young and elderly listeners.
> Versfeld NJ, Dreschler WA. J Acoust Soc Am. 2002 Jan;111(1 Pt 1):401-8.
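
The first manual step described in the message, shortening the pauses, can be roughly sketched in code. This is only an illustration under stated assumptions, not the procedure actually used (that editing was done by hand): the function name `shorten_silences`, the amplitude threshold, and the maximum pause length are all hypothetical, and a sample counts as "silent" purely because its absolute amplitude is small, which is a simplification.

```python
def shorten_silences(samples, sample_rate, threshold=0.01, max_pause_s=0.15):
    """Cap every run of quiet samples at max_pause_s seconds.

    samples      -- sequence of floats in the range [-1.0, 1.0]
    sample_rate  -- samples per second
    threshold    -- assumed amplitude below which a sample counts as silence
    max_pause_s  -- assumed longest pause (in seconds) to keep
    """
    max_run = round(max_pause_s * sample_rate)  # longest quiet run to keep
    out = []
    run = 0  # length of the quiet run currently being read
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run > max_run:
                continue  # drop quiet samples beyond the allowed pause
        else:
            run = 0  # a loud sample ends the quiet run
        out.append(s)
    return out


# Toy signal: 0.1 s of "speech", 0.4 s of silence, 0.1 s of "speech" at 1 kHz.
signal = [0.5] * 100 + [0.0] * 400 + [0.5] * 100
shorter = shorten_silences(signal, 1000, max_pause_s=0.25)
print(len(signal), "->", len(shorter), "samples")
```

A sketch like this only trims pauses. It does nothing about syllables, plosives, or repeated waveforms within vowels, which are the later and much longer steps described in the message.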


This message came from the mail archive
/var/www/html/postings/2016/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University