Re: Statistics for word rate in natural speech (Kevin Austin)


Subject: Re: Statistics for word rate in natural speech
From:    Kevin Austin  <kevin.austin@xxxxxxxx>
Date:    Mon, 20 Jun 2016 14:06:45 -0400
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Thank you.

I wonder about the technology they used 75 years ago for the measurement. I just used two one-minute readings from a well-known unread book, the openings of Episode One and Episode Seventeen (one is narrative, the other catechismic), with the following results:

  • 160 words / 213 syllables   [~280 ms/syllable]
  • 146 words / 265 syllables   [~225 ms/syllable]

This included pauses [,] and cessations [.].

I then removed most of the ‘silences’ and breathing, and brought each text down to around 52 seconds, bringing the first example down to a breathless average of 244ms, and the second down to an even more breathless ~195ms.

Some specific examples:

    joining           ~ 390ms  = 145ms/syllable
    growth            ~ 270ms
    gaslight          ~ 600ms  = 300ms/syllable
    friendship        ~ 400ms  = 200ms/syllable
    Bloom and Steven  ~ 880ms  = 220ms/syllable

I compressed “Bloom and Steven” down to 500ms, ie 1/8-second per syllable, and the 13 phonemes, 26 phonemes/sec, are at about 40Hz. Perhaps in 1940 ‘conversational speech’ was this fast.

Kevin

Also:
https://www.quora.com/Speeches-For-the-average-person-speaking-at-a-normal-pace-what-is-the-typical-number-of-words-they-can-say-in-one-minute

    I am a professional speaker and podcast host and I speak at approximately 145-160 words per minute (wpm), while many sources state that average American English speaker engaged in a friendly conversation speaks at a rate of approximately 110–150 wpm.

> On 2016, Jun 20, at 7:33 AM, Christine Rankovic <rankovic@xxxxxxxx> wrote:
>
> Dunn and White (1940) is a classic report on speech measurements. They assumed 1/8-second as the length of a syllable for their classic measurements.
>
> The reference is: Dunn, H.K. and White, S.D. (1940). Statistical Measurements on Conversational Speech. Journal of the Acoustical Society of America 11:278-288.
>
> Christine Rankovic, PhD
> Speech and Hearing Scientist
>
>
> -----Original Message-----
> From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxx] On Behalf Of Kevin Austin
> Sent: Monday, June 20, 2016 12:48 AM
> To: AUDITORY@xxxxxxxx
> Subject: Re: Statistics for word rate in natural speech
>
> Thank you.
>
> I’m not a linguist or psycholinguist, so I write only from direct experience.
>
> My reading is that the question is not very 'well-formed', and therefore the answers do not respond to the question.
>
> The question was about ‘words’ [whatever they may happen to be], and the answers start with the idea of syllable, and Jont’s answer seems to be in ‘base phonemic elements’. For example, the two words, “I” and “stopped”, count as two words, each of one syllable, but ‘stopped’ is ccvcc [if the /p/ is pronounced].
>
> 10ms [ie 100Hz] seems to be a very small duration, and may only apply to a very limited number of phonemes. I had learned that the shortest time that was reliable for the [sequential] discrimination of auditory events was in the range of 25 to 40 ms (40 to 25Hz). A ~16Hz limit works out to be around 60-70ms.
>
> But sixteen “what’s”? Try the test. Record sixteen one-syllable words, with cv or vc forms: be, am, so, it, two, aught, tea, ear, tie . . etc. Most of these are two phonemes, or three if a diphthong is considered a grouped vowel, as in the word ‘tie’. Say them quickly. Edit them into a sequence with no gaps, and shorten the sequence to be 1,000ms. Is it possible to do sequential segmentation, leaving aside the articulatory problems?
>
> Record: “I spied the top pie”, and “North-eastern Carolinian national seashore”. Both are ‘five words’. For interest, edit out the words: ‘top’, ‘pie’, ‘Carolinian’, and ‘national’. Tricks such as producing the /d/ in spied as being the stopped diphthong /ai/, and the contracting of the /p/, and the /n/, likely increase the rate of delivery in natural speech, but most likely mostly in informal contexts.
>
> “What was the question again?” cv ccvc cv ccvccvcvcvc
>
>
> Kevin
>
>
>> On 2016, Jun 19, at 8:03 AM, Jont Allen <jontalle@xxxxxxxx> wrote:
>>
>> All,
>>
>> A comment that I hope is helpful.
>>
>> In our speech work we have learned, from extensive analysis, that the fastest temporal resolution that speech is processed at, by the auditory system, is about 10 [ms].
>> That means that the natural temporal unit for talking about speech (or singing) is the centisecond [cs]. For example, the plosive burst of say /ka/ is about 1-2 [cs].
>> I have not found very many examples of less than 1 [cs], as the perception deteriorates quickly when you go below (shorter than) 1 [cs].
>>
>> Based on the numbers below for rapper Big Boi, 379 syllables/min is about 16 [cs] per syllable:
>> 100*60/379 = 15.8
>>
>> This seems like a nice way to quantify this rate. It's close to the perceptual lower limit of 1 [cs]. A full syllable (CV, VC) of 16 [cs] seems pretty short.
>>
>> Jont Allen
>>
>> On 06/18/2016 11:39 PM, Arun Chandra wrote:
>>> In Mozart's "Le Nozze di Figaro", Bartolo sings his revenge aria at about quarter == 112mm, which means the syllables are going by in triplets at about 336 per minute.
>>>
>>> in Rossini's "Barber of Seville", the character Bartolo (the same character, again) sings his accusing aria to Rosina (his ward) at about quarter == 116mm, which means the sixteenth note syllables are going by at about 464 per minute.
>>>
>>> the "Modern Major General's Song" by Gilbert and Sullivan goes by at about 184mm, so its syllables are about 368 per minute.
>>>
>>> arun
>>>
>>>
>>> On 6/18/16 4:07 AM, Huron, David wrote:
>>>> We have a wide tolerance for speech with "normal" paces ranging between 170 and 260 syllables per minute.
>>>> (Yuan, Liberman & Cieri, 2006; Towards an integrated understanding of speaking rate in conversation. INTERSPEECH conference Proc.)
>>>>
>>>> Music exhibits an enormous range of lyrical pace. Judy Garland's rendition of "Somewhere Over the Rainbow" clocks in at a leisurely 64 syllables per minute. By contrast, in "Ms. Jackson" by OutKast, rapper Big Boi reaches an extraordinary 379 syllables per minute.
>>>>
>>>> -David Huron with Nat Condit-Schultz
>>>>
>>>> ________________________________________
>>>> From: AUDITORY - Research in Auditory Perception [AUDITORY@xxxxxxxx] on behalf of Bruno L. Giordano [brungio@xxxxxxxx]
>>>> Sent: Friday, June 17, 2016 8:32 AM
>>>> To: AUDITORY@xxxxxxxx
>>>> Subject: Statistics for word rate in natural speech
>>>>
>>>> Hello,
>>>>
>>>> I am looking for published statistics on average word rate in natural speech (words/minute).
>>>>
>>>> Is there some gold standard reference for this?
>>>>
>>>> Thank you!
>>>>
>>>>        Bruno
>>>>
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> Bruno L. Giordano, PhD
>>>> Institute of Neuroscience and Psychology
>>>> 58 Hillhead Street, University of Glasgow
>>>> Glasgow, G12 8QB, Scotland
>>>> T +44 (0) 141 330 5484
>>>> Www: http://www.brunolgiordano.net
>>>> Email charter: http://www.emailcharter.org/
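
The rate conversions quoted in this thread (units per minute to average duration per unit, and metronome marking to syllables per minute) can be checked with a few lines of Python. This is only a minimal sketch for re-running the arithmetic, not anything the posters used: the function names are hypothetical, and reading the "Modern Major General's Song" figure as two syllables per beat is an assumption inferred from the quoted 184 and 368.

```python
"""Sketch: re-check the speech-rate arithmetic quoted in the thread.

All function names are illustrative (hypothetical); the input numbers
are the ones quoted by the posters.
"""


def ms_per_unit(units_per_minute: float) -> float:
    """Average duration in ms of one unit (word, syllable, phoneme)."""
    return 60_000.0 / units_per_minute


def mean_unit_ms(total_seconds: float, unit_count: int) -> float:
    """Average duration in ms when unit_count units fill total_seconds."""
    return total_seconds * 1000.0 / unit_count


def syllables_per_minute(beats_per_minute: float, syllables_per_beat: float) -> float:
    """Syllable rate implied by a metronome marking and a subdivision."""
    return beats_per_minute * syllables_per_beat


if __name__ == "__main__":
    # Big Boi, 379 syllables/min: duration per syllable in ms and in cs.
    big_boi_ms = ms_per_unit(379)
    print(f"379 syll/min: {big_boi_ms:.1f} ms = {big_boi_ms / 10:.1f} cs per syllable")

    # The two one-minute readings, before and after trimming to ~52 s.
    for syllables in (213, 265):
        full = mean_unit_ms(60, syllables)
        trimmed = mean_unit_ms(52, syllables)
        print(f"{syllables} syllables: {full:.0f} ms at 60 s, {trimmed:.0f} ms at 52 s")

    # "Bloom and Steven" compressed to 500 ms: 4 syllables, 13 phonemes.
    print(f"per syllable: {mean_unit_ms(0.5, 4):.0f} ms")
    print(f"per phoneme:  {mean_unit_ms(0.5, 13):.1f} ms ({13 / 0.5:.0f} phonemes/s)")

    # Sung examples: beat tempo times syllables per beat.
    print(syllables_per_minute(112, 3))  # triplets
    print(syllables_per_minute(116, 4))  # sixteenth notes
    print(syllables_per_minute(184, 2))  # assumed: two syllables per beat
```

Keeping everything in milliseconds and converting to centiseconds only for display avoids mixing the two units in one expression.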


This message came from the mail archive
/var/www/html/postings/2016/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University