Re: Statistics for word rate in natural speech (Dan McCloy)


Subject: Re: Statistics for word rate in natural speech
From:    Dan McCloy  <drmccloy@xxxxxxxx>
Date:    Mon, 20 Jun 2016 10:08:58 -0700
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi Kevin,

Here are a couple of references that address issues of word duration from a more linguistic perspective. If you dig further, I'm afraid you will still find that most such studies deal in syllables (not words), in part because the notion of what counts as a single word is not uncontroversial (perhaps surprisingly to non-linguists).

@xxxxxxxx{baker_variability_2009,
    title = {Variability in word duration as a function of probability, speech style, and prosody},
    volume = {52},
    url = {http://las.sagepub.com/cgi/doi/10.1177/0023830909336575},
    doi = {10.1177/0023830909336575},
    number = {4},
    journal = {Language and Speech},
    author = {Baker, Rachel E. and Bradlow, Ann R.},
    year = {2009},
    pmid = {20121039},
    pmcid = {PMC2841971},
    pages = {391-413},
}

@xxxxxxxx{pellegrino_across-language_2011,
    title = {Across-language perspective on speech information rate},
    volume = {87},
    issn = {1535-0665},
    url = {http://muse.jhu.edu/content/crossref/journals/language/v087/87.3.pellegrino.html},
    doi = {10.1353/lan.2011.0057},
    number = {3},
    journal = {Language},
    author = {Pellegrino, François and Coupé, Christophe and Marsico, Egidio},
    year = {2011},
    pages = {539-558},
}

-- dan

Daniel McCloy
http://dan.mccloy.info/
Postdoctoral Research Associate
Institute for Learning and Brain Sciences
University of Washington

On Sun, Jun 19, 2016 at 9:48 PM, Kevin Austin <kevin.austin@xxxxxxxx> wrote:

> Thank you.
>
> I’m not a linguist or psycholinguist, so I write only from direct
> experience.
>
> My reading is that the question is not very 'well-formed', and therefore
> the answers do not respond to the question.
>
> The question was about ‘words’ [whatever they may happen to be], and the
> answers start with the idea of syllable, and Jont’s answer seems to be in
> ‘base phonemic elements’. For example, the two words, “I” and “stopped”,
> count as two words, each of one syllable, but ‘stopped’ is ccvcc [if the
> /p/ is pronounced].
>
> 10 ms [i.e. 100 Hz] seems to be a very small duration, and may only apply
> to a very limited number of phonemes. I had learned that the shortest time
> that was reliable for the [sequential] discrimination of auditory events
> was in the range of 25 to 40 ms (40 to 25 Hz). A ~16 Hz limit works out to
> be around 60-70 ms.
>
> But sixteen “what’s”? Try the test. Record sixteen one-syllable words,
> with cv or vc forms: be, am, so, it, two, aught, tea, ear, tie ... etc.
> Most of these are two phonemes, or three if a diphthong is considered a
> grouped vowel, as in the word ‘tie’. Say them quickly. Edit them into a
> sequence with no gaps, and shorten the sequence to be 1,000 ms. Is it
> possible to do sequential segmentation, leaving aside the articulatory
> problems?
>
> Record: “I spied the top pie”, and “North-eastern Carolinian national
> seashore”. Both are ‘five words’. For interest, edit out the words: ‘top’,
> ‘pie’, ‘Carolinian’, and ‘national’. Tricks such as producing the /d/ in
> spied as being the stopped diphthong /ai/, and the contracting of the /p/
> and the /n/, likely increase the rate of delivery in natural speech, but
> most likely mostly in informal contexts.
>
> “What was the question again?” cv ccvc cv ccvccvcvcvc
>
>
> Kevin
>
>
> > On 2016, Jun 19, at 8:03 AM, Jont Allen <jontalle@xxxxxxxx> wrote:
> >
> > All,
> >
> > A comment that I hope is helpful.
> >
> > In our speech work we have learned, from extensive analysis, that the
> > fastest temporal resolution at which speech is processed by the auditory
> > system is about 10 [ms].
> > That means that the natural temporal unit for talking about speech (or
> > singing) is the centisecond [cs]. For example, the plosive burst of, say,
> > /ka/ is about 1-2 [cs].
> > I have not found very many examples of less than 1 [cs], as the
> > perception deteriorates quickly when you go below (shorter than) 1 [cs].
> >
> > Based on the numbers below for rapper Big Boi, 379 syllables/min is about
> > 16 [cs] per syllable:
> > 100*60/379 = 15.8
> >
> > This seems like a nice way to quantify this rate. It's close to the
> > perceptual lower limit of 1 [cs]. A full syllable (CV, VC) of 16 [cs]
> > seems pretty short.
> >
> > Jont Allen
> >
> > On 06/18/2016 11:39 PM, Arun Chandra wrote:
> >> In Mozart's "Le Nozze di Figaro", Bartolo sings his revenge aria at
> >> about quarter == 112mm, which means the syllables are going by in
> >> triplets at about 336 per minute.
> >>
> >> In Rossini's "Barber of Seville", the character Bartolo (the same
> >> character, again) sings his accusing aria to Rosina (his ward) at about
> >> quarter == 116mm, which means the sixteenth-note syllables are going by
> >> at about 464 per minute.
> >>
> >> The "Modern Major General's Song" by Gilbert and Sullivan goes by at
> >> about 184mm, so its syllables are about 368 per minute.
> >>
> >> arun
> >>
> >>
> >> On 6/18/16 4:07 AM, Huron, David wrote:
> >>> We have a wide tolerance for speech with "normal" paces ranging
> >>> between 170 and 260 syllables per minute.
> >>> (Yuan, Liberman & Cieri, 2006; Towards an integrated understanding of
> >>> speaking rate in conversation. Interspeech conference Proc.)
> >>>
> >>> Music exhibits an enormous range of lyrical pace. Judy Garland's
> >>> rendition of "Somewhere Over the Rainbow" clocks in at a leisurely 64
> >>> syllables per minute. By contrast, in "Ms. Jackson" by OutKast, rapper
> >>> Big Boi reaches an extraordinary 379 syllables per minute.
> >>>
> >>> -David Huron with Nat Condit-Schultz
> >>>
> >>> ________________________________________
> >>> From: AUDITORY - Research in Auditory Perception [AUDITORY@xxxxxxxx]
> >>> on behalf of Bruno L. Giordano [brungio@xxxxxxxx]
> >>> Sent: Friday, June 17, 2016 8:32 AM
> >>> To: AUDITORY@xxxxxxxx
> >>> Subject: Statistics for word rate in natural speech
> >>>
> >>> Hello,
> >>>
> >>> I am looking for published statistics on average word rate in natural
> >>> speech (words/minute).
> >>>
> >>> Is there some gold standard reference for this?
> >>>
> >>> Thank you!
> >>>
> >>>         Bruno
> >>>
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Bruno L. Giordano, PhD
> >>> Institute of Neuroscience and Psychology
> >>> 58 Hillhead Street, University of Glasgow
> >>> Glasgow, G12 8QB, Scotland
> >>> T +44 (0) 141 330 5484
> >>> Www: http://www.brunolgiordano.net
> >>> Email charter: http://www.emailcharter.org/
> >>>


This message came from the mail archive
/var/www/html/postings/2016/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University