I wonder what technology they used 75 years ago for the measurement. I just made two one-minute readings from a well-known unread book, the openings of Episode One and Episode Seventeen (one narrative, the other catechismic), with the following results:
• 160 words / 213 syllables [~280 ms/syllable]
• 146 words / 265 syllables [~225 ms/syllable]
This included pauses [,] and cessations [.].
I then removed most of the ‘silences’, and breathing, and brought each text down to around 52 seconds, bringing the first example down to a breathless average of 244ms, and the second down to an even more breathless ~195ms.
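The per-syllable averages above follow from dividing the reading time by the syllable count. A minimal sketch of that arithmetic, assuming exactly 60 s for the full readings and 52 s for the trimmed ones (the function name is just illustrative):

```python
def ms_per_syllable(syllables: int, seconds: float) -> float:
    """Mean syllable duration in milliseconds for a reading of the given length."""
    return seconds * 1000.0 / syllables

# Syllable counts taken from the two readings quoted above.
readings = {"Episode One": 213, "Episode Seventeen": 265}

for name, syllables in readings.items():
    full = ms_per_syllable(syllables, 60.0)     # with pauses and breathing
    trimmed = ms_per_syllable(syllables, 52.0)  # most silences removed
    print(f"{name}: {full:.0f} ms/syllable at 60 s, {trimmed:.0f} ms/syllable at 52 s")
```

The averages include whatever silence remains in each take, so they are upper bounds on the articulated syllable length.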
Some specific examples:
joining ~ 390ms = 195ms/syllable
growth ~ 270ms
gaslight ~ 600ms = 300ms/syllable
friendship ~ 400ms = 200ms/syllable
Bloom and Steven ~ 880ms = 220ms/syllable
I compressed “Bloom and Steven” down to 500ms, ie 1/8-second per syllable; its 13 phonemes then run at 26 phonemes/sec, or about 38ms each (~26Hz). Perhaps in 1940 ‘conversational speech’ was this fast.
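The figures for the compressed phrase can be checked with a few lines. This is only a sketch, assuming 4 syllables and 13 phonemes in “Bloom and Steven” and a compressed length of 500 ms; the helper name is hypothetical:

```python
def phrase_rates(duration_ms: float, syllables: int, phonemes: int) -> dict:
    """Per-syllable and per-phoneme timing for one spoken phrase."""
    return {
        "ms_per_syllable": duration_ms / syllables,
        "phonemes_per_second": phonemes * 1000.0 / duration_ms,
        "ms_per_phoneme": duration_ms / phonemes,
    }

# "Bloom and Steven" compressed to half a second.
r = phrase_rates(duration_ms=500.0, syllables=4, phonemes=13)
print(f"{r['ms_per_syllable']:.0f} ms/syllable")      # 1/8 second
print(f"{r['phonemes_per_second']:.0f} phonemes/sec")
print(f"{r['ms_per_phoneme']:.1f} ms/phoneme")
```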
I am a professional speaker and podcast host, and I speak at approximately 145-160 words per minute (wpm), while many sources state that the average American English speaker engaged in friendly conversation speaks at approximately 110–150 wpm.
On 2016, Jun 20, at 7:33 AM, Christine Rankovic <rankovic@xxxxxxxxxxxxxxxx> wrote:
Dunn and White (1940) is a classic report on speech measurements. They assumed 1/8-second as the length of a syllable for their classic measurements.
The reference is: Dunn, H.K. and White, S.D. (1940). Statistical Measurements on Conversational Speech. Journal of the Acoustical Society of America 11:278-288.
Christine Rankovic, PhD
Speech and Hearing Scientist
From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Kevin Austin
Sent: Monday, June 20, 2016 12:48 AM
Subject: Re: Statistics for word rate in natural speech
I’m not a linguist or psycholinguist, so I write only from direct experience.
My reading is that the question is not very 'well-formed', and therefore the answers do not respond to the question.
The question was about ‘words’ [whatever they may happen to be], and the answers start with the idea of syllable, and Jont’s answer seems to be in ‘base phonemic elements’. For example, the two words, “I”, and “stopped”, count as two words, each of one syllable, but ‘stopped’ is ccvcc [if the /p/ is pronounced].
10ms [ie 100Hz] seems to be a very small duration, and may only apply to a very limited number of phonemes. I had learned that the shortest time that was reliable for the [sequential] discrimination of auditory events was in the range of 25 to 40 ms — 40 to 25Hz. A ~16Hz limit works out to be around 60-70ms.
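The rate/duration pairs quoted here are reciprocals of each other (period in ms = 1000 / rate in Hz). A small sketch, treating each "Hz" figure as events per second:

```python
def period_ms(rate_hz: float) -> float:
    """Duration of one event, in milliseconds, at the given events-per-second rate."""
    return 1000.0 / rate_hz

for hz in (100, 40, 25, 16):
    print(f"{hz:>3} Hz -> {period_ms(hz):.1f} ms per event")
```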
But sixteen “what’s”? Try the test. Record sixteen one-syllable words with cv or vc forms: be, am, so, it, two, aught, tea, ear, tie, etc. Most of these are two phonemes, or three if a diphthong is considered a grouped vowel, as in the word ‘tie’. Say them quickly. Edit them into a sequence with no gaps, and shorten the sequence to 1,000ms. Is sequential segmentation possible, leaving aside the articulatory problems?
Record: “I spied the top pie”, and “North-eastern Carolinian national seashore”. Both are ‘five words’. For interest, edit out the words: ‘top', ‘pie', ‘Carolinian', and ‘national’. Tricks such as producing the /d/ in spied as being the stopped diphthong /ai/, and the contracting of the /p/, and the /n/, likely increase the rate of delivery in natural speech, but most likely mostly in informal contexts.
“What was the question again?” cv ccvc cv ccvccvcvcvc
On 2016, Jun 19, at 8:03 AM, Jont Allen <jontalle@xxxxxxxxxxxx> wrote:
A comment that I hope is helpful.
In our speech work we have learned, from extensive analysis, that the fastest temporal resolution that speech is processed at, by the auditory system, is about 10 [ms].
That means that the natural temporal unit for talking about speech (or singing) is the centisecond [cs]. For example, the plosive burst of say /ka/ is about 1-2 [cs].
I have not found very many examples of less than 1 [cs], as the perception deteriorates quickly when you go below (shorter than) 1 [cs].
Based on the numbers below for rapper Big Boi, 379 syllables/min is about
100*60/379 = 15.8 [cs] per syllable
This seems like a nice way to quantify this rate. It's close to the perceptual lower limit of 1 [cs]. A full syllable (CV, VC) of 16 [cs] seems pretty short.
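The conversion from syllables per minute to centiseconds per syllable can be written out as below. A minimal sketch; the function name is illustrative, and the only input is the 379 syllables/min figure quoted in the thread:

```python
def centiseconds_per_syllable(syllables_per_minute: float) -> float:
    """Mean syllable duration in centiseconds (1 cs = 10 ms)."""
    # One minute is 60 s = 6000 cs.
    return 60.0 * 100.0 / syllables_per_minute

rate = 379  # Big Boi, syllables per minute
cs = centiseconds_per_syllable(rate)
print(f"{rate} syllables/min -> {cs:.1f} cs ({cs * 10:.0f} ms) per syllable")
```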
On 06/18/2016 11:39 PM, Arun Chandra wrote:
In Mozart's "Le Nozze di Figaro", Bartolo sings his revenge aria at about quarter == 112mm, which means the syllables are going by in triplets at about 336 per minute.
In Rossini's "Barber of Seville", the character Bartolo (the same character, again) sings his accusing aria to Rosina (his ward) at about quarter == 116mm, which means the sixteenth note syllables are going by at about 464 per minute.
The "Modern Major General's Song" by Gilbert and Sullivan goes by at about 184mm, so its syllables are about 368 per minute.
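These syllable rates are the metronome marking multiplied by the number of syllables sung per beat. A short sketch of that arithmetic; the two-syllables-per-beat value for the Gilbert and Sullivan song is inferred from 368/184 and is an assumption, not something stated in the message:

```python
def syllables_per_minute(beats_per_minute: float, syllables_per_beat: int) -> float:
    """Syllable rate when every subdivision of the beat carries one syllable."""
    return beats_per_minute * syllables_per_beat

# (piece, metronome marking, syllables per beat)
pieces = [
    ("Figaro, Bartolo's aria (triplets)", 112, 3),
    ("Barber of Seville, Bartolo's aria (sixteenths)", 116, 4),
    ("Modern Major General's Song (assumed two per beat)", 184, 2),
]

for title, bpm, per_beat in pieces:
    spm = syllables_per_minute(bpm, per_beat)
    print(f"{title}: {spm:.0f} syllables/min, {60000.0 / spm:.0f} ms per syllable")
```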
On 6/18/16 4:07 AM, Huron, David wrote:
We have a wide tolerance for speech with "normal" paces ranging between 170 and 260 syllables per minute.
(Yuan, Liberman & Cieri, 2006; Towards an integrated understanding of speaking rate in conversation. Interspeech conference Proc.)
Music exhibits an enormous range of lyrical pace. Judy Garland's rendition of "Somewhere Over the Rainbow" clocks in at a leisurely 64 syllables per minute. By contrast, in "Ms. Jackson" by OutKast, rapper Big Boi reaches an extraordinary 379 syllables per minute.
-David Huron with Nat Condit-Schultz
From: AUDITORY - Research in Auditory Perception [AUDITORY@xxxxxxxxxxxxxxx] on behalf of Bruno L. Giordano
Sent: Friday, June 17, 2016 8:32 AM
Subject: Statistics for word rate in natural speech
I am looking for published statistics on average word rate in natural speech (words/minute).
Is there some gold-standard reference for this?
Bruno L. Giordano, PhD
Institute of Neuroscience and Psychology
58 Hillhead Street, University of Glasgow Glasgow, G12 8QB, Scotland
T +44 (0) 141 330 5484
Email charter: http://www.emailcharter.org/