Re: Statistics for word rate in natural speech

To: AUDITORY@xxxxxxxxxxxxxxx

Subject: Re: Statistics for word rate in natural speech

From: Kevin Austin <kevin.austin@xxxxxxxxxxxx>

Date: Mon, 20 Jun 2016 14:06:45 -0400

Approved-by: kevin.austin@xxxxxxxxxxxx

Comments: To: rankovic@xxxxxxxxxxxxxxxx

In-reply-to: <ExSFbrd4OkBnLExSGbPUsp@videotron.ca>

List-archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

References: <31063_1466223121_5764CA11_31063_5_2_30680c0a-cebe-abfa-d3f0-4aca3ef9e508@gmail.com> <24981_1466308929_57661941_24981_25_1_9942EF80C8B95F4A83E611A793CD9A6A55025E0B@CIO-TNC-D2MBX01.osuad.osu.edu> <24981_1466311821_5766248D_24981_473_1_0d902d86-3e1d-8852-c290-065550b0f31a@evergreen.edu> <EqRgbRnQBb0CVEqRhbeet3@videotron.ca> <16024_1466399812_57677C43_16024_937_1_F1D3644E-DEE3-4F75-BC3D-FCDAC0834586@videotron.ca> <ExSFbrd4OkBnLExSGbPUsp@videotron.ca>

Reply-to: Kevin Austin <kevin.austin@xxxxxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Thank you.

I wonder about the technology they used 75 years ago for the measurement. I just used two one-minute readings from a well-known unread book, the openings of Episode One and Episode Seventeen (one is narrative, the other catechismic), with the following results:

• 160 words / 213 syllables [~280 ms/syllable]
• 146 words / 265 syllables [~225 ms/syllable]

This included pauses [,] and cessations [.].

I then removed most of the ‘silences’ and breathing, and brought each text down to around 52 seconds, bringing the first example down to a breathless average of ~244 ms/syllable, and the second down to an even more breathless ~195 ms/syllable.
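The per-syllable figures above are simply the reading time divided by the syllable count. A minimal sketch of that arithmetic, assuming exactly 60 and 52 seconds for the two versions (the function name `ms_per_syllable` and the labels are illustrative only, not from the thread):

```python
def ms_per_syllable(duration_s: float, syllables: int) -> float:
    """Average time per syllable in milliseconds, pauses included."""
    return duration_s * 1000.0 / syllables

# Syllable counts as reported above; the durations are the assumed 60 s and 52 s.
readings = {"first reading": 213, "second reading": 265}

for label, syllables in readings.items():
    full = ms_per_syllable(60, syllables)
    trimmed = ms_per_syllable(52, syllables)
    print(f"{label}: {full:.0f} ms/syllable at 60 s, {trimmed:.0f} ms/syllable at 52 s")
```

This gives roughly 282 and 226 ms/syllable for the full minute, and roughly 244 and 196 ms/syllable after trimming, in line with the rounded figures quoted.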

Some specific examples:

joining ~ 390ms = 195ms/syllable

growth ~ 270ms

gaslight ~ 600ms = 300ms/syllable

friendship ~ 400ms = 200ms/syllable

Bloom and Steven ~ 880ms = 220ms/syllable

I compressed “Bloom and Steven” down to 500ms, ie 1/8-second per syllable; the 13 phonemes, at 26 phonemes/sec, are then about 40ms each. Perhaps in 1940 ‘conversational speech’ was this fast.
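For anyone wanting to check the compressed example, here is a small sketch of the rate and duration arithmetic. It takes the counts of 4 syllables and 13 phonemes as given; the helper name `rate_and_period` is hypothetical:

```python
def rate_and_period(units: int, duration_ms: float) -> tuple[float, float]:
    """Return (units per second, milliseconds per unit) for a stretch of speech."""
    per_second = units * 1000.0 / duration_ms
    ms_each = duration_ms / units
    return per_second, ms_each

# "Bloom and Steven" compressed to 500 ms.
syll_rate, syll_ms = rate_and_period(4, 500)    # 4 syllables
phon_rate, phon_ms = rate_and_period(13, 500)   # 13 phonemes, as counted above

print(f"syllables: {syll_rate:.0f}/s, {syll_ms:.0f} ms each")
print(f"phonemes:  {phon_rate:.0f}/s, {phon_ms:.1f} ms each")
```

On these numbers each syllable lasts 125 ms (1/8 second) and each phoneme about 38.5 ms, at a rate of 26 phonemes per second.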

Kevin

Also: https://www.quora.com/Speeches-For-the-average-person-speaking-at-a-normal-pace-what-is-the-typical-number-of-words-they-can-say-in-one-minute

I am a professional speaker and podcast host and I speak at approximately 145-160 words per minute (wpm), while many sources state that the average American English speaker engaged in a friendly conversation speaks at a rate of approximately 110–150 wpm.

On 2016, Jun 20, at 7:33 AM, Christine Rankovic <rankovic@xxxxxxxxxxxxxxxx> wrote:

Dunn and White (1940) is a classic report on speech measurements. They assumed 1/8-second as the length of a syllable for their classic measurements.

The reference is: Dunn, H.K. and White, S.D. (1940). Statistical Measurements on Conversational Speech. Journal of the Acoustical Society of America 11:278-288.

Christine Rankovic, PhD
Speech and Hearing Scientist

-----Original Message-----
From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Kevin Austin
Sent: Monday, June 20, 2016 12:48 AM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Statistics for word rate in natural speech

Thank you.

I’m not a linguist or psycholinguist, so I write only from direct experience.

My reading is that the question is not very 'well-formed', and therefore the answers do not respond to the question.

The question was about ‘words’ [whatever they may happen to be], and the answers start with the idea of syllable, and Jont’s answer seems to be in ‘base phonemic elements’. For example, the two words, “I”, and “stopped”, count as two words, each of one syllable, but ‘stopped’ is ccvcc [if the /p/ is pronounced].

10ms [ie 100Hz] seems to be a very small duration, and may only apply to a very limited number of phonemes. I had learned that the shortest time that was reliable for the [sequential] discrimination of auditory events was in the range of 25 to 40 ms — 40 to 25Hz. A ~16Hz limit works out to be around 60-70ms.
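The duration and rate pairs in this paragraph are reciprocals of each other. A quick sketch of the conversion, with illustrative function names that are not from the thread:

```python
def period_ms_to_hz(period_ms: float) -> float:
    """Events per second when each event lasts period_ms milliseconds."""
    return 1000.0 / period_ms

def hz_to_period_ms(rate_hz: float) -> float:
    """Milliseconds per event at rate_hz events per second."""
    return 1000.0 / rate_hz

for ms in (10, 25, 40):
    print(f"{ms} ms  -> {period_ms_to_hz(ms):.0f} Hz")
print(f"16 Hz -> {hz_to_period_ms(16):.1f} ms")
```

So 10 ms corresponds to 100 Hz, 25 to 40 ms corresponds to 40 to 25 Hz, and a 16 Hz limit corresponds to 62.5 ms, which sits inside the 60-70 ms range mentioned.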

But sixteen “what’s”? Try the test. Record sixteen one-syllable words, with cv or vc forms: be, am, so, it, two, aught, tea, ear, tie, etc. Most of these are two phonemes, or three if a diphthong is considered a grouped vowel, as in the word ‘tie’. Say them quickly. Edit them into a sequence with no gaps, and shorten the sequence to 1,000ms. Leaving aside the articulatory problems, is it possible to do sequential segmentation?

Record: “I spied the top pie”, and “North-eastern Carolinian national seashore”. Both are ‘five words’. For interest, edit out the words: ‘top’, ‘pie’, ‘Carolinian’, and ‘national’. Tricks such as producing the /d/ in ‘spied’ as the stopped diphthong /ai/, and contracting the /p/ and the /n/, likely increase the rate of delivery in natural speech, but probably mostly in informal contexts.

“What was the question again?” cv ccvc cv ccvccvcvcvc

Kevin

On 2016, Jun 19, at 8:03 AM, Jont Allen <jontalle@xxxxxxxxxxxx> wrote:

All,

A comment that I hope is helpful.

In our speech work we have learned, from extensive analysis, that the finest temporal resolution at which speech is processed by the auditory system is about 10 [ms].
That means that the natural temporal unit for talking about speech (or singing) is the centisecond [cs]. For example, the plosive burst of, say, /ka/ is about 1-2 [cs].
I have not found very many examples of less than 1 [cs], as perception deteriorates quickly when you go below (shorter than) 1 [cs].

Based on the numbers below for rapper Big Boi, 379 syllables/min is about 16 [cs] per syllable:
100*60/379 = 15.8 [cs]

This seems like a nice way to quantify this rate. It's close to the perceptual lower limit of 1 [cs]. A full syllable (CV, VC) of 16 [cs] seems pretty short.
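As a rough check on the unit conversion, a sketch that turns syllables per minute into centiseconds per syllable (one minute is 6000 cs; the function name is only illustrative):

```python
CS_PER_MINUTE = 60 * 100  # 60 seconds, 100 centiseconds each

def cs_per_syllable(syllables_per_minute: float) -> float:
    """Average syllable duration in centiseconds at a given syllable rate."""
    return CS_PER_MINUTE / syllables_per_minute

big_boi = cs_per_syllable(379)
print(f"379 syllables/min -> {big_boi:.1f} cs ({big_boi * 10:.0f} ms) per syllable")
```

At 379 syllables per minute this comes to about 15.8 cs, or roughly 158 ms, per syllable.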

Jont Allen

On 06/18/2016 11:39 PM, Arun Chandra wrote:
In Mozart's "Le Nozze di Figaro", Bartolo sings his revenge aria at about quarter == 112mm, which means the syllables are going by in triplets at about 336 per minute.

in Rossini's "Barber of Seville", the character Bartolo (the same character, again) sings his accusing aria to Rosina (his ward) at about quarter == 116mm, which means the sixteenth note syllables are going by at about 464 per minute.

the "Modern Major General's Song" by Gilbert and Sullivan goes by at about 184mm, so its syllables are about 368 per minute.
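The three figures above come from multiplying the metronome marking by the number of syllables sung per beat. A small sketch of that calculation follows; the value of two syllables per beat for the Gilbert and Sullivan song is an assumption inferred from 368 / 184, and the function name is illustrative:

```python
def syllables_per_minute(beats_per_minute: float, syllables_per_beat: float) -> float:
    """Syllable rate when every beat carries the same number of syllables."""
    return beats_per_minute * syllables_per_beat

examples = [
    ("Figaro, Bartolo's revenge aria", 112, 3),     # triplets on the quarter note
    ("Barber of Seville, Bartolo's aria", 116, 4),  # sixteenth notes
    ("Modern Major General's Song", 184, 2),        # assumed: two syllables per beat
]

for title, bpm, per_beat in examples:
    spm = syllables_per_minute(bpm, per_beat)
    print(f"{title}: {spm:.0f} syllables/min, {60000 / spm:.0f} ms/syllable")
```

The same function also gives the average syllable duration in milliseconds, which makes these sung rates easier to compare with the spoken figures elsewhere in the thread.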

arun

On 6/18/16 4:07 AM, Huron, David wrote:
We have a wide tolerance for speech with "normal" paces ranging between 170 and 260 syllables per minute.
(Yuan, Liberman & Cieri, 2006; Towards an integrated understanding of speaking rate in conversation. INTERSPEECH conference Proc.)

Music exhibits an enormous range of lyrical pace. Judy Garland's rendition of "Somewhere Over the Rainbow" clocks in at a leisurely 64 syllables per minute. By contrast, in "Ms. Jackson" by OutKast, rapper Big Boi reaches an extraordinary 379 syllables per minute.

-David Huron with Nat Condit-Schultz

________________________________________
From: AUDITORY - Research in Auditory Perception [AUDITORY@xxxxxxxxxxxxxxx] on behalf of Bruno L. Giordano [brungio@xxxxxxxxx]
Sent: Friday, June 17, 2016 8:32 AM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Statistics for word rate in natural speech

Hello,

I am looking for published statistics on average word rate in natural speech (words/minute).

Is there some gold standard reference for this?

Thank you!

Bruno

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bruno L. Giordano, PhD
Institute of Neuroscience and Psychology
58 Hillhead Street, University of Glasgow, Glasgow, G12 8QB, Scotland
T +44 (0) 141 330 5484
Www: http://www.brunolgiordano.net
Email charter: http://www.emailcharter.org/