Re: P-Centers

Dear Daniele,

Tim Ives wrote a robust routine for specifying the p-centers of CV and VC syllables in 2005. It is described in

Ives, D. T., Smith, D. R. R. and Patterson, R. D. (2005). "Discrimination of speaker size from syllable phrases," J. Acoust. Soc. Am. 118 (6), 3816-3822.

The relevant text reads

The syllables were normalized by setting the RMS value in the region of the vowel to a common value so that they were all perceived to have about the same loudness.  We also wanted to ensure that, when any combination of the syllables was played in a sequence, they would be perceived to proceed at a regular pace; an irregular sequence of syllables causes an unwanted distraction. Accordingly, the positions of the syllables within their files were adjusted so that their perceptual-centers (P-centers) all occurred at the same time relative to file onset. The algorithm for finding the P-centers was based on procedures described by Marcus (1981) and Scott (1993), and it focuses on vowel onsets. Vowel onset time was taken to be the time at which the syllable first rises to 50 % of its maximum value over the frequency range of 300-3000 Hz.  To optimize the estimation of vowel onset time, the syllable was filtered with a gammatone filterbank (Patterson et al., 1992) having thirty channels spaced quasi-logarithmically over the frequency range of 300-3000 Hz.  The thirty channels were sorted in descending order based on their maximum output value and the ten highest were selected.  The Hilbert envelope was calculated for these ten channels and, for each, the time at which the level first rose to 50 % of the maximum was determined; the vowel onset time was taken to be the mean of these ten time values.  The P-centre was determined from the vowel onset time and the duration of the signal as described by Marcus (1981). The P-center adjustment was achieved by the simple expedient of inserting silence before and/or after the sound. After P-center correction the length of each syllable, including the silence, was 683 ms.
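As a rough illustration of the vowel-onset step quoted above, here is a minimal Python sketch using only NumPy. It is not the routine from the paper, and several details are assumptions on my part: the ERB-rate spacing of the thirty channels, the 4th-order FIR gammatone approximation with unit-energy normalisation, and the function names (`vowel_onset_time`, `pad_to_align`) are all hypothetical. The mapping from vowel onset time to P-center (Marcus, 1981) is deliberately left out; `pad_to_align` simply takes a P-center time you have already computed.

```python
import numpy as np

def erb_space(f_lo, f_hi, n):
    """n centre frequencies equally spaced on an ERB-rate scale (assumed spacing)."""
    to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n))

def gammatone_ir(fc, fs, dur=0.05, order=4):
    """FIR approximation of a gammatone impulse response (assumed filter design)."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)  # bandwidth from the ERB at fc
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))      # unit energy (an assumption)

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed via the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def vowel_onset_time(x, fs, n_channels=30, n_best=10,
                     f_lo=300.0, f_hi=3000.0, frac=0.5):
    """Mean time (s) at which the n_best strongest channels first reach
    frac of their own maximum envelope value."""
    envs = np.array([
        hilbert_envelope(np.convolve(x, gammatone_ir(fc, fs))[:len(x)])
        for fc in erb_space(f_lo, f_hi, n_channels)
    ])
    peaks = envs.max(axis=1)
    best = np.argsort(peaks)[::-1][:n_best]   # channels with the highest maxima
    # np.argmax on a boolean array returns the index of the first True
    onsets = [np.argmax(envs[i] >= frac * peaks[i]) / fs for i in best]
    return float(np.mean(onsets))

def pad_to_align(x, fs, p_center, target_p_center, total_dur):
    """Insert silence before and after x so that its P-center falls at
    target_p_center (s) and the padded signal lasts total_dur (s)."""
    lead = int(round((target_p_center - p_center) * fs))
    trail = int(round(total_dur * fs)) - lead - len(x)
    if lead < 0 or trail < 0:
        raise ValueError("signal does not fit the requested alignment")
    return np.concatenate([np.zeros(lead), x, np.zeros(trail)])
```

For example, `pad_to_align(x, fs, p_center, 0.25, 0.683)` would place the P-center 250 ms into a 683 ms file; the 250 ms target is purely illustrative, since the excerpt gives only the total length.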

I can provide a pdf of the paper if you would like to see it.

Sincerely, Roy P

On 11/10/2016 15:41, SCHON Daniele wrote:

Dear all,

I wrote some Matlab code to automatically extract the perceptual centres from natural speech and I realized that there are several options and choices to be made.

I would appreciate your input on this topic (i.e., how you would generally proceed after speech-syllable segmentation), and I'll try to summarize the comments in a future post with my implemented solution(s) and the link to the code (with all the bugs!).



