Re: Voice Quality

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Voice Quality
From: Rahul Shrivastav <rahul@xxxxxxxxxxx>
Date: Wed, 12 Nov 2003 13:44:35 -0500
Comments: cc: Seetharamakrishnan <seethark@ETH.NET>
Delivery-date: Wed Nov 12 14:10:21 2003
Importance: Normal
In-reply-to: <000501c3a93e$8f522e20$9bb409ca@seethark.eth.net>
Reply-to: Rahul Shrivastav <rahul@xxxxxxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

SRK,

You have raised a question that is not easy to answer, and here are my
two cents on these issues:

First, perceived voice quality results from multiple acoustic changes,
and the measures you have described will all play a role in
quantifying the final percept. Some of these measures (e.g., jitter,
shimmer) are influenced by the phonetic environment, the fundamental
frequency, and the window size used for analysis, so you will have to be
very careful when you make your measurements.
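
As a rough illustration of the window-size dependence, here is a minimal
sketch of one common definition of local jitter: the mean absolute
difference between consecutive glottal periods, divided by the mean
period. The function name and the period values are made up for
illustration; the periods are assumed to come from a pitch tracker, in
seconds.

```python
def local_jitter(periods):
    """Relative local jitter for a list of glottal periods (seconds).

    Computed as mean |T[i+1] - T[i]| divided by the mean period.
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(periods[i + 1] - periods[i]) for i in range(len(periods) - 1)]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Hypothetical period sequence: steady at first, irregular later.
periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0110, 0.0092, 0.0105, 0.0098]

# The same recording gives different jitter values for different windows.
print("first half :", local_jitter(periods[:4]))
print("second half:", local_jitter(periods[4:]))
print("whole span :", local_jitter(periods))
```

The point is only that the value depends on which stretch of the signal
is analyzed, so the window should be chosen the same way for every
recording being compared.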

Second, listeners' judgments of voice quality are most likely also
influenced by suprasegmental and paralinguistic factors (age and sex of
the speaker, emotions, etc.). I do not know if these factors come into
play in your work.

Third, as of today, I am not aware of any "standard" formula to put all
the various measures together and come up with a single index of "good
voice quality." While some efforts towards this are ongoing, these often
look at one or two sub-types of quality.

If you have a pre-determined set of stimuli that you consider "ideal",
you may be able to use some sort of distance measure to determine how
far a given stimulus is from the ideal. Most likely, you will have to
apply a time-normalization procedure (e.g., dynamic time-warping) before
you calculate these measures. However, it seems that your stimuli differ
in their content, so this approach may not be possible.
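
For the case where it does apply, here is a minimal sketch of a dynamic
time-warping distance. It assumes both utterances have the same content
and that each has been reduced to a frame-by-frame contour of a single
measure (e.g., pitch in Hz). The function name and the sample contours
are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time-warping distance between two 1-D sequences.

    Uses absolute difference as the local cost and allows the usual
    three steps (insertion, deletion, match).
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical pitch contours (Hz): same shape, different timing.
ideal = [120.0, 125.0, 130.0, 125.0]
spoken = [120.0, 125.0, 125.0, 130.0, 130.0, 125.0]
print(dtw_distance(ideal, spoken))
```

Because the warping absorbs differences in timing, what remains in the
distance is mostly the difference in the contour values themselves.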

If you do not have a pre-determined "ideal" stimulus, you may want to
choose a set of measures that you believe will adequately reflect the
nature of the voice qualities that you expect to see in your set. You
can then compare the measures from your stimuli to the published
normative data on that measure. The voices closest to the ideal would
likely be the ones that have minimal deviations from the normative set.
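
One simple way to do this comparison is to express each measure as a
z-score against the normative mean and standard deviation, then combine
the z-scores into one number. The sketch below is only an illustration:
the function names are hypothetical, the normative values are
placeholders and NOT published norms, and the equal weighting of
measures is an assumption rather than an established formula.

```python
import math


def z_scores(measures, norms):
    """Return {name: (value - norm_mean) / norm_sd} for each measure."""
    return {name: (value - norms[name][0]) / norms[name][1]
            for name, value in measures.items()}


def deviation_index(measures, norms):
    """Root-mean-square of the z-scores; 0 means exactly at the norm."""
    z = z_scores(measures, norms)
    return math.sqrt(sum(v * v for v in z.values()) / len(z))


# Placeholder (mean, sd) pairs for illustration only, not real norms.
norms = {"jitter_pct": (1.0, 0.5), "hnr_db": (20.0, 4.0)}

# Hypothetical measurements from two recordings.
voice_a = {"jitter_pct": 1.1, "hnr_db": 19.0}
voice_b = {"jitter_pct": 2.0, "hnr_db": 12.0}

for name, voice in [("A", voice_a), ("B", voice_b)]:
    print(name, deviation_index(voice, norms))
```

A smaller index means the voice sits closer to the normative set; the
individual z-scores show which measure is responsible for a deviation.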

Hope this helps! I am curious to know what other folks on the list have
to say -- would you mind sharing the information you get with me?

Thanks,
Rahul

----------------------------
Rahul Shrivastav, Ph.D.
Assistant Professor
Communication Sciences and Disorders
Dauer Hall, Room 48
Gainesville FL 32611

Phone: (352) 392-2046 (ext. 230)
Fax: (352) 392-6170

----------------------------


-----Original Message-----
From: AUDITORY Research in Auditory Perception
[mailto:AUDITORY@LISTS.MCGILL.CA] On Behalf Of Seetharamakrishnan
Sent: Wednesday, November 12, 2003 12:01 PM
To: AUDITORY@LISTS.MCGILL.CA
Subject: Voice Quality


Dear Friends

I am not an expert in voice analysis, yet I have to assess voice
quality from conversational speech, i.e., compare an "ideal voice" with
the spoken voice. The ideal voice would be recorded when the voice
quality is good. Whenever the same person speaks, his/her voice will
be compared to the ideal voice's parameters and deviations will be
indicated. The content and duration of the ideal voice and spoken voice
will be different. I have some software to measure sound analysis
parameters like intensity, pitch, HNR, mean dB, SD, jitter, shimmer,
silence, unvoiced frames, etc.

I don't know how to relate the measured values to the perceived
quality of voice. Now my question is: what measurement parameters can
be reliably used to compare the "ideal voice" and the spoken voice, and
how?

The only criterion is that the spoken voice should have definitely
deviated in quality in some manner or other. I am not able to decide
which measurements I can reliably and consistently use to satisfy the
above criterion, though I know that certain measurements like mean dB,
mean pitch, silence percentage, number of unvoiced frames, voice breaks,
etc. can be used.

One more question: how large a window (time in seconds) should be used
to arrive at a reliable comparison?

Any light on this topic would be appreciated.

Regards
srk
