
Re: histograms of F0 in speech contours



Christian,

>Alain, would you be ready to let Martin's data
>pass through your algorithm?

Sure.  I was planning to ask the authors of the original study for it.  My
F0 data are also available (for some I must check with the owners of the
databases before sharing).  But you should read the paper before deciding
to be skeptical or not.  It's easily available.

Bias due to lines on the display is a possible explanation, but another is
chance.  The main effect (more samples within a quarter semitone from notes
on the scale than elsewhere) is reported to be significant to p=0.04,
meaning that from the start there was one chance in 25 that we're all
talking about a random pattern.  More if the author was on the lookout for
an effect like this, and (consciously or not) chose this database among
others.  Given the number of experiments we make every day, it's not
surprising that we stumble on something like this from time to time.
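To put a rough number on "from time to time", here is a minimal sketch of
the chance of at least one spurious hit.  It assumes the tests are
independent and that no real effect exists.  The function name and the
test counts are mine, chosen for illustration, not taken from the paper.

```python
def chance_of_spurious_hit(alpha, n_tests):
    """Probability that at least one of n_tests independent tests of a
    true null hypothesis reaches p <= alpha purely by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # alpha = 0.04 is the significance level reported for the main effect.
    # The test counts below are illustrative assumptions.
    for n in (1, 10, 25, 50):
        print(n, round(chance_of_spurious_hit(0.04, n), 3))
```

With a single test the chance is the nominal 1 in 25, and it grows
quickly as the number of looks at the data (or at different databases)
increases.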

Further tests found greater significance.  Notes ACDEFG (note: no B) were
selected on the grounds that they are more common than others in western
music (and despite the fact that they were actually _less_ common than
others in the data).  This improved the significance to p=0.002.  It's not
clear whether this selection was made after the author looked at the data.
If so, there is a chance that (consciously or unconsciously) the data were
scanned for a subset with a pattern that made sense, in which case a
"highly significant" p is no surprise.
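The size of this selection effect can be gauged with a small simulation.
The sketch below is not the analysis used in the paper.  It assumes F0
values spread uniformly over the octave (so no real effect), a one-sided
test based on a normal approximation, and a worst-case scan that simply
keeps the six notes that look best.  All names and parameters are
illustrative.

```python
import math
import random


def one_sided_p(hits, n):
    """Normal approximation to P(X >= hits) for X ~ Binomial(n, 0.5)."""
    if n == 0:
        return 1.0
    z = (hits - 0.5 * n) / math.sqrt(0.25 * n)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulate(n_samples=2000, n_trials=300, subset_size=6, seed=0):
    """Return the fraction of null datasets reaching p < 0.05 when the note
    subset is fixed in advance versus chosen after looking at the data."""
    rng = random.Random(seed)
    planned = posthoc = 0
    fixed = list(range(subset_size))  # subset chosen before seeing the data
    for _ in range(n_trials):
        near = [0] * 12    # samples within a quarter semitone of each note
        total = [0] * 12   # samples whose nearest note is this one
        for _ in range(n_samples):
            pitch = rng.uniform(0.0, 12.0)  # semitones within the octave
            nearest = round(pitch)
            note = int(nearest) % 12
            total[note] += 1
            if abs(pitch - nearest) <= 0.25:
                near[note] += 1
        # Post hoc choice: keep the notes with the highest "near" proportion.
        ranked = sorted(range(12),
                        key=lambda i: near[i] / max(total[i], 1),
                        reverse=True)
        best = ranked[:subset_size]
        for subset, is_posthoc in ((fixed, False), (best, True)):
            hits = sum(near[i] for i in subset)
            n = sum(total[i] for i in subset)
            if one_sided_p(hits, n) < 0.05:
                if is_posthoc:
                    posthoc += 1
                else:
                    planned += 1
    return planned / n_trials, posthoc / n_trials


if __name__ == "__main__":
    planned_rate, posthoc_rate = simulate()
    print("subset fixed in advance:", planned_rate)
    print("subset picked post hoc: ", posthoc_rate)
```

Under these assumptions the planned subset should come out "significant"
at about the nominal rate, while the post hoc subset should do so far
more often, even though the data contain no effect at all.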

The same data yielded a wealth of other interesting patterns.  A sharp jump
is seen between E3 and F3, from which is drawn a "conspicuous parallel" to
involuntary register changes in singing that occur in the "regions around
E3, E4 and E5".  Parallels are also drawn with Carlyon and Shackleton
(1994) that are taken to support two parallel pitch mechanisms with a
transition at 170 Hz.  And also with vocal fold models.  The
over-representation of ACDEFG is stronger if you select the 75% of the
target data with low standard deviation.  However, it turns into a
significant underrepresentation for females who speak in a loud voice.
So, it's not as if there were converging data to support a single claim.
Rather, there's a bunch of claims that radiate from a single set of data
that have been squeezed like a lemon.

I should add that the priors are not in favor of there being an effect.
The idea of a connection between AP and voice is not new and has certainly
been searched for before, yet the author claims to be the first one to have
found it.  In years working on F0 estimation I've never seen this sort of
effect (though I might have missed it).  In years reading about pitch I've
never seen anything that fits with it.  If it did exist, I'd expect it to
take a much different shape, for example that of an anchor note rather than
a scale with all white notes but B.  Many professional singers have trouble
starting in key a cappella, etc.  Given these unfavorable priors it would
take a very convincing experiment to support this theory.

Alain







--------------------------------------------------------------
Alain de Cheveigne'
CNRS/IRCAM, 1 place Stravinsky, 75004, Paris.
phone: +33 1 44784846, fax: 44781540, email: cheveign@ircam.fr
http://www.ircam.fr/equipes/pcm/cheveign
--------------------------------------------------------------