Re: histograms of F0 in speech contours ("Alain de Cheveigne'" )


Subject: Re: histograms of F0 in speech contours
From:    "Alain de Cheveigne'"  <Alain.de.Cheveigne(at)IRCAM.FR>
Date:    Sun, 13 May 2001 14:16:44 +0200

Christian,

>Alain, would you be ready to let Martin's data
>pass through your algorithm?

Sure. I was planning to ask the authors of the original study for it. My F0 data are also available (for some I must check with the owners of the databases before sharing).

But you should read the paper before deciding to be skeptical or not. It's easily available.

Bias due to lines on the display is a possible explanation, but another is chance. The main effect (more samples within a quarter semitone from notes on the scale than elsewhere) is reported to be significant to p=0.04, meaning that from the start there was one chance in 25 that we're all talking about a random pattern. More if the author was on the lookout for an effect like this, and (consciously or not) chose this database among others. Given the number of experiments we make every day, it's not surprising that we stumble on something like this from time to time.

Further tests found greater significance. Notes ACDEFG (note: no B) were selected on the grounds that they are more common than others in western music (and despite the fact that they were actually _less_ common than others in the data). This improved the significance to p=0.002. It's not clear whether this selection was planned in advance or made after the author looked at the data. If it was made afterwards, there is a chance that (consciously or unconsciously) the data were scanned for a subset with a pattern that made sense. In that case a "highly significant" p is no surprise.

The same data yielded a wealth of other interesting patterns. A sharp jump is seen between E3 and F3, from which is drawn a "conspicuous parallel" to involuntary register changes in singing that occur in the "regions around E3, E4 and E5". Parallels are also drawn with Carlyon and Shackleton (1994), which are taken to support two parallel pitch mechanisms with a transition at 170 Hz. And also with vocal fold models. The over-representation of ACDEFG is stronger if you select the 75% of the target data with low standard deviation. However, it turns into a significant underrepresentation for females who speak in a loud voice.

So, it's not as if there were converging data to support a single claim. Rather, there's a bunch of claims that radiate from a single set of data that has been squeezed like a lemon.

I should add that the priors are not in favor of there being an effect. The idea of a connection between absolute pitch (AP) and voice is not new and has certainly been searched for before, yet the author claims to be the first one to have found it. In years working on F0 estimation I've never seen this sort of effect (though I might have missed it). In years reading about pitch I've never seen anything that fits with it. If it did exist, I'd expect it to take a much different shape, for example that of an anchor note rather than a scale with all white notes but B. Many professional singers have trouble starting in key a cappella, etc.

Given these unfavorable priors it would take a very convincing experiment to support this theory.

Alain

--------------------------------------------------------------
Alain de Cheveigne'
CNRS/IRCAM, 1 place Stravinsky, 75004, Paris.
phone: +33 1 44784846, fax: 44781540,
email: cheveign(at)ircam.fr
http://www.ircam.fr/equipes/pcm/cheveign
--------------------------------------------------------------
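
The two quantitative points in the message above (counting F0 samples within a quarter semitone of a note, and the odds of stumbling on a "significant" pattern by chance) can be made concrete with a small sketch. It is not the analysis used in the study under discussion. The A4 = 440 Hz reference, the 100 to 400 Hz range, the log-uniform null data, and all function names are assumptions chosen for illustration, and the false-alarm formula assumes independent tests, which repeated looks at one database are not.

```python
import math
import random


def semitone_offset(f0_hz, ref_hz=440.0):
    """Signed distance, in semitones, from f0_hz to the nearest equal-tempered note.

    ref_hz (A4 = 440 Hz) is an assumed tuning reference, not taken from the study.
    """
    semitones = 12.0 * math.log2(f0_hz / ref_hz)
    return semitones - round(semitones)


def fraction_near_notes(f0_values, tol=0.25, ref_hz=440.0):
    """Fraction of F0 samples lying within tol semitones of a note."""
    hits = sum(1 for f in f0_values if abs(semitone_offset(f, ref_hz)) <= tol)
    return hits / len(f0_values)


def chance_of_false_alarm(alpha, n_tests):
    """Probability that at least one of n_tests independent tests
    on pure noise reaches p <= alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def random_f0(n, lo_hz=100.0, octaves=2.0, seed=0):
    """Hypothetical null data: n F0 values, uniform in log frequency
    over [lo_hz, lo_hz * 2**octaves]."""
    rng = random.Random(seed)
    return [lo_hz * 2.0 ** rng.uniform(0.0, octaves) for _ in range(n)]


if __name__ == "__main__":
    # With no real effect, about half the samples fall near a note,
    # because the +/- 0.25 semitone window covers half of each semitone.
    print(fraction_near_notes(random_f0(20000)))
    # Chance of at least one p <= 0.04 among k independent looks at noise.
    for k in (1, 5, 25):
        print(k, round(chance_of_false_alarm(0.04, k), 3))
```

Under these assumptions, a single test at p = 0.04 gives the one-in-25 chance mentioned in the message, while 25 independent tests would give a false alarm about 64% of the time. Selecting subsets of notes after looking at the data has a similar inflating effect, although the tests are then correlated and the simple formula no longer applies exactly.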


This message came from the mail archive
http://www.auditory.org/postings/2001/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University