Subject: Re: musical tones in speech
From: Martin Braun <nombraun(at)POST.NETLINK.SE>
Date: Sat, 12 May 2001 11:02:07 +0200

Alain,

honestly, with the techniques you have applied so far, it has not been possible to detect "musical tones in speech".

Let's take the example sentences in Fig. 1 of my paper, which most readers of this list can easily look at: http://ojps.aip.org/journal_cgi/dbt?KEY=ARLOFJ&Volume=LASTVOL&Issue=LASTISS

Perhaps we can agree on the following facts:

1) For such a sentence there are typically about 200 f0 values, but only 9 speech targets. Because the vast majority of the 200 f0 values are pitch transitions between speech targets, the pitch of the 9 speech targets will obviously be hopelessly lost in data noise if you use all 200 f0 values for a histogram.

2a) Such a sentence typically has about 5 periods of voicelessness, i.e. 6 contiguous voiced sections. If your software extracts minima and maxima on the basis of such sections, you get 12 f0 values. If your software is not highly sophisticated, you will mostly get extreme f0 values caused by section onsets or offsets. Such erratic data have no relation to speech targets.

2b) If your software rules out all erratic f0 points, some of the 12 f0 values will agree with the 9 speech targets. But it is uncertain which ones, and how many. You would still have too much noise in your histogram to see a pattern in the pitch distribution of speech targets.

2c) If your software takes "breath groups" as units, each such sentence will form only one unit. You would get 2 f0 values per sentence. Because of the problems described under (2a) and (2b) above, it would be uncertain whether either of the 2 f0 values agrees with one of the 9 speech targets.

Perhaps we can now agree that there is no other way than first hand-marking the speech targets in the f0 contours and then extracting them.

There is another vital point.
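Before turning to that point, the histogram argument in (1) can be illustrated with a small simulation. This is only a minimal sketch under openly hypothetical assumptions: the 9 targets are placed exactly on equal-tempered semitones (expressed in cents relative to A4 = 440 Hz), they are joined by straight glides in cents, and the contour is sampled at 200 points. None of these numbers come from the IPO data or from Fig. 1, and the names (`deviation_from_grid`, `share_near_grid`, `TARGET_SEMITONES`) are made up for illustration.

```python
def deviation_from_grid(cents: float) -> float:
    """Signed distance in cents from the nearest equal-tempered semitone, in [-50, 50)."""
    return ((cents + 50.0) % 100.0) - 50.0


def share_near_grid(values, half_width: float = 12.5) -> float:
    """Share of values falling into the 25-cent histogram bin centred on a semitone."""
    return sum(abs(deviation_from_grid(v)) <= half_width for v in values) / len(values)


# Hypothetical speech targets: 9 pitches lying exactly on semitones (relative to A4).
TARGET_SEMITONES = [-17, -12, -15, -10, -14, -19, -13, -16, -21]
FRAMES_PER_GLIDE = 25  # 8 glides x 25 frames = 200 f0 values

targets = [100.0 * s for s in TARGET_SEMITONES]

# Assumed contour: straight glides (in cents) from each target to the next.
# Each glide starts on its target, so 8 of the 9 targets are among the frames.
frames = [
    a + (b - a) * i / FRAMES_PER_GLIDE
    for a, b in zip(targets, targets[1:])
    for i in range(FRAMES_PER_GLIDE)
]

print(len(frames))                        # 200
print(share_near_grid(targets))           # 1.0
print(round(share_near_grid(frames), 2))  # 0.24
```

A uniform spread of values would put 0.25 of them into a 25-cent bin, so in this sketch the histogram of all 200 f0 values shows no peak at the semitone grid, even though every one of the 9 targets lies exactly on it. Only the extracted targets reveal the pattern.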
As reported in the paper, the IPO researchers presented the speakers with sentence material that was likely to elicit clear and reproducible peaks and valleys in the pitch contour. In much other speech material, peaks and valleys are often accidental. It is unknown at present whether such material also reflects an influence of an absolute memory of musical tones. It might, and it might not.

Martin

P.S. Alain de Cheveigné wrote in a later message: "As to how such an artifact could arise, a possibility is that the software that was used to choose targets quantized F0 values to semitones."

Alain, it does not seem to be a convincing strategy to present speculations about flaws in the work of others just because that work doesn't fit one's own incomplete results. A semitone resolution would have been insufficient both for the original studies at the IPO and for my study. The resolution of the raw f0 values was 25 cents, which is very common and sufficient for the studies at issue.

----- Original Message -----
From: Alain de Cheveigne' <Alain.de.Cheveigne(at)IRCAM.FR>
To: <AUDITORY(at)LISTS.MCGILL.CA>
Sent: Friday, May 11, 2001 4:15 PM
Subject: Re: musical tones in speech