Re: musical tones in speech (Martin Braun)


Subject: Re: musical tones in speech
From:    Martin Braun  <nombraun(at)POST.NETLINK.SE>
Date:    Sat, 19 May 2001 18:44:02 +0200

Alain de Cheveigné wrote on Saturday, May 12, 2001: "Perhaps a useful next step would be for Martin to describe how speech targets were obtained. I would like sufficient detail to implement an automatic process by which I can produce targets, not necessarily as reliable as those produced by hand, but that are likely to show a roughly similar distribution. If automatic marking is not possible, then please explain why, and what aspect specific to hand-marking produced the remarkable distribution. Then perhaps I'll feel comfortable that it exists."

On May 13, Al Bregman sent emails to Alain and me suggesting that we continue the controversy directly in private letters and then report the outcome to the list. This report, from my side, is as follows:

______________
REPORT

I sent a reply to the question above, with attached data material, to Alain. When there had been no response after two days, I asked him whether the material had been of use. He then replied that he had "no desire to carry on a private discussion" with me. Upon this I sent him the following message:

"I had not asked for a discussion. I had just provided material that you had asked for. It is customary in such a case to acknowledge the delivery and to say whether it was of use. I must conclude now that your question to the list was not a question to gather information but a rhetorical question for pure propaganda purposes."

I now consider my defense against Alain's allegations closed. Anybody else who has questions for me concerning the paper is of course welcome to put them forward, either on this list or in private letters.

END OF REPORT
_______________

For those of you who are interested in the technical details of extracting the f0 of speech targets, the information I sent to Alain is enclosed below. If you are interested in the originally attached Excel file, please contact me.
Martin

______________________________________________________________
QUOTE

From: Martin Braun
To: Alain de Cheveigné
Sent: Thursday, May 17, 2001

Alain,

thank you for the apology. If your only intention was to test whether I might have fallen for the first cardinal error in science, "to find something that does not exist", we can soon come to an agreement. What had worried me was that you did not ask straightforward technical questions but presented the view that the only question was WHAT had gone wrong, not IF something had gone wrong.

I'll now answer what appears to be your main technical question. I am doing this in a private letter because interest in technical questions among readers of the list is very limited. On May 12 you wrote in a letter to the list replying to Bruno: "Perhaps a useful next step would be for Martin to describe how speech targets were obtained. I would like sufficient detail to implement an automatic process by which I can produce targets, not necessarily as reliable as those produced by hand, but that are likely to show a roughly similar distribution. If automatic marking is not possible, then please explain why, and what aspect specific to hand-marking produced the remarkable distribution. Then perhaps I'll feel comfortable that it exists."

As Bob Ladd wrote, automatic marking is indeed not possible. This is regrettable, but we have no option. We can, however, exclude unnoticed selection bias in hand marking.

The crucial point about speech targets is that they are linguistic AS WELL AS acoustic categories, or, in other words, phonetic AS WELL AS phonological ones. They have come into the focus of research only because they are functional in communication on the basis of certain linguistic rules. Extraction of speech-target f0 therefore has to be based on the given linguistic patterns of speech utterances.

Let's take the sentence in Fig. 1A of the paper, "Je moet de mooie rozen in een gele vaas doen."

A.
Here the researchers determined the following speech targets beforehand:
1) initial pitch: f0 of the quasi-steady-state [e:] in "Je"
2) peak: f0 peak of [u:] in "moet" (the only simple vowel in "moet")
3) valley: f0 valley of the vowel in "de"
4) accent peak: f0 peak of the accented vowel (the first one) in "mooie"
5) accent peak: f0 peak of the accented vowel (the first one) in "rozen"
6) valley: f0 valley between the preceding and following accent peaks, here in the only simple vowel of "een"
7) accent peak: f0 peak of the accented vowel (the first one) in "gele"
8) accent peak: f0 peak of the vowel in "vaas"
9) final low: f0 of the quasi-steady-state [u:] in "doen" (the only simple vowel in "doen")

B. Then, with the f0 trace and the sound waveform time-aligned in front of them, the researchers can recognize words and vowels and mark the f0 trace according to the predetermined linguistic speech-target pattern.

C. After the speech targets have been marked by hand, a suitable algorithm produces a table for each individual spoken version of a given sentence. Two such tables (plus contour graphs) are in the attached Excel file. The data I analyzed are based on 2,400 such tables.

If further questions arise, please let me know.

Martin

UNQUOTE
_______________________________________________________________

----- Original Message -----
From: Alain de Cheveigné <Alain.de.Cheveigne(at)IRCAM.FR>
To: <AUDITORY(at)LISTS.MCGILL.CA>
Sent: Saturday, May 12, 2001 1:22 PM
Subject: Re: musical tones in speech
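Step C of Martin Braun's letter quoted above mentions "an adequate algorithm" that turns the hand marking into a table, without specifying it. The Python sketch below shows one possible form of that step; it is not the algorithm used for the paper. Everything in it is a hypothetical assumption: the f0 trace is given as (time, f0) frames with None for unvoiced frames; each hand-marked vowel is a `MarkedInterval` carrying a word label, a target kind ("peak", "valley", or "level" for a quasi-steady-state target) and start/end times in seconds; and a "level" target is read as the median f0 of the interval, placed at the interval midpoint.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format: (time in s, f0 in Hz, or None if unvoiced).
Frame = Tuple[float, Optional[float]]


@dataclass
class MarkedInterval:
    """One hand-marked vowel interval (hypothetical format, not from the paper)."""
    label: str    # word containing the vowel, e.g. "moet"
    kind: str     # "peak", "valley" or "level" (quasi-steady state)
    start: float  # interval start in seconds
    end: float    # interval end in seconds


@dataclass
class TargetRow:
    """One row of the per-utterance speech-target table."""
    label: str
    kind: str
    time: float  # seconds
    f0: float    # Hz


def extract_targets(trace: Sequence[Frame],
                    marks: Sequence[MarkedInterval]) -> List[TargetRow]:
    """Read one f0 value off each hand-marked interval, in marking order."""
    rows: List[TargetRow] = []
    for mark in marks:
        # Keep only voiced frames that fall inside the hand-marked interval.
        voiced = [(t, f) for t, f in trace
                  if mark.start <= t <= mark.end and f is not None]
        if not voiced:
            raise ValueError(f"no voiced frames for {mark.label!r}")
        if mark.kind == "peak":
            time, f0 = max(voiced, key=lambda frame: frame[1])
        elif mark.kind == "valley":
            time, f0 = min(voiced, key=lambda frame: frame[1])
        elif mark.kind == "level":
            # Assumption: the median stands in for the quasi-steady-state f0.
            f0 = median(f for _, f in voiced)
            time = (mark.start + mark.end) / 2
        else:
            raise ValueError(f"unknown target kind {mark.kind!r}")
        rows.append(TargetRow(mark.label, mark.kind, time, f0))
    return rows


if __name__ == "__main__":
    # Toy values invented for illustration; not data from the paper.
    trace = [(0.00, 110.0), (0.01, 112.0), (0.02, None), (0.03, 130.0),
             (0.04, 125.0), (0.05, 100.0), (0.06, 98.0), (0.07, 105.0)]
    marks = [MarkedInterval("Je", "level", 0.00, 0.01),
             MarkedInterval("moet", "peak", 0.02, 0.04),
             MarkedInterval("de", "valley", 0.05, 0.07)]
    for row in extract_targets(trace, marks):
        print(f"{row.label:6s} {row.kind:7s} {row.time:.3f} s {row.f0:6.1f} Hz")
```

The sketch only reads values off intervals that a person has already identified and labeled from the predetermined pattern (steps A and B); the decisive linguistic judgment stays with the hand marking, consistent with the letter's point that automatic marking is not possible.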


This message came from the mail archive
http://www.auditory.org/postings/2001/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University