Re: musical tones in speech
Alain de Cheveigne' wrote on Saturday, May 12, 2001:
"Perhaps a useful next step would be for Martin to describe how speech
targets were obtained. I would like sufficient detail to implement an
automatic process by which I can produce targets, not necessarily as
reliable as those produced by hand, but that are likely to show a roughly
similar distribution. If automatic marking is
not possible, then please explain why, and what aspect specific to
hand-marking produced the remarkable distribution. Then perhaps I'll feel
comfortable that it exists."
On May 13, Al Bregman sent emails to Alain and me suggesting a continuation
of the controversy directly in private letters between us, the outcome of
which could then be reported to the list.
This report, from my side, is as follows:
______________
REPORT
I sent Alain a reply to the question above, with data attached.
When there was no response after two days, I asked him if the material had
been of use. He then replied that he had "no desire to carry on a private
discussion" with me.
In response, I sent him the following message:
"I had not asked for a discussion. I just had provided material that you had
asked for. It is common in such a case to thank the sender and to say
whether it was of use.
I must conclude now that your question to the list was not a question to
gather information but a rhetorical question for pure propaganda purposes."
I now consider my defense against Alain's allegations closed. Anybody
else who might have questions for me concerning the paper is of course
welcome to put them forward, either on this list or in private letters.
END OF REPORT
_______________
For those of you who might have an interest in the technical details of
extracting the f0 of speech targets, the information I sent to Alain is
enclosed below. If you are interested in the Excel file that was originally
attached, please contact me.
Martin
______________________________________________________________
QUOTE
From: Martin Braun
To: Alain de Cheveigne'
Sent: Thursday, May 17, 2001
Alain,
thank you for the apology.
If your only intention was to test the possibility that I might have fallen
for the first cardinal error in science, "to find something that does not
exist", we can soon come to an agreement.
What had worried me was that you did not ask straightforward technical
questions but presented the view that the only question was WHAT had gone
wrong, not IF something had gone wrong.
I'll now answer what appears to be your main technical question. I do this
in a private letter, because the interest in technical questions among
readers of the list is very limited.
On May 12 you wrote in a letter to the list replying to Bruno:
"Perhaps a useful next step would be for Martin to describe how speech
targets were obtained. I would like sufficient detail to implement an
automatic process by which I can produce targets, not necessarily as
reliable as those produced by hand, but that are likely to show a roughly
similar distribution. If automatic marking is
not possible, then please explain why, and what aspect specific to
hand-marking produced the remarkable distribution. Then perhaps I'll feel
comfortable that it exists."
As Bob Ladd wrote, automatic marking is indeed not possible. This is
regrettable, but we have no option. We can, however, exclude unnoticed
selection bias in hand marking.
The crucial point in speech targets is that they are linguistic AS WELL AS
acoustic categories, or, in other words, phonetic AS WELL AS phonological
ones. They have come into the focus of research only because they are
functional in communication on the basis of certain linguistic rules.
Extraction of speech-target f0 therefore has to be done on the basis of the
given linguistic patterns of speech utterances.
Let's take the sentence in Fig.1A of the paper, "Je moet de mooie rozen in
een gele vaas doen."
A.
Here the researchers determined the following speech targets beforehand:
1) initial pitch: f0 of the quasi-steady-state [e:] in "Je"
2) peak: f0 peak of [u:] in "moet" (there is only this one simple vowel in
"moet")
3) valley: f0 valley of vowel in "de"
4) accent peak: f0 peak of accented vowel (the first one) in "mooie"
5) accent peak: f0 peak of accented vowel (the first one) in "rozen"
6) valley: f0 valley between preceding and following accent peak, here in
the one simple vowel of "een"
7) accent peak: f0 peak of accented vowel (the first one) in "gele"
8) accent peak: f0 peak of vowel in "vaas"
9) final low: f0 of the quasi-steady-state [u:] in "doen" (there is only
this one simple vowel in "doen")
B.
Then, having the f0 trace and the sound wave-form time-aligned in front of
them, the researchers can recognize words and vowels and mark the f0 trace
according to the predetermined linguistic speech-target pattern.
C.
After hand marking of the speech targets, an appropriate algorithm produces
a table for each individual spoken version of a given sentence. Two such
tables (plus contour graphs) are in an Excel file that I put in the
attachment. The data I analyzed are based on 2,400 such tables.
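
For readers who want to experiment, steps A to C can be sketched roughly as
follows. This is only an illustration, not the software used for the paper,
whose algorithm is not described here. The names Target, extract_f0 and
target_table are hypothetical, the rule of taking the median for
quasi-steady-state targets is an assumption, and the f0 trace and marking
windows are invented numbers, not measured data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Target:
    """One predetermined speech target with its hand-marked time window."""
    label: str    # e.g. "initial pitch", "peak", "valley"
    word: str     # word that carries the target
    kind: str     # "peak", "valley", or "steady" (assumed categories)
    start: float  # hand-marked window start in seconds
    end: float    # hand-marked window end in seconds


def extract_f0(target, times, f0):
    """Return one f0 value (Hz) for a hand-marked target.

    Unvoiced frames are expected as None and are skipped.
    """
    window = [v for t, v in zip(times, f0)
              if target.start <= t <= target.end and v is not None]
    if not window:
        raise ValueError(f"no voiced f0 samples marked for {target.word!r}")
    if target.kind == "peak":
        return max(window)
    if target.kind == "valley":
        return min(window)
    if target.kind == "steady":
        # Assumption: summarize a quasi-steady-state stretch by its median.
        return median(window)
    raise ValueError(f"unknown target kind: {target.kind!r}")


def target_table(targets, times, f0):
    """Build one table (a list of rows) for one spoken version of a sentence."""
    return [(i, t.label, t.word, extract_f0(t, times, f0))
            for i, t in enumerate(targets, start=1)]


# Invented f0 trace: one sample every 0.1 s, None marks an unvoiced frame.
times = [i / 10 for i in range(10)]
f0 = [120.0, 122.0, 150.0, 140.0, 110.0, 112.0, None, 100.0, 98.0, 99.0]

# First three targets of the example sentence, with invented windows.
targets = [
    Target("initial pitch", "Je", "steady", 0.00, 0.15),
    Target("peak", "moet", "peak", 0.15, 0.35),
    Target("valley", "de", "valley", 0.35, 0.65),
]

table = target_table(targets, times, f0)
for row in table:
    print(row)
```

Note that the windows themselves still come from a person looking at the
time-aligned trace and wave-form. The sketch automates only the step from
marked windows to a table.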
If further questions arise, please let me know.
Martin
UNQUOTE
_______________________________________________________________
----- Original Message -----
From: Alain de Cheveigne' <Alain.de.Cheveigne@IRCAM.FR>
To: <AUDITORY@LISTS.MCGILL.CA>
Sent: Saturday, May 12, 2001 1:22 PM
Subject: Re: musical tones in speech