Dear list,
I am wondering
if anyone has any references or suggestions about this
question. These days I hear more and more people talking
to machines, e.g. Siri, Google, Alexa, etc., and doing it in
more and more places. Automatic speech recognition has
improved tremendously, but it still seems to me that when
people talk to machines they often switch into a different
production mode. At times it may sound
like talking to a (large) dog and sometimes like talking to
a customer service agent in a land far away who is
diligently trying to follow a script rather than listen to
what you are saying. And I wonder whether the adjustments
people make in their speech production when talking to
machines in that mode are in fact optimal
for improving recognition accuracy.
Since machines do not process speech in the same way
humans do, I wonder whether the changes in speech production
that make speech more recognizable to other people (or even pets)
are the same as those that help machines. In other
words, do people tend to make the adjustments that are
actually optimal for making their speech more recognizable
to machines? Or is it more a matter of falling back on
clear-speech modes that work with other kinds of listeners
(children, non-native speakers, pets), or something in between?
I realize there is a lot to this question, but perhaps people
have started looking into it. I am happy to collate
references and replies and send them to the list.
Best,
Valeriy