
Re: [AUDITORY] How we talk to machines that listen



Hello Valeriy!

 

Sharon Oviatt had a couple of papers looking at something like this (references below): talkers go into a “hyperspeech” mode when a computer misrecognizes what they say.

 

Oviatt, S., Levow, G. A., Moreton, E., & MacEachern, M. (1998). Modeling global and focal hyperarticulation during human-computer error resolution. Journal of the Acoustical Society of America, 104, 3080-3098. doi:10.1121/1.423888

 

Oviatt, S., MacEachern, M., & Levow, G. A. (1998). Predicting hyperarticulate speech during human-computer error resolution. Speech Communication, 24, 87-110. doi:10.1016/S0167-6393(98)00005-3

 

I don’t know if the idea came from these papers or if I heard it somewhere else, but speech recognizers are (or used to be) trained up on citation-style speech, so hyperarticulation should make speech recognition worse. I’ve been surprised by how well Siri does and will sometimes try to “mess with” it to see how it behaves.
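
For what it’s worth, here is a minimal Python sketch (my own addition, not from either paper) of how one might check that hunch: run an off-the-shelf recognizer over citation-style and hyperarticulated recordings of the same sentence and compare word error rates. The file names and reference transcript are hypothetical placeholders, and the SpeechRecognition and jiwer packages are just convenient choices.

# Sketch: compare ASR word error rate (WER) for citation-style vs.
# hyperarticulated recordings of the same sentence.
# pip install SpeechRecognition jiwer
import speech_recognition as sr
from jiwer import wer

def transcribe(path: str) -> str:
    """Return the recognizer's best guess for one WAV file."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        # Free Google Web Speech API endpoint bundled with SpeechRecognition
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # recognizer produced nothing usable

# Hypothetical stimuli: the same sentence spoken normally and hyperarticulated.
reference = "please call stella and ask her to bring these things"
citation_hyp = transcribe("citation_style.wav")
hyper_hyp = transcribe("hyperarticulated.wav")

print("WER, citation-style speech:  ", wer(reference, citation_hyp))
print("WER, hyperarticulated speech:", wer(reference, hyper_hyp))

If the hypothesis holds, the hyperarticulated recordings should come back with the higher WER, though of course a real test would need many talkers and sentences.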

 

 

Sarah Hargus Ferguson, PhD, CCC-A

Associate Professor

 

Department of Communication Sciences and Disorders

 


 

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx> On Behalf Of Valeriy Shafiro
Sent: Sunday, February 3, 2019 10:17 AM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: [AUDITORY] How we talk to machines that listen

 

Dear list,

I am wondering if anyone has any references or suggestions about this question. These days I hear more and more people talking to machines (e.g., Siri, Google, Alexa), and doing it in more and more places. Automatic speech recognition has improved tremendously, but it still seems to me that when people talk to machines they often switch into a different production mode. At times it may sound like talking to a (large) dog, and sometimes like talking to a customer service agent in a land far away who is diligently trying to follow a script rather than listen to what you are saying. I wonder whether the adjustments people make in their speech production when talking to machines in that mode are in fact optimal for improving recognition accuracy. Since machines do not process speech in the same way as humans, the changes that make speech more recognizable to other people (or even pets) may not be the same changes that help machines. In other words, do people tend to make the adjustments that are actually optimal for machine recognition? Or do they fall back on clear-speech modes that work with other kinds of listeners (children, nonnative speakers, pets), or something in between?

I realize there is a lot to this question, but perhaps people have started looking into it. I am happy to collate references and replies and send them to the list.

Best,

 

Valeriy