Dear list,
I am wondering
if anyone has any references or suggestions about this
question. These days I hear more and more people talking
to machines, e.g. Siri, Google, Alexa, etc., and doing it in
more and more places. Automatic speech recognition has
improved tremendously, but it still seems to me that when
people talk to machines they often switch into a different
production mode. At times it may sound
like talking to a (large) dog and sometimes like talking to
a customer service agent in a land far away who is
diligently trying to follow a script rather than listen to
what you are saying. And I wonder whether the adjustments
people make in their speech production when talking to
machines in that mode are in fact optimal
for improving recognition accuracy.
Since machines do not process speech in the same way
humans do, I wonder whether the changes in speech production
that make speech more recognizable to other people (or even pets)
are the same as those that help machines. In other
words, do people tend to make the adjustments that are
actually optimal for making their speech more recognizable
to machines? Or is it more a matter of falling back on
clear-speech modes that work with other kinds of listeners
(children, non-native speakers, pets), or something in between?
I realize there is a lot to this question, but perhaps people
have started looking into it. I am happy to collate
references and replies and send them to the list.
Best,
Valeriy