Dear list,
I am wondering if anyone has references or suggestions about this question. These days I hear more and more people talking to machines, e.g. Siri, Google, Alexa, etc., and doing it in more and more places. Automatic speech recognition has improved tremendously, but it still seems to me that when people talk to machines they often switch into a different production mode. At times it may sound like talking to a (large) dog, and at other times like talking to a customer service agent in a land far away who is diligently trying to follow a script rather than listen to what you are saying. I wonder whether the adjustments people make in their speech production when talking to machines in that mode are in fact optimal for improving recognition accuracy. Since machines do not process speech the same way humans do, the changes in production that make speech more recognizable to other people (or even pets) may not be the ones that help machines. In other words, do people tend to make the adjustments that actually make their speech more recognizable to machines? Or do they fall back on clear-speech modes that work with other kinds of listeners (children, nonnative speakers, pets), or something in between?
I realize there is a lot to this question, but perhaps people have started looking into it. I am happy to collate references and replies and send them to the list.
Best,
Valeriy