ARO highlights
I'd have to say the shrimp at the 46th St Beachbar and the BT party.
Also, Bob Shannon's demo that you only need 3 or 4 (crude) channels for
basically perfect speech recognition, and the clear implication that the
sophisticated signal-processing capabilities of the auditory system
(including the HYPOTHETICAL cochlear amplifier) have evolved not for speech,
but for music.
Ed Burns
Date: Wed, 16 Feb 1994 12:28:47 PST
Reply-To: bregman@CCRMA.STANFORD.EDU
Sender: Research in auditory perception <AUDITORY@VM1.MCGILL.CA>
From: bregman@CCRMA.STANFORD.EDU
Subject: How much precision needed?
X-To: auditory@vm1.mcgill.ca
To: Multiple recipients of list AUDITORY <AUDITORY@VM1.MCGILL.CA>
----------
> Also, Bob Shannon's demo that you only need 3 or 4 (crude) channels for
> basically perfect speech recognition, and the clear implication that the
> sophisticated signal-processing capabilities of the auditory system
> (including the HYPOTHETICAL cochlear amplifier) have evolved not for speech,
> but for music.
>
> Ed Burns
I think that Ed is joking, but there is a serious point behind his joke.
Was the speech recorded in a background of silence?
I suspect that as soon as you put speech in a background of other
sounds, noise, and reverberation, the "sophisticated signal-processing
capabilities of the auditory system" will be necessary to achieve good
performance. The high-level performance with an informationally reduced
signal in a simple background is probably due to the redundancy of speech
and the listener's understanding of the constraints imposed by the
articulatory process and by the structure of the language and the
meaning of the message. In short, the listener can profit from the
many sources of redundancy in the signal.
I suspect that you could recognize a piece of music through 3 or 4
channels too, if it was as familiar to you as the words and phrases
of your language are. You wouldn't get the timbral nuances of the
music, and you would have to fill in some aspects of the music from
memory, but this is probably what happens in the speech case too.
Similar phenomena occur in vision. We can recognize simple pen
drawings (sometimes simplified to the level of cartoons) of people
and objects that we are familiar with. We can get depth from a few
lines in a drawing. Then why do we need the "sophisticated signal-
processing capabilities" of the VISUAL system?
Al Bregman
Date: Wed, 16 Feb 1994 12:39:31 -0800
Reply-To: John Lazzaro <lazzaro@CS.BERKELEY.EDU>
Sender: Research in auditory perception <AUDITORY@VM1.MCGILL.CA>
From: John Lazzaro <lazzaro@CS.BERKELEY.EDU>
Subject: Re: How much precision needed?
X-To: AUDITORY@VM1.MCGILL.CA
To: Multiple recipients of list AUDITORY <AUDITORY@VM1.MCGILL.CA>
> [ bregman@CCRMA.STANFORD.EDU ]
> I suspect that you could recognize a piece of music through 3 or 4
> channels too, if it was as familiar to you as the words and phrases
> of your language are.
As apartment-dwellers of the world can verify! I've become an expert
in recognizing heavy-metal songs by the signal energy present under
120 Hz since our new neighbors moved in last year :-)
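
For readers who want to see what "signal energy present under 120 Hz" amounts to numerically, here is a minimal sketch. It is an illustration only, not anything posted in the thread: the function name low_band_energy_fraction, the naive DFT, and the synthetic two-tone test signal are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def low_band_energy_fraction(samples, sample_rate, cutoff_hz=120.0):
    """Return the fraction of total signal energy in DFT bins below cutoff_hz.

    Hypothetical helper for illustration; uses a naive O(n^2) DFT so that
    only the standard library is needed.
    """
    n = len(samples)
    low = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k (center frequency k * sample_rate / n).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        # Interior bins also account for their negative-frequency mirror.
        if 0 < k < n / 2:
            power *= 2
        total += power
        if k * sample_rate / n < cutoff_hz:
            low += power
    return low / total if total else 0.0


# Assumed test signal: a 60 Hz "bass" tone plus a quieter 440 Hz tone.
sample_rate = 1000
n = 500
mix = [
    math.sin(2 * math.pi * 60 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 440 * i / sample_rate)
    for i in range(n)
]
fraction = low_band_energy_fraction(mix, sample_rate)
print(f"energy below 120 Hz: {fraction:.2f}")
```

With these assumed amplitudes the bass tone carries four times the energy of the upper tone, so about 80% of the energy falls below the cutoff. Tracking that one number over successive short frames would give a very crude single-channel "rhythm" signal of the kind the apartment-wall example describes.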