Subject: Re: CASA problems and solutions
From:    Pierre Divenyi <pdivenyi(at)MARVA4.NCSC.MED.VA.GOV>
Date:    Wed, 31 Jan 2001 12:32:19 -0800

Dear John, John, Al, DeLiang and others,

To put my 10-cents' worth of wisdom into the discussion of CASA vs. ASA: with due respect to DeLiang, I must say that the most striking difference between ASA performed by humans and CASA is that the former works and the latter, with few notable exceptions, does not (or at least not well enough). Therefore, I venture to say that CASA engineers probably have nothing to lose if their models try to emulate what a human listener does -- as some of the models actually attempt to do, such as Okuno's "agents" that monitor a suspected source. From my point of view, the real trouble is that we, the humanoidally challenged, are still far from understanding exactly how people accomplish ASA, i.e., the list of tasks that Al Bregman enumerates.

One particular function that, to the disappointment of many (including myself), we are now relegating to second rank is localization-based segregation. To the studies showing the relative inefficiency of spatial source separation listed by John Culling, I would like to append ours, soon to appear in the book of the Mierlo proceedings. The problem with spatial separation, in my view, is not that the auditory system is a bad localizer but that the location it signals is so prone to adaptation, as the Franssen effect demonstrates and as Rachel Clifton has shown in her precedence-effect demonstrations. There is nothing as difficult to localize as simultaneous, ongoing steady-state or quasi-steady-state signals: isn't the placement of the instruments the last thing you are concerned about when you hear a symphony concert? The same should also be true for a cocktail party, where the bulk of the acoustic energy is carried by vowels -- this is the reason the babble is so often characterized as a "buzz". Can one identify the location of individual bees around a hive?
What the localization system is exquisitely sensitive to are two parameters: the very first onset of a signal, and a change in source location (as Erv Hafter's studies have shown); neither of these is prominent at a cocktail party. Nevertheless, even if source localization does seem to be a disappointingly poor segregation factor, John Bates's model may be closer than he thinks to what the auditory system may be doing. Actually, a strictly Helmholtzian view of the ear is at best incomplete: there appears to exist a parallel broadband analysis without which Neal Viemeister's now-classic observations depicting the TMTF could not have occurred. Thus, John's idea that the ear should first register that something has happened and when (which inevitably yields an interaural time difference reading), and do the frequency analysis later, is not inconsistent with current auditory theory.

Bipartisanly yours (just to be politically up-to-date),

Pierre

****************************************************************************
Pierre Divenyi, Ph.D.
Speech and Hearing Research (151)
V.A. Medical Center, Martinez, CA 94553, USA
Phone:  (925) 370-6745
Fax:    (925) 228-5738
E-mail: pdivenyi(at)marva4.ebire.org
****************************************************************************
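P.S. For the engineers on the list: the "register first that something happened, and when" idea can be caricatured in a few lines of Python. This is my own sketch, not John Bates's model -- the crude rectified-difference onset detector and the 0.8-ms lag limit (roughly the largest ITD a human head produces) are my assumptions:

```python
import numpy as np

def onset_function(x):
    """Crude broadband onset detector: half-wave-rectified first
    difference of the magnitude envelope. It emphasizes the very
    first wavefront and suppresses ongoing steady-state energy,
    which is exactly what steady vowels in a babble lack."""
    env = np.abs(x)
    return np.maximum(np.diff(env, prepend=env[0]), 0.0)

def estimate_itd(left, right, fs, max_itd=0.8e-3):
    """Estimate the interaural time difference from onset cues only,
    before any frequency analysis. Returns ITD in seconds; positive
    means the left ear leads (source on the listener's left)."""
    ol, outr = onset_function(left), onset_function(right)
    n = len(ol)
    # Cross-correlate the two onset functions; restrict the search
    # to physiologically plausible lags (|ITD| <= max_itd).
    full = np.correlate(outr, ol, mode="full")   # lags -(n-1)..(n-1)
    max_lag = int(round(max_itd * fs))
    center = n - 1
    window = full[center - max_lag: center + max_lag + 1]
    lag = int(np.argmax(window)) - max_lag       # > 0: right ear lags
    return lag / fs
```

For a click arriving 0.5 ms earlier at the left ear, the sketch recovers +0.5 ms; for an ongoing steady tone, the onset functions are nearly flat and the estimate degrades -- which is, of course, the point of the argument above.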