Re: Computational ASA (Paris Smaragdis)


Subject: Re: Computational ASA
From:    Paris Smaragdis  <paris(at)MEDIA.MIT.EDU>
Date:    Fri, 30 Apr 2004 18:11:42 -0400

To add one more thing to Rob's excellent points, part of the difficulty
in source separation is that it is more of an art than a science. There
are no meaningful measures of success, and what is perceived as a good
result is whatever sounds good. Therefore, by hand tuning it is easy to
come up with very good results on a case-by-case basis (hence the art);
but doing so automatically and consistently with arbitrary inputs is
very hard because we have no idea what numbers to strive for.

Paris

Maher, Rob wrote:
> Jon--
> I think the inherent difficulty of computational source separation has to do
> with the generally ill-posed nature of the research problem: given a
> composite observation vector 'A' that is a linear sum of N unknown
> time-varying signal vectors 'B', 'C', ..., determine estimates of 'B', 'C',
> .... In other words, one equation in N unknowns, where N > 1. Without some
> other valid source of information, there can be no unique solution to the
> problem.
>
> To obtain the "other valid source of information," the CASA field has a
> variety of threads. One thread involves the use of conventional DSP
> techniques to transform the composite signal into a (typically)
> time-frequency representation, then to perform pattern extraction in the
> transform domain. Another thread uses biologically-inspired signal
> processing via cochlear models and perceptually-derived nonlinear functions
> borrowed from the perceptual audio coding field. Yet another thread starts
> with human psychoacoustical data in an attempt to exploit the cognitive
> concepts of source segregation and streaming.
>
> It is sometimes argued that "humans can do separation, so the problem must
> be soluble." I would argue that humans do source _identification and
> tracking_ very effectively, but perhaps humans do not actually solve the
> computational _separation_ problem, in the sense that the individual vectors
> 'B', 'C', etc. are extracted in a neural signal processing context.
>
> A computational system that is able reliably to classify the number,
> identity, and duration of overlapping sonic events seems like a first step
> in the process. Yet, I don't know of any system to date that comes close to
> a casual human's ability to determine the orchestration of a musical
> selection or recognize the doorbell at a noisy party.
>
> We certainly need some new insights into the problem, so welcome aboard!
>
> Rob Maher
>
> --
> Robert C. (Rob) Maher, Ph.D.
> Associate Professor of Electrical and Computer Engineering
> Montana State University-Bozeman
> rob.maher(at)montana.edu
>
>
>
>>-----Original Message-----
>>From: Jon Boley [mailto:jdb(at)jboley.com]
>>Sent: Friday, April 30, 2004 7:59 AM
>>To: AUDITORY(at)LISTS.MCGILL.CA
>>Subject: Computational ASA
>>
>>
>>Hi all,
>>I am a grad student in the University of Miami's Music
>>Engineering program, and I am just starting to learn about
>>auditory scene analysis, particularly computational ASA models.
>>
>>I know there are several CASA experts on this list, so I'd
>>like to ask why source separation seems to be so difficult.
>>It seems like the general consensus is that source
>>separation is far too difficult, and research has focused on
>>understanding features within a mix. Yet, from what I've
>>read, current methods of feature extraction work quite well.
>>It only seems natural that we could write an algorithm that
>>groups these features according to their perceived source and
>>creates separate audio streams based on this information.
>>While this would be much more difficult in noisy or
>>reverberant environments, I would imagine it would be quite
>>simple in a less complex environment.
>>What is it that makes source separation so difficult?
>>
>>Thanks,
>>Jon Boley
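
As an illustration of the "one equation in N unknowns" point in Rob's message above, here is a minimal sketch for N = 2. It is not taken from the thread: the signals, the perturbation `d`, and the helper `mix` are all made up for the example. It only shows that two different pairs of sources can sum to the same observed mixture, so the mixture alone cannot tell them apart.

```python
import math


def mix(*sources):
    """Sum equal-length source signals sample by sample (hypothetical helper)."""
    return [sum(samples) for samples in zip(*sources)]


n = 8  # number of samples; arbitrary choice for the illustration

# Two made-up "true" sources B and C, and the observed mixture A = B + C.
b = [math.sin(2 * math.pi * k / n) for k in range(n)]
c = [0.5 * math.cos(4 * math.pi * k / n) for k in range(n)]
a = mix(b, c)

# Any signal d gives another valid decomposition: (B + d) + (C - d) = A.
d = [0.1 * k for k in range(n)]
b_alt = [x + y for x, y in zip(b, d)]
c_alt = [x - y for x, y in zip(c, d)]
a_alt = mix(b_alt, c_alt)

# Both source pairs explain the same observation (up to rounding error) ...
same_mixture = all(
    math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, a_alt)
)
# ... even though the candidate sources themselves are different.
different_sources = any(
    not math.isclose(x, y, abs_tol=1e-12) for x, y in zip(b, b_alt)
)

print("same mixture:", same_mixture)
print("different sources:", different_sources)
```

Because `d` can be anything, there are infinitely many such decompositions. Picking one requires the "other valid source of information" Rob describes, such as assumptions about harmonicity, onset timing, or perceptual grouping.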


This message came from the mail archive
http://www.auditory.org/postings/2004/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University