Re: Blind Source Separation by Sparse Decomposition
Dear List,
Following is a note regarding Al Bregman's 9/6/99 post on "blind source
separation by sparse decomposition." To refresh memories on the subject I
have included his note.
I am intrigued by the implications of his questions on multiple sources
and clipped waveforms. They suggest that he might be looking at a
"granular" approach to CASA. Since I've been working with these aspects of
auditory perception for many years, I feel that I can address some of his
questions.
With respect to the separation of signals, one must consider the
operational tradeoffs of the biological ear for its intended purpose as a
sensor for biological survival. Conventional methods do not do this. They
treat the ear as a communications channel that is inherently ignorant of
the meaning contained in the data it carries. In contrast, the ear as a
sensor searches for meaning rather than, say, low distortion. This search
for meaning seems consistent with Al's remark about "humans trading off
perfection in a narrow set of circumstances for flexibility." Thus, the
solution to sorting multiple sources is to be found not so much in a
mathematical formula as in an engineering design approach that considers
tradeoffs of constraints and objectives. Such an approach could reach the
putative goal of CASA, a signal processing method that can do what the ear
does. I have found that granular time-domain waveform analysis can get
these results efficiently, while shunning the method of biophysical modeling.
Dealing with waveform clipping is the key to high-resolution granular
signal processing. I first looked at this in connection with the
intelligibility of infinitely clipped speech (Licklider & Pollack [1]).
Typically, in voiced speech the upper formants form ripples that ride upon
the wave of a strong first formant. This is especially noticeable in the
/ee/ waveform. These ripples contain zeros of the waveform in the complex
time domain (Voelcker [2]) that are destroyed by clipping.
Now, if destroying ripples removes upper formants, how is it that
clipped speech can be intelligible? The answer is that in the region of the
waveform near the zero axis a few higher frequency zeros can remain. In
fact, enough zeros remain to retain better than 65 percent intelligibility
[1]. In addition, if the waveform is differentiated before it is clipped,
most of the complex zeros are converted to real zeros, giving nearly
perfect intelligibility. Thus, despite severe waveshape distortion, the
meaning is preserved.
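
The effect of differentiating before clipping can be illustrated with a rough numerical sketch. It is not taken from [1] or [2]. The two-tone test signal, its frequencies (300 Hz and 2300 Hz), its amplitudes, the sample rate, and the helper names are all assumptions chosen only for illustration. A strong low-frequency "formant" carries a weak high-frequency ripple, and the code counts the real zeros (sign changes) that survive infinite clipping with and without prior differentiation.

```python
import numpy as np

def infinite_clip(x):
    """Keep only the sign of the waveform (hypothetical helper)."""
    return np.where(x >= 0, 1.0, -1.0)

def count_zero_crossings(x):
    """Count sign changes between consecutive nonzero samples."""
    s = np.sign(x)
    s = s[s != 0]  # drop exact zeros so they are not counted twice
    return int(np.sum(s[1:] != s[:-1]))

# Assumed test signal: strong "first formant" plus a weak upper "formant".
fs = 48000                      # sample rate in Hz (assumption)
n_samples = 4800                # 100 ms of signal
t = np.arange(n_samples) / fs
f1, f2 = 300.0, 2300.0          # illustrative frequencies, not from the post
a1, a2 = 1.0, 0.2               # the ripple is 14 dB below the low tone
x = a1 * np.sin(2 * np.pi * f1 * t) + a2 * np.sin(2 * np.pi * f2 * t)

# Differentiation (first difference) weights each component by its
# frequency, so the weak ripple dominates the differentiated waveform.
dx = np.diff(x) * fs

zc_direct = count_zero_crossings(infinite_clip(x))
zc_diff = count_zero_crossings(infinite_clip(dx))

print("zero crossings, clipped directly:       ", zc_direct)
print("zero crossings, differentiated first:   ", zc_diff)
```

With these assumed values the directly clipped waveform keeps roughly the zero crossings of the low tone plus a few extra ones near the axis, while the differentiated and clipped waveform crosses zero at about the rate of the high tone. A real speech waveform would of course behave less tidily than two sinusoids.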
More generally, Voelcker has shown that almost all information is found
in the real and complex zeros of a waveform. Thus, information from
overlapping signal sources is contained in the mix of real and complex
zeros that are defined by the clipped waveform. This can be appreciated by
listening to a multi-signal waveform that has been differentiated and
clipped. The real problem here is to devise an algorithm that deconstructs
the clipped waveform and sorts out its mixed zeros into their respective
sources. As an example, my paper on the Haas effect presented at Mohonk97
described a method for doing this using direction of arrival to separate
sources from their reverberations. In various experiments what I have found
is that a granular algorithm using real and complex zeros can replicate
many crucial psychoacoustic and speech processing functions.
To summarize: Voelcker has shown that signal additivity is not
necessarily destroyed by clipping. However, the mathematics of the
separation problem seem to point toward something like fractal theory.
Meanwhile, heuristic methods such as the one I have been using can lead to
practical applications.
References:
[1] J.C.R. Licklider and I. Pollack, "Effects of differentiation,
integration, and infinite clipping upon the intelligibility of speech," J.
Acoust. Soc. Am., Vol. 20, pp. 42-51, January 1948.
[2] H.B. Voelcker, "Toward a unified theory of modulation," Part I,
"Phase-envelope relationships," Proc. IEEE, Vol. 54, pp. 340-353, March
1966, and Part II, "Zero manipulation," pp. 735-755, May 1966.
-John Bates
Time/Space Systems
79 Sarles Lane
Pleasantville, NY 10570
914-747-3143
jkbates@ieee.org
----------------------------------------------------------------
At 02:08 PM 9/6/99 -0400, you wrote:
>Dear Michael,
>
>Thanks for your response about the number of receivers versus the number
>of sources. It makes the human ability to (imperfectly) deal with many
>sources with only 2 ears even more intriguing. Somehow humans are trading
>off perfection in a narrow set of circumstances for flexibility. I
>suspect _heuristic_ approaches to CASA (computational auditory scene
>analysis) would work more like people do.
>
>Here is why I asked about the clipping problem. I'm no physicist so I
>can't give you an exact physical formulation of the problem. However, it
>seems to me that clipping destroys the linear additivity of the frequency
>components in the signal. Here is a simple example: mix a low amplitude
>high frequency component with a high amplitude, low frequency one. In the
>waveform, the high frequency seems to be riding on top of the low
>frequency at all points in the signal. Now clip the signal. Now the high
>frequency signal is missing in the segments that exceed the clipping
>threshold. It could have changed in frequency (and then back again) for
>all we know.
>
>I wanted to know whether, by destroying the additivity of the signals,
>clipping ruled out any mathematical methods for separation that are based
>on this additivity. I'm also not sure what echos and reverberation would
>do to such mathematical methods.
>
>- Al
>-----------------------------------------------