
Sound Segregation



Dear colleague,

 Concerning the following question from Bernhard Feiten about algorithms for
sound segregation...
>> I'm searching for literature on algorithms of sound segregation.
>> Has anybody tried to use neural nets?

  An approach to this problem has been developed at LIMSI since 1983. It does
not belong to the mainstream of "Neural" Networks, but has been inspired by
neurobiological data concerning learning, memory and the Peripheral Auditory
System.  The main idea is to represent speech items by characteristic
space-time locations in memory. These memory locations may be reached by an
internal flow of activity which propagates spontaneously, along memory
pathways, towards all the existing characteristic locations. A given speech
signal is transformed into a spectral distribution of discrete events, which
feed the pathways in parallel, so as to guide propagation along one of them.
If no pathway gets activated, a new one is created in the course of processing.
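  To give a rough idea of this discrete event representation, a simplified
Python sketch follows; the frame-difference rule and the threshold value are
only illustrative assumptions, not the actual front end used at LIMSI.

import numpy as np

def spectral_events(spectrogram, threshold=3.0):
    """Turn a (bands x frames) log-magnitude spectrogram into discrete
    onset/offset events, one stream per frequency band.

    Returns a list of (band, frame, kind) tuples, kind in {"onset", "offset"}.
    The threshold (energy change per frame) is illustrative only.
    """
    events = []
    diff = np.diff(spectrogram, axis=1)          # frame-to-frame energy change
    for band in range(diff.shape[0]):
        for frame in range(diff.shape[1]):
            if diff[band, frame] > threshold:    # sharp rise -> onset event
                events.append((band, frame + 1, "onset"))
            elif diff[band, frame] < -threshold: # sharp drop -> offset event
                events.append((band, frame + 1, "offset"))
    return events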
The main features of the so-called "Guided Propagation" approach are thus
(see the sketch below):
  - Recognition through space-time coincidence detection between internal and
incoming flows of events;
  - Unsupervised and continuous learning through the sprouting and reinforcement
of memory pathways;
  - Discrete frequency-time representation of speech, based on spectral onsets
and offsets.
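
  For illustration only, the following toy Python sketch mimics these three
features; the pathway structure, the matching score and the activation
threshold are simplifying assumptions, not the implementation reported in the
papers cited below.

class GuidedPropagationSketch:
    """Toy illustration of the three features listed above.

    Pathways are stored event sequences; recognition advances a pathway only
    when an incoming event coincides with its next expected event; learning is
    unsupervised: the best-matching pathway is reinforced, and input that
    activates no pathway sprouts a new one.
    """

    def __init__(self):
        self.pathways = []   # each pathway: {"events": [...], "weight": float}

    def recognize(self, incoming_events):
        # Propagate activity along every stored pathway, counting
        # space-time coincidences with the incoming event flow.
        scores = []
        for path in self.pathways:
            pos = 0
            for ev in incoming_events:
                if pos < len(path["events"]) and ev == path["events"][pos]:
                    pos += 1          # coincidence: internal flow advances
            scores.append(pos / max(len(path["events"]), 1))
        return scores

    def learn(self, incoming_events, activation_threshold=0.8):
        scores = self.recognize(incoming_events)
        if scores and max(scores) >= activation_threshold:
            # Reinforce the pathway whose propagation best matched the input.
            best = scores.index(max(scores))
            self.pathways[best]["weight"] += 1.0
        else:
            # No pathway got activated: sprout a new one from the input itself.
            self.pathways.append({"events": list(incoming_events),
                                  "weight": 1.0})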

  Recognition of superimposed digits was carried out in 1987 (reported
notably at the 1st IJCNN, San Diego: "Guided Propagation inside a topographic
memory"). The 20% average difference with a classical DTW approach for this
particular task has recently been confirmed using a larger corpus ("Speech
Recognition in Adverse Conditions using Guided Propagation Networks",
submitted to a Special Issue of the IEEE Transactions on Speech and Audio).

  Guided Propagation Networks are being developed in the framework of
human-machine interaction. They are being investigated for character
recognition, syntactic parsing, perception/action interaction and multi-modal
dialogue. Concerning speech processing, continuous speech recognition is
currently being addressed, with the simulation of the "Cocktail Party Effect"
in mind.

  Don't hesitate to contact me if you are interested in knowing more
about this work.
                                Sincerely Yours,

                                        Dominique Beroule
                                        LIMSI-CNRS  B.P.133
                                        91403   ORSAY-cedex
                                             FRANCE

                                        beroule@limsi.fr