Sound Segregation (Dominique Beroule)


Subject: Sound Segregation
From:    Dominique Beroule  <DOMI%FRLIM51.BITNET(at)VM1.MCGILL.CA>
Date:    Mon, 26 Apr 1993 17:07:18 +0200

Dear colleague,

Concerning the following question of Bernhard Feiten about algorithms of
sound segregation:

>> I'm searching for literature on algorithms of sound segregation.
>> Has somebody tried to use neural nets?

An approach to this problem has been developed at LIMSI since 1983. It does
not belong to the mainstream of "neural" networks, but it has been inspired
by neurobiological data concerning learning, memory and the Peripheral
Auditory System.

The main idea is to represent speech items by characteristic space-time
locations. These memory locations may be reached by an internal flow of
activity which propagates spontaneously towards all the existing
characteristic locations, using memory pathways. A given speech signal is
transformed into a spectral distribution of discrete events, which feed the
pathways in parallel, so as to guide propagation along one of them. If no
pathway gets activated, a new one is created in the course of processing.

The main features of the so-called "Guided Propagation" approach are thus:

- recognition through space-time coincidence detection between internal and
  incoming flows of events;
- unsupervised and continuous learning through the sprouting and
  reinforcement of memory pathways;
- a discrete frequency-time representation of speech, based on spectral
  onsets and offsets.

Recognition of superimposed digits was carried out in 1987 (reported notably
in the 1st IJCNN, San Diego: "Guided Propagation inside a topographic
memory"). The 20% average difference from a classical DTW approach on this
particular task has recently been confirmed using a larger corpus ("Speech
Recognition in Adverse Conditions using Guided Propagation networks",
submitted to a Special Issue of the IEEE Transactions on Speech and Audio).

Guided Propagation Networks are developed in the framework of human-machine
interaction. They are being investigated for character recognition,
syntactic parsing, perception/action interaction and multi-modal dialogue.
Concerning speech processing, continuous speech recognition is currently
being addressed, with the simulation of the "Cocktail Party Effect" in mind.

Don't hesitate to contact me if you are interested in knowing more about
this work.

Sincerely yours,

Dominique Beroule
LIMSI-CNRS
B.P. 133
91403 ORSAY-cedex
FRANCE
beroule(at)limsi.fr
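[Editor's note: for readers who want a concrete picture of the mechanism
described above, the following is a minimal Python sketch of the
guided-propagation idea: discrete onset/offset events, space-time coincidence
detection against stored memory pathways, reinforcement of a pathway that
responds, and sprouting of a new pathway when none does. All class names,
parameters and thresholds are illustrative assumptions, not the LIMSI
implementation.]

from dataclasses import dataclass

@dataclass
class Event:
    """A discrete spectral event: an onset or offset in one frequency band."""
    band: int        # frequency-band index
    time: float      # time of occurrence (seconds)
    kind: str        # "onset" or "offset"

@dataclass
class Pathway:
    """A memory pathway: an ordered template of (band, kind, relative time)."""
    label: str
    template: list            # [(band, kind, dt_from_first_event), ...]
    strength: float = 1.0     # reinforced on each successful traversal

def coincidence(ev, band, kind, expected_dt, t0, tol=0.02):
    """Space-time coincidence test between an incoming event and the next
    expected event on a pathway (the 20 ms tolerance is an assumption)."""
    return (ev.band == band and ev.kind == kind
            and abs((ev.time - t0) - expected_dt) <= tol)

def propagate(events, pathways, min_fraction=0.8):
    """Guide internal activity along stored pathways with the incoming event
    flow; return the best-matching pathway, or None if none gets activated."""
    best, best_score = None, 0.0
    for p in pathways:
        t0 = events[0].time
        idx = 0                                  # position along the pathway
        for ev in events:
            if idx < len(p.template) and coincidence(ev, *p.template[idx], t0):
                idx += 1                         # activity advances on coincidence
        score = idx / len(p.template)
        if score > best_score:
            best, best_score = p, score
    if best is not None and best_score >= min_fraction:
        best.strength += 1.0                     # reinforcement (continuous learning)
        return best
    return None

def learn_new_pathway(events, pathways, label):
    """Sprout a new pathway when no existing one gets activated."""
    t0 = events[0].time
    template = [(ev.band, ev.kind, ev.time - t0) for ev in events]
    pathways.append(Pathway(label=label, template=template))

[A usage pattern under these assumptions would be: extract onset/offset events
from a spectral front end, call propagate(); if it returns None, call
learn_new_pathway() on the same event list, so recognition and learning run
continuously and without supervision.]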


This message came from the mail archive
http://www.auditory.org/postings/1993/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University