Re: Segregation with neural nets



>Hi,
>I'm searching for literature on algorithms for sound segregation.
>Has anybody tried using neural nets?
>
>With best regards, Bernhard.

I have a paper coming out in the June issue of JASA.  It mentions
neural mechanisms, but nothing that resembles (formal) neural
nets.  Below is the bibliography, containing several major references
on sound separation.  See also Guy Brown's recent thesis.

Alain.
---
Assmann, P. F. and Summerfield, Q. (1988). "Pitch-pulse asynchrony and the
perceptual segregation of competing voices," Speech 88 conference (7th
FASE), Edinburgh, 531-538.
Assmann, P. F. and Summerfield, Q. (1989). "Modeling the perception of
concurrent vowels: Vowels with the same fundamental frequency," J. Ac. Soc.
Am. 85, 327-338.
Assmann, P. F. and Summerfield, Q. (1990). "Modeling the perception of
concurrent vowels: vowels with different fundamental frequencies," J. Ac.
Soc. Am. 88, 680-697.
Bregman, A. S. (1990). Auditory scene analysis (MIT Press, Cambridge,
Mass.), 773 p.
Brokx, J. P. L. and Nooteboom, S. G. (1982). "Intonation and the perceptual
separation of simultaneous voices," Journal of Phonetics 10, 23-36.
Carney, L. H. and Yin, T. C. T. (1988). "Temporal coding of resonances by
low-frequency auditory nerve fibers: single fiber responses and a
population model," J. Neurophysiol. 60, 1653-1677.
Carney, L. H. and Yin, T. C. T. (1989). "Responses of low-frequency cells
in the inferior colliculus to interaural time differences of clicks:
excitatory and inhibitory components," J. Neurophysiol. 62, 144-161.
Carr, C. E. and Konishi, M. (1990). "A circuit for detection of interaural
time differences in the brain stem of the barn owl," J. Neuroscience 10,
3227-3246.
Chan, J. C. K., Yin, T. C. T. and Musicant, A. D. (1987). "Effects of
interaural time delays of noise stimuli on low-frequency cells in the cat's
inferior colliculus. II. Responses to band-pass filtered noises," J.
Neurophysiol. 58, 543-561.
Cherry, E. C. (1953). "Some experiments on the recognition of speech with
one, and with two ears," J. Ac. Soc. Am. 25, 975-979.
de Cheveigne, A. (1986). "A pitch perception model," Proc. IEEE ICASSP, 897-900.
de Cheveigne, A. (1990). "F0 estimation from mixed speech," ATR Auditory
and visual perception research labs technical report TR-A-0097.
de Cheveigne, A. (1991). "A mixed speech F0 estimation algorithm," Proc.
ESCA (Eurospeech), Genova, 445-448.
Childers, D. G. and Lee, C. K. (1987). "Co-channel speech separation,"
Proc. IEEE ICASSP, 181-184.
Colburn, H. S. and Durlach, N. I. (1978). "Models of binaural interaction,"
in Handbook of perception, edited by E. C. Carterette and M. P. Friedman
(Academic Press, New York), 467-518.
Darwin, C. J. and Culling, J. F. (1990). "Speech perception seen through
the ear," Speech Communication 9, 469-475.
Delgutte, B. (1984). "Speech coding in the auditory nerve: II. Processing
schemes for vowel-like sounds," J. Ac. Soc. Am. 75, 879-886.
Duifhuis, H., Willems, L. F. and Sluyter, R. J. (1982). "Measurement of
pitch in speech: an implementation of Goldstein's theory of pitch
perception," J. Ac. Soc. Am. 1568-1580.
Durlach, N. I. (1963). "Equalization and cancellation theory of binaural
masking-level differences," J. Ac. Soc. Am. 35, 1206-1218.
Evans, E. F. (1983). "Pitch and cochlear nerve fibre temporal discharge
patterns," in Hearing - Physiological bases and psychophysics, edited by R.
Klinke and R. Hartmann (Springer-Verlag, Berlin), 140-146.
Fletcher, H. (1929). Speech and Hearing (Van Nostrand, New York).
Frazier, R. H., Samsam, S., Braida, L. D. and Oppenheim, A. V. (1976).
"Enhancement of speech by adaptive filtering," Proc. IEEE ICASSP, 251-253.
Goldstein, J. L. and Srulovicz, P. (1977). "Auditory-nerve spike intervals
as an adequate basis for aural frequency measurement," in Psychophysics and
physiology of hearing, edited by E. F. Evans and J. P. Wilson (Academic
Press, London), 337-347.
Hanson, B. A. and Wong, D. Y. (1984). "The harmonic magnitude suppression
(HMS) technique for intelligibility enhancement in the presence of
interfering noise," IEEE ICASSP, 2, pp. 18A.5.1-4.
Hess, W. (1983). Pitch determination of speech signals (Springer-Verlag,
Berlin), 698 p.
Holdsworth, J., Nimmo-Smith, I., Patterson, R. D. and Rice, P. (1988).
"Implementing a GammaTone filter bank," MRC Applied Psychology Unit
technical report.
Horst, J. W., Javel, E. and Farley, G. R. (1986). "Coding of spectral fine
structure in the auditory nerve. I. Fourier analysis of period and
interspike interval histograms," J. Ac. Soc. Am. 79, 398-416.
Javel, E., Mott, J. B., Rush, N. L. and Smith, D. W. (1988). "Frequency
discrimination: evaluation of rate and temporal codes," in Basic issues in
hearing, edited by H. Duifhuis, J. W. Horst and H. P. Wit (Academic Press,
London), 224-234.
Jeffress, L. A. (1948). "A place theory of sound localization," J. Comp.
Physiol. Psychol. 41, 35-39.
Konishi, M., Takahashi, T. T., Wagner, H., Sullivan, W. E. and Carr, C. E.
(1988). "Neurophysiological and anatomical substrates of sound localization
in the owl," in Auditory function - neurobiological bases of hearing,
edited by G. M. Edelman, W. E. Gall and W. M. Cowan (Wiley, New York),
721-745.
Kopec, G. E. and Bush, M. A. (1989). "An LPC-based spectral similarity
measure for speech recognition in the presence of co-channel speech
interference," Proc. IEEE ICASSP, 270-273.
Kuwabara, H., Sagisaka, Y., Takeda, K. and Abe, M. (1989). "Construction of
ATR Japanese speech database as a research tool," ATR Interpreting
telephony research laboratories technical report TR-I-0086.
Kuwada, S., Yin, T. C. T., Haberly, L. B. and Wickesberg, R. E. (1980).
"Binaural interaction in the cat inferior colliculus: physiology and
anatomy," in Psychophysical, physiological and behavioral studies in
hearing, edited by G. v. d. Brink and F. A. Bilsen (Delft University
Press), 401-411.
Langner, G. (1981). "Neuronal mechanisms for pitch analysis in the time
domain," Exp. Brain Res. 44, 450-454.
Langner, G. and Schreiner, C. E. (1988). "Periodicity coding in the
inferior colliculus of the cat. I. Neuronal mechanisms," J. Neurophysiol.
60, 1799-1822.
Lea, A. (1992). "Auditory models of vowel perception," unpublished doctoral
dissertation, University of Nottingham.
Lea, A. P. and Summerfield, Q. (1992). "Monaural segregation of competing
voices," Proc. ASJ committee on Hearing H-92-31, pp. 1-7.
Licklider, J. C. R. (1956). "Auditory frequency analysis," in Information
theory, edited by C. Cherry (Butterworth, London), 253-268.
Licklider, J. C. R. (1959). "Three auditory theories," in Psychology, a
study of a science, edited by S. Koch (McGraw-Hill), 41-144.
Licklider, J. C. R. (1962). "Periodicity pitch and related auditory process
models," International Audiology 1, 11-36.
Lyon, R. (1984). "Computational models of neural auditory processing,"
Proc. IEEE ICASSP, 36.1.(1-4).
Lyon, R. F. (1983-1989). "A computational model of binaural localization
and separation," Proc. IEEE ICASSP, reproduced in Natural computation,
edited by W. Richards (MIT Press, Cambridge, Mass), 319-327.
McAdams, S. (1989). "Segregation of concurrent sounds. I: Effects of
frequency modulation coherence," J. Ac. Soc. Am. 86, 2148-2159.
McFadden, D. (1973). "Precedence effects and auditory cells with long
characteristic delays," J. Ac. Soc. Am. 54, 528-530.
Meddis, R. and Hewitt, M. (1988). "A computational model of low pitch
judgement," in Basic issues in hearing, edited by H. Duifhuis, J. W. Horst
and H. P. Wit (Academic Press, London), 148-153.
Meddis, R. and Hewitt, M. J. (1991a). "Virtual pitch and phase sensitivity
of a computer model of the auditory periphery. I: pitch identification," J.
Ac. Soc. Am. 89, 2866-2882.
Meddis, R. and Hewitt, M. J. (1991b). "Virtual pitch and phase sensitivity
of a computer model of the auditory periphery. II: phase sensitivity," J.
Ac. Soc. Am. 89, 2883-2894.
Meddis, R. and Hewitt, M. J. (1992). "Modelling the identification of
concurrent vowels with different fundamental frequencies," J. Ac. Soc. Am.
91, 233-245.
Min, K., Chien, D., Li, S. and Jones, C. (1988). "Automated two speaker
separation system," Proc. IEEE ICASSP, 537-540.
Moore, B. C. J. (1982). An introduction to the psychology of hearing
(Academic Press, London).
Moore, B. C. J. and Glasberg, B. R. (1983). "Suggested formulae for
calculating auditory-filter bandwidths and excitation patterns," J. Ac.
Soc. Am. 74, 750-753.
Møller, A. R. (1977a). "Frequency selectivity of single auditory-nerve
fibers in response to broadband noise stimuli," J. Ac. Soc. Am. 62,
135-142.
Møller, A. R. (1977b). "Frequency selectivity of the basilar membrane
revealed from discharges in auditory nerve fibers," in Psychophysics and
physiology of hearing, edited by E. F. Evans and J. P. Wilson (Academic
Press, London), 197-207.
Nagabuchi, H., Kobayashi, T. and Yamamoto, H. (1979). "Speech enhancement
and suppression in mixed speech," Transactions of the IECE (Japan) 62,
627-634 (in Japanese).
Naylor, J. A. and Boll, S. F. (1987). "Techniques for suppression of an
interfering talker in co-channel speech," Proc. ICASSP, 205-208.
Palmer, A. R. (1988). "The representation of concurrent vowels in the
temporal discharge patterns of auditory nerve fibers," in Basic issues in
hearing, edited by H. Duifhuis, J. W. Horst and H. P. Wit (Academic Press,
London), 244-251.
Palmer, A. R. (1990). "The representation of the spectra and fundamental
frequencies of steady-state single- and double-vowel sounds in the temporal
discharge patterns of guinea pig cochlear-nerve fibers," J. Ac. Soc.
Am. 88, 1412-1426.
Palmer, A. R., Rees, A. and Caird, D. (1990). "Interaural delay sensitivity
to tones and broad band signals in the guinea-pig inferior colliculus,"
Hearing Research 50, 71-86.
Palmer, A. R. (1992). "Segregation of the responses to paired vowels in the
auditory nerve of the guinea-pig using autocorrelation," in Audition speech
and language, edited by B. Schouten (Mouton-DeGruyter, Berlin), (in press).
Parsons, T. W. (1976). "Separation of speech from interfering speech by
means of harmonic selection," J. Ac. Soc. Am. 60, 911-918.
Patterson, R. D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C.,
and Allerhand, M. (1992). "Complex sounds and auditory images," in Auditory
physiology and perception, edited by Y. Cazals, L. Demany and K. Horner
(Pergamon, Oxford), 429-446.
de Ribaupierre, F., Rouiller, E., Toros, A. and de Ribaupierre, Y. (1980).
"Transmission delay of phase-locked cells in the medial geniculate body,"
Hearing Research 3, 65-77.
Ross, M. J., Shaffer, H. L., Cohen, A., Freudberg, R. and Manley, H. J.
(1974). "Average magnitude difference function pitch extractor," IEEE
Trans. ASSP 22, 353-362.
Ruggero, M. A. (1973). "Response to noise of auditory nerve fibers in the
squirrel monkey," J. Neurophysiol. 36, 569-587.
Sachs, M. B. and Young, E. D. (1979). "Encoding of steady-state vowels in
the auditory nerve: representation in terms of discharge rate," J. Ac. Soc.
Am. 66, 470-479.
Scheffers, M. T. M. (1983). "Sifting vowels," thesis, University of Groningen.
Schreiner, C. E. and Langner, G. (1988a). "Coding of temporal patterns in
the central auditory nervous system," in Auditory function -
Neurobiological bases of hearing, edited by G. M. Edelman, W. E. Gall and
W. M. Cowan (Wiley, New York), 337-361.
Schreiner, C. E. and Langner, G. (1988b). "Periodicity coding in the
inferior colliculus of the cat. II. Topographical organization," J.
Neurophysiol. 60, 1823-1840.
Schroeder, M. R. (1968). "Period histogram and product spectrum: new
methods for fundamental-frequency measurement," J. Ac. Soc. Am. 43,
829-834.
Schubert, E. (1978). "History of research on hearing," in Handbook of
perception, edited by E. C. Carterette and M. P. Friedman (Academic Press,
New York), 41-80.
Silva, F. M. and Almeida, L. B. (1990). "Speech separation by means of
stationary least-squares harmonic estimation," Proc. IEEE ICASSP, 809-812.
Stubbs, R. J. and Summerfield, Q. (1988). "Evaluation of two
voice-separation algorithms using normal-hearing and hearing-impaired
listeners," J. Ac. Soc. Am. 84, 1236-1249.
Stubbs, R. J. and Summerfield, Q. (1990). "Algorithms for separating the
speech of interfering talkers: evaluations with voiced sentences, and
normal-hearing and hearing-impaired listeners," J. Ac. Soc. Am. 87,
359-372.
Stubbs, R. J. and Summerfield, Q. (1991a). "Effects of signal-to-noise
ratio, signal periodicity, and degree of hearing impairment on the
performance of voice-separation algorithms," J. Ac. Soc. Am. 89, 1383-1393.
Summerfield, Q. and Assmann, P. F. (1991b). "Perception of concurrent
vowels: effects of harmonic misalignment and pitch-period asynchrony," J.
Ac. Soc. Am. 89, 1364-1377.
van Noorden, L. (1982). "Two channel pitch perception," in Music, mind, and
brain, edited by M. Clynes (Plenum press, London), 251-269.
Weintraub, M. (1985). "A theory and computational model of auditory
monaural sound separation," thesis, Stanford University.
Weintraub, M. (1986). "A computational model for separating two
simultaneous sounds," Proc. IEEE ICASSP, Tokyo, 1, 3.1.1-4.
Wickesberg, R. E. and Oertel, D. (1990). "Delayed, frequency-specific
inhibition in the cochlear nuclei of mice: a mechanism for monaural echo
suppression," J. Neuroscience 10, 1762-1768.
Yin, T. C. T. and Chan, J. C. K. (1990). "Interaural time sensitivity in
medial superior olive of cat," J. Neurophysiol. 64, 465-488.
Yin, T. C. T., Chan, J. C. K. and Carney, L. H. (1987). "Effects of
interaural time delays of noise stimuli on low-frequency cells in the cat's
inferior colliculus. III. Evidence for cross-correlation," J. Neurophysiol.
58, 562-583.
Young, E. D. and Sachs, M. B. (1979). "Representation of steady-state
vowels in the temporal aspects of the discharge patterns of populations of
auditory-nerve fibers," J. Ac. Soc. Am. 66, 1381-1403.

---



------------------------------------------------------------------
Alain de Cheveigne'                       phone: (33) (1) 44273633
                                            fax: (33) (1) 44277919
e-mail:      alain@linguist.jussieu.fr
NeXT-format: nalain@linguist.jussieu.fr
japanese:    jalain@linguist.jussieu.fr
mail: Laboratoire de Linguistique Formelle, CNRS / Universite'
Paris 7, case 7003, 2 place Jussieu, 75251 Paris CEDEX 05, FRANCE.
------------------------------------------------------------------