

Subject: Thesis announcement
From:    "Martin, Keith"  <Keith_Martin(at)BOSE.COM>
Date:    Tue, 14 Mar 2000 13:03:42 -0500

I don't quite know why it has taken me so long to get around to posting this, but my dissertation (from June, 1999) may be of interest to some of the members of this list. A one-sentence description of the work (as relevant to this list) is "a computational model of musical-instrument recognition within a computational auditory scene analysis framework". Abstract, TOC, and URL below. I hope some of you find it useful and/or interesting.

Cheers,

--Keith

-----
Keith D. Martin
Bose Corporation
(formerly of the MIT Media Lab)
The Mountain (R&D - 15C)
Framingham, MA 01701-9168 USA

Relevant details:

Martin, Keith D. (1999) Sound-Source Recognition: A Theory and Computational Model. Ph.D. Thesis. Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science.

You can download a copy of the document here:
ftp://sound.media.mit.edu/pub/Papers/kdm-phdthesis.pdf (2.3 MB)

Abstract

The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process.
This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about objects in the environment or to predict their behavior. In order to explore the process, attention is restricted to isolated sounds produced by a small class of sound sources, the non-percussive orchestral musical instruments. Previous research on the perception and production of orchestral instrument sounds is reviewed from a vantage point based on the excitation and resonance structure of the sound-production process, revealing a set of perceptually salient acoustic features.

A computer model of the recognition process is developed that is capable of "listening" to a recording of a musical instrument and classifying the instrument as one of 25 possibilities. The model is based on current models of signal processing in the human auditory system. It explicitly extracts salient acoustic features and uses a novel improvisational taxonomic architecture (based on simple statistical pattern-recognition techniques) to classify the sound source. The performance of the model is compared directly to that of skilled human listeners, using both isolated musical tones and excerpts from compact disc recordings as test stimuli. The computer model's performance is robust with regard to the variations of reverberation and ambient noise (although not with regard to competing sound sources) in commercial compact disc recordings, and the system performs better than three out of fourteen skilled human listeners on a forced-choice classification task.

This work has implications for research in musical timbre, automatic media annotation, human talker identification, and computational auditory scene analysis.
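For anyone curious about the flavor of the taxonomic classifier, the beam-search idea can be sketched in miniature. This is a toy illustration only, not code from the thesis: the feature values are invented, each node here models a single made-up feature (say, spectral centroid in Hz) with one Gaussian, whereas the actual system uses many acoustic features and richer statistical models.

```python
import math

# Toy sketch of taxonomic classification with beam search.
# Each taxonomy node has a univariate Gaussian model of one feature;
# classification descends the tree, accumulating log-likelihoods and
# keeping only the `beam_width` most likely hypotheses at each level.

class Node:
    def __init__(self, name, mean, std, children=()):
        self.name, self.mean, self.std = name, mean, std
        self.children = list(children)

    def log_likelihood(self, x):
        # Log-density of a univariate Gaussian N(mean, std^2) at x.
        return (-0.5 * math.log(2 * math.pi * self.std ** 2)
                - (x - self.mean) ** 2 / (2 * self.std ** 2))

def beam_classify(root, x, beam_width=2):
    """Return the name of the most likely leaf under a beam search."""
    frontier = [(root.log_likelihood(x), root)]
    leaves = []
    while frontier:
        expanded = []
        for score, node in frontier:
            if not node.children:
                leaves.append((score, node))
            for child in node.children:
                expanded.append((score + child.log_likelihood(x), child))
        # Prune to the best `beam_width` hypotheses at this level.
        expanded.sort(key=lambda pair: pair[0], reverse=True)
        frontier = expanded[:beam_width]
    return max(leaves, key=lambda pair: pair[0])[1].name

# Invented feature values, chosen only to make the example run.
taxonomy = Node("instrument", 1000.0, 800.0, [
    Node("strings", 1200.0, 400.0, [
        Node("violin", 1500.0, 200.0),
        Node("cello", 700.0, 200.0),
    ]),
    Node("brass", 900.0, 300.0, [
        Node("trumpet", 1100.0, 150.0),
        Node("tuba", 400.0, 150.0),
    ]),
])

print(beam_classify(taxonomy, 1450.0))  # a bright tone -> "violin"
```

The extensions described in chapter 5 (context-dependent feature selection, rule-one-out) would further modify this loop, e.g. choosing different features at different nodes and eliminating hypotheses outright rather than merely ranking them.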
Table of Contents

1 Introduction
1.1 Motivation and approach
1.2 A theory of sound-source recognition
1.3 Applications
1.4 Overview and scope

2 Recognizing sound sources
2.1 Understanding auditory scenes
2.1.1 Exploiting environmental constraints
2.1.2 The importance of knowledge
2.1.3 Computational auditory scene analysis
2.2 Evaluating sound-source recognition systems
2.3 Human sound-source recognition
2.4 Machine sound-source recognition
2.4.1 Recognition within micro-domains
2.4.2 Recognition of broad sound classes
2.4.3 Recognition of human talkers
2.4.4 Recognition of environmental sounds
2.4.5 Recognition of musical instruments
2.5 Conclusions and challenges for the future

3 Recognizing musical instruments
3.1 Human recognition abilities
3.2 Musical instrument sound: acoustics and perception
3.2.1 An aside on "timbre"
3.2.2 The magnitude spectrum
3.2.3 The dimensions of sound
3.2.4 Resonances
3.3 Instrument families
3.3.1 The brass instruments
3.3.2 The string instruments
3.3.3 The woodwind instruments
3.4 Summary

4 Representation
4.1 Overview
4.1.1 Mid-level representation
4.1.2 Features and classification
4.2 The front end
4.2.1 Bandpass filterbank
4.2.2 Inner hair cell transduction
4.2.3 Pitch analysis
4.3 The weft
4.4 Note properties / source models
4.4.1 Spectral features
4.4.2 Pitch, vibrato, and tremolo features
4.4.3 Attack transient properties
4.5 The model hierarchy

5 Recognition
5.1 Overview and goals
5.2 Definitions and basic principles
5.3 Taxonomic classification
5.3.1 Extension #1: Context-dependent feature selection
5.3.2 Extension #2: Rule-one-out
5.3.3 Extension #3: Beam search
5.4 Strengths of the approach
5.5 An example of the recognition process

6 Evaluation
6.1 A database of solo orchestral instrument recordings
6.2 Testing human abilities
6.2.1 Experimental method
6.2.2 Results
6.2.3 Discussion
6.3 Computer experiment #1: Isolated tone pilot study
6.4 Computer experiment #2: 6- to 8-way classification
6.5 Computer experiment #3: Direct comparison to human abilities
6.6 General discussion

7 Summary and conclusions
7.1 Summary
7.2 Future developments
7.3 Insights gained
7.4 Conclusions

References


This message came from the mail archive
http://www.auditory.org/postings/2000/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University