Dissertation announcement: Music-Listening Systems

Dear Auditory list,

I have recently completed my dissertation and submitted it to
the Media Laboratory.  A one-sentence description of the contents
is: "Computational models of the auditory scene analysis of
fully-complex musical signals, and their use in explaining human
semantic judgments of music."  I hope someone on the list
finds it interesting and/or useful.

It may be downloaded from my web site at


I am also pleased to mail paper copies (once they return from the
bindery) to those who would like them, as long as my supply
lasts.  Please send your mailing address via (off-list) email.

Abstract and TOC follow.

Best to all,

-- Eric

|  Eric Scheirer  |A-7b5 D7b9|G-7 C7|Cb   C-7b5 F7#9|Bb  |B-7 E7|
|eds@media.mit.edu|      < http://sound.media.mit.edu/~eds >
|  617 253 1750   |A A/G# F#-7 F#-/E|Eb-7b5 D7b5|Db|C7b5 B7b5|Bb|

Scheirer, E. D. (2000)  Music-Listening Systems.
Unpublished Ph.D. Dissertation, MIT Media Laboratory,
June 2000. 248pp.


When human listeners are confronted with musical sounds, they
rapidly and automatically orient themselves in the music.  Even
musically untrained listeners have an exceptional ability to make
rapid judgments about music from very short examples, such
as determining the music's style, performer, beat, complexity,
and emotional impact.  However, there are presently no theories
of music perception that can explain this behavior, and it has
proven very difficult to build computer music-analysis tools with
similar capabilities.  This dissertation examines the psychoacoustic
origins of the early stages of music listening in humans, using both
experimental and computer-modeling approaches.  The results
of this research enable the construction of automatic machine-
listening systems that can make human-like judgments about
short musical stimuli.

New models are presented that explain the perception of musical
tempo, the perceived segmentation of sound scenes into multiple
auditory images, and the extraction of musical features from
complex musical sounds.  These models are implemented as
signal-processing and pattern-recognition computer programs,
using the principle of *understanding without separation*.  Two
experiments with human listeners study the rapid assignment of
high-level judgments to musical stimuli, and it is demonstrated
that many of the experimental results can be explained with a
multiple-regression model on the extracted musical features.
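
As a rough illustration of what a multiple-regression model on
extracted features might look like, here is a minimal sketch.  It is
not the dissertation's code.  The function name, the two features,
and the data are all hypothetical, and the ratings are synthetic
placeholders rather than experimental results.

```python
import numpy as np

def fit_multiple_regression(features, ratings):
    """Least-squares fit of ratings ~ intercept + features @ weights.

    features: (n_stimuli, n_features) array of extracted feature values
    ratings:  (n_stimuli,) array of mean listener judgments
    Returns the coefficient vector (intercept first) and R^2.
    """
    design = np.column_stack([np.ones(len(features)), features])
    coeffs, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    predictions = design @ coeffs
    ss_res = np.sum((ratings - predictions) ** 2)
    ss_tot = np.sum((ratings - np.mean(ratings)) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot

# Synthetic placeholder data (NOT from the dissertation): 50 stimuli,
# two made-up features, and ratings that depend linearly on them.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
ratings = 2.0 + 3.0 * features[:, 0] - 1.0 * features[:, 1]

coeffs, r_squared = fit_multiple_regression(features, ratings)
print(coeffs, r_squared)
```

With real listener data the fit would of course be noisier, and R^2
would indicate how much of the variance in the judgments the features
account for.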

From a theoretical standpoint, the thesis shows how theories of
music perception can be grounded in a principled way upon
psychoacoustic models in a computational-auditory-scene-
analysis framework.  Further, the perceptual theory presented is
more relevant to everyday listeners and situations than are
previous cognitive-structuralist approaches to music perception
and cognition.  From a practical standpoint, the various models
form a set of computer signal-processing and pattern-recognition
tools that can mimic human perceptual abilities on a variety of
musical tasks such as tapping along with the beat, parsing music
into sections, making semantic judgments about musical
examples, and estimating the similarity of two pieces of music.
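
To make the beat-tapping task above more concrete, here is a small
sketch of one way to pick a tempo from an amplitude envelope using a
bank of resonators.  Reading "resonators" as feedback comb filters is
an assumption on my part, as are the fixed feedback gain, the frame
rate, and the candidate tempi.  The input is a synthetic impulse
train, not real music, and the sketch leaves out the frequency
analysis and phase determination that a full beat tracker would need.

```python
import numpy as np

def comb_filter_energy(envelope, delay, alpha=0.9):
    """Output energy of the feedback comb filter
    y[n] = (1 - alpha) * x[n] + alpha * y[n - delay]."""
    y = np.zeros(len(envelope))
    for n in range(len(envelope)):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = (1.0 - alpha) * envelope[n] + alpha * feedback
    return float(np.sum(y ** 2))

def estimate_tempo(envelope, frame_rate, bpm_candidates):
    """Return the candidate tempo whose resonator responds most strongly."""
    energies = []
    for bpm in bpm_candidates:
        delay = int(round(frame_rate * 60.0 / bpm))  # beat period in frames
        energies.append(comb_filter_energy(envelope, delay))
    return bpm_candidates[int(np.argmax(energies))]

# Synthetic input (assumed): 10 s of envelope at 100 frames per second
# with one impulse every 0.5 s, i.e. 120 beats per minute.
frame_rate = 100
envelope = np.zeros(1000)
envelope[::50] = 1.0

print(estimate_tempo(envelope, frame_rate, [60, 80, 100, 120, 150]))
```

A resonator whose delay matches the pulse period accumulates energy
coherently, while mismatched delays spread their echoes between the
pulses.  The candidate tempi here are chosen so that each maps to a
distinct delay; a finer grid would need some tie-breaking rule.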


Music-Listening Systems
Table of Contents

CHAPTER 1 Introduction
1.1. Organization

CHAPTER 2 Background
2.1. Psychoacoustics
2.1.1. Pitch theory and models
2.1.2. Computational auditory scene analysis
2.1.3. Spectral-temporal pattern analysis
2.2. Music Psychology
2.2.1. Pitch, melody, and tonality
2.2.2. Perception of chords: tonal consonance and tonal fusion
2.2.3. The perception of musical timbre
2.2.4. Music and emotion
2.2.5. Perception of musical structure
2.2.6. Epistemology/general perception of music
2.2.7. Musical experts and novices
2.3. Musical signal processing
2.3.1. Pitch-tracking
2.3.2. Automatic music transcription
2.3.3. Representations and connections to perception
2.3.4. Tempo and beat-tracking models
2.3.5. Audio classification
2.4. Recent cross-disciplinary approaches
2.5. Chapter summary

CHAPTER 3 Approach
3.1. Definitions
3.1.1. The auditory stimulus
3.1.2. Properties, attributes and features of the auditory stimulus
3.1.3. Mixtures of sounds
3.1.4. Attributes of mixtures
3.1.5. The perceived qualities of music
3.2. The musical surface
3.3. Representations and computer models in perception research
3.3.1. Representation and Music-AI
3.3.2. On components
3.4. Understanding without Separation
3.4.1. Bottom-up vs. Top-Down Processing
3.5. Chapter summary

CHAPTER 4 Musical Tempo
4.1. A Psychoacoustic Demonstration
4.2. Description of a Beat-tracking Model
4.2.1. Frequency analysis and envelope extraction
4.2.2. Resonators and tempo analysis
4.2.3. Phase determination
4.2.4. Comparison with autocorrelation methods
4.3. Implementation and Complexity
4.3.1. Program parameters
4.3.2. Behavior tuning
4.4. Validation
4.4.1. Qualitative performance
4.4.2. Validation Experiment
4.5. Discussion
4.5.1. Processing level
4.5.2. Prediction and Retrospection
4.5.3. Tempo vs. Rhythm
4.5.4. Comparison to other psychoacoustic models
4.6. Chapter summary

CHAPTER 5 Musical Scene Analysis
5.1. The dynamics of subband periodicity
5.2. Processing model
5.2.1. Frequency analysis and hair-cell modeling
5.2.2. Modulation analysis
5.2.3. Dynamic clustering analysis: goals
5.2.4. Dynamic clustering analysis: cluster model
5.2.5. Dynamic clustering analysis: time-series labeling
5.2.6. Dynamic clustering analysis: channel-image assignment
5.2.7. Limitations of this clustering model
5.2.8. Feature analysis
5.3. Model implementation
5.3.1. Implementation details
5.3.2. Summary of free parameters
5.4. Psychoacoustic tests
5.4.1. Grouping by common frequency modulation
5.4.2. The temporal coherence boundary
5.4.3. Alternating wideband and narrowband noise
5.4.4. Comodulation release from masking
5.5. General discussion
5.5.1. Complexity of the model
5.5.2. Comparison to other models
5.5.3. Comparison to auditory physiology
5.5.4. The role of attention
5.5.5. Evaluation of performance for complex sound scenes
5.6. Chapter summary and conclusions

CHAPTER 6 Musical Features
6.1. Signal representations of real music
6.2. Feature-based models of musical perceptions
6.3. Feature extraction
6.3.1. Features based on auditory image configuration
6.3.2. Tempo and beat features
6.3.3. Psychoacoustic features based on image segmentation
6.4. Feature interdependencies
6.5. Chapter summary

CHAPTER 7 Musical Perceptions
7.1. Semantic features of short musical stimuli
7.1.1. Overview of procedure
7.1.2. Subjects
7.1.3. Materials
7.1.4. Detailed procedure
7.1.5. Dependent measures
7.1.6. Results
7.2. Modeling semantic features
7.2.1. Modeling mean responses
7.2.2. Intersubject differences in model prediction
7.2.3. Comparison to other feature models
7.3. Experiment II: Perceived similarity of short musical stimuli
7.3.1. Overview of procedure
7.3.2. Subjects
7.3.3. Materials
7.3.4. Detailed procedure
7.3.5. Dependent measures
7.3.6. Results
7.4. Modeling perceived similarity
7.4.1. Predicting similarity from psychoacoustic features
7.4.2. Predicting similarity from semantic judgments
7.4.3. Individual differences
7.4.4. Multidimensional scaling
7.5. Experiment III: Effect of interface
7.6. General discussion
7.7. Applications
7.7.1. Music retrieval by example
7.7.2. Parsing music into sections
7.7.3. Classifying music by genre
7.8. Chapter summary

CHAPTER 8 Conclusion
8.1. Summary of results
8.2. Contributions
8.3. Future work
8.3.1. Applications of tempo-tracking
8.3.2. Applications of music-listening systems
8.3.3. Continued evaluation of image-formation model
8.3.4. Experimental methodology
8.3.5. Data modeling and individual differences
8.3.6. Integrating sensory and symbolic models

Appendix A: Musical Stimuli

Appendix B: Synthesis Code
B.1. McAdams oboe
B.2. Temporal coherence threshold
B.3. Alternating wideband and narrowband noise
B.4. Comodulation release from masking