Subject: Defense: A hybrid model for timbre perception
From: Hiroko Terasawa <shiraiwa@xxxxxxxx>
Date: Wed, 21 May 2008 15:58:52 -0700

Dear friends, colleagues, and mentors,

As a part of my University Oral Exam (i.e. PhD dissertation defense), I will be presenting my work on Friday, 30 May 2008, at noon, at CCRMA stage (Stanford University). The talk should finish by 1:00 pm. You are warmly invited to attend. Abstract is attached below.

This is also a part of a day-long CCRMA open house (a.k.a. annual big party.) Please join us for a showcase of the work being done by CCRMA students.
http://ccrma.stanford.edu/info/openhouse08/

Best regards,
Hiroko Terasawa

____________________

"A Hybrid Model for Timbre Perception"

Ph.D. Candidate: Hiroko Terasawa
Advisor: Prof. Jonathan Berger
Date: May 30 (Fri), 2008
Time: 12:00 pm
Location: The Knoll (CCRMA), Stage. (http://tinyurl.com/5o8mbz)

Abstract:

Timbre, or the perceived quality of sound, is a fundamental attribute of sound. It is important in differentiating between musical sounds, speech utterances, and characterizing everyday sounds in our environment as well as novel synthetic sounds. This dissertation presents a perceptually based hybrid model of timbre perception which integrates the concepts of color and texture. The color of sound is described in terms of an instantaneous (or ideally timeless) spectral envelope while the texture of a sound describes the temporal structure of the sound. The dissertation presents the framework for the model, a discussion of prior research, a computational implementation of the model, and a series of experiments that provide perceptual validation.

The computational model represents a sound's color as the spectral envelope of a specific window (although the ideal concept of color is one in which time is non-existent). Texture is represented as the sequential changes of color with an arbitrary range of time-scale.

In support of the proposed theory, a series of psychoacoustic experiments was performed. The quantitative relationship between the spectral envelope and the subjective perception of complex tones was examined using Mel-frequency cepstral coefficients (MFCC) as a representation. A perceptually tested quantitative representation of texture was established using normalized echo density (NED).

The elusive nature of describing timbre has been a barrier to music analysis, speech research and psychoacoustics. It is hoped that the framework presented in this dissertation will form the basis of a consistent metric for describing timbre.

--
Hiroko Terasawa, Ph.D. Candidate
CCRMA, Department of Music
Stanford University
http://ccrma.stanford.edu/~hiroko/
Hiroko.Terasawa@xxxxxxxx