Auditory Toolbox for Matlab (Malcolm Slaney)


Subject: Auditory Toolbox for Matlab
From:    Malcolm Slaney  <malcolm(at)INTERVAL.COM>
Date:    Tue, 19 Jan 1999 22:15:44 -0800

I'm very happy to announce that version 2.0 of the Matlab Auditory Toolbox is now available for downloading. More information, documentation, and download links are available at http://web.interval.com/papers/1998-010/

The Auditory Toolbox builds on top of the capabilities of a commercial numerical computing environment known as Matlab. Matlab provides all the I/O routines and a general programming environment. The Auditory Toolbox extends Matlab's capabilities by providing a number of auditory models. The toolbox is a small amount of code, all written in a very high-level scripting language, so it is easy to understand and modify if necessary.

The first version of this toolbox was published while I was at Apple. Apple has graciously given Interval permission to update the toolbox, fixing some bugs and adding some new features. The primary modules provided by this toolbox are gammatone filters, Meddis' hair cell model, Lyon's cochlear model, correlograms, Seneff's ear model, and some common representations from the speech world.

A large number of people have helped make this toolbox better, both by providing code and feedback. Certainly Richard F. Lyon, Ray Meddis, Richard Duda, Chris Pal, Kate Nguyen and Alain de Cheveigné have made large contributions. Thank you.

I hope you find this toolbox useful. There are no guarantees; after all, this code is free. Please let me know if you download this and want to be notified of updates.

-- Malcolm

P.S. I am interested in other contributions to this toolbox. Please let me know if you have an implementation of an auditory model in Matlab which might fit into this package.

What is the Auditory Toolbox?

This report describes a collection of tools that implement several popular auditory models for a numerical programming environment called MATLAB. This toolbox will be useful to researchers who are interested in how the auditory periphery works and want to compare and test their theories. It will also be useful to speech and auditory engineers who want to see how the human auditory system represents sounds. This version of the toolbox fixes several bugs, especially in the Gammatone and MFCC implementations, and adds several new functions. This report was previously published as Apple Computer Technical Report #45. We appreciate receiving permission from Apple Computer to republish their code and to update this package.

There are many ways to describe and represent sounds; one useful taxonomy is based on signal dimensionality. A simple waveform is a one-dimensional representation of sound. A two-dimensional representation describes the acoustic signal as a time-frequency image; this is the typical approach for sound and speech analysis. This toolbox includes conventional tools such as the short-time Fourier transform (STFT, or spectrogram) and several cochlear models that estimate auditory nerve firing "probabilities" as a function of time. The next level of abstraction is to summarize the periodicities of the cochlear output with the correlogram. The correlogram provides a powerful representation that makes it easier to understand multiple sounds and to perform auditory scene analysis.

What does the Auditory Toolbox contain?

Six types of auditory time-frequency representations are implemented in this toolbox:

1. Richard F. Lyon has described an auditory model based on a transmission-line model of the basilar membrane followed by several stages of adaptation.
This model can represent sound at either a fine time scale (probabilities of an auditory nerve firing) or at the longer time scales characteristic of spectrogram or MFCC analysis. The LyonPassiveEar command implements this particular ear model.

2. Roy Patterson has proposed a model of psychoacoustic filtering based on critical bands. This auditory front end combines a gammatone filter bank with a model of hair cell dynamics proposed by Ray Meddis. This auditory model is implemented using the MakeERBFilters, ERBFilterBank, and MeddisHairCell commands (see the usage sketch after this list).

3. Stephanie Seneff has described a cochlear model that combines a critical-band filter bank with models of detection and automatic gain control. This toolbox implements stages I and II of her model.

4. Conventional FFT analysis is represented using the spectrogram. Both narrowband and wideband spectrograms are possible. See the spectrogram command for more information.

5. A common front end for many speech recognition systems consists of mel-frequency cepstral coefficients (MFCC). This technique combines an auditory filter bank with a cosine transform to give a rate representation roughly similar to that of the auditory system. See the mfcc command for more information. In addition, a common technique known as rasta is included to filter the coefficients, simulating the effects of masking and providing speech-recognition systems with a measure of environmental adaptation.

6. Conventional speech-recognition systems often use linear-predictive analysis to model a speech signal. The forward transform, proclpc, and its inverse, synlpc, are included.
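As a concrete illustration, here is a minimal Matlab sketch that runs the Patterson/Meddis front end from item 2 and the MFCC analysis from item 5. The function names are the toolbox's own; the test signal and the parameter values (sampling rate, channel count, lowest center frequency, frame rate) are illustrative assumptions, and the exact argument lists should be checked against the toolbox documentation.

    % Illustrative test signal: half a second of a 1 kHz tone.
    fs = 16000;                            % sampling rate in Hz (assumed)
    t  = 0:1/fs:0.5;
    x  = sin(2*pi*1000*t);

    % Item 2: gammatone filter bank followed by the Meddis hair cell.
    % MakeERBFilters designs the filters; ERBFilterBank applies them.
    fcoefs = MakeERBFilters(fs, 64, 100);  % 64 channels, lowest CF 100 Hz
    bm     = ERBFilterBank(x, fcoefs);     % one row per cochlear channel
    hc     = MeddisHairCell(bm, fs);       % hair cell firing "probabilities"

    % Item 5: mel-frequency cepstral coefficients.
    ceps   = mfcc(x, fs, 100);             % 100 frames per second (assumed)

    % View the hair cell output as a cochleagram-style image.
    imagesc(hc); axis xy;

The other front ends follow the same pattern: LyonPassiveEar and proclpc likewise take the waveform and its sampling rate as their leading arguments.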

