Subject: auditory filters without tails
From: Anssi Klapuri <klap(at)CS.TUT.FI>
Date: Mon, 15 Nov 2004 14:39:29 +0200

Dear Experts on auditory modeling,

I am developing auditorily-motivated methods for extracting multiple pitches from musical audio (= polyphonic music transcription). A question related to auditory filters is burning in my mind:

PROBLEM: The shape of the magnitude response of the "gammatone" auditory filters can be characterized as "rounded-exponential + TAILS around the passband". Due to the tails, a filter centered on 1 kHz, for example, exhibits only up to 65 dB attenuation for frequencies in the range 0-500 Hz. This is NOT sufficient in music analysis; the filters at higher bands are not able to really reject the dominating sinusoidal components at lower frequencies. I would like to get rid of the tails of the auditory filter altogether.

QUESTION: Can anyone provide me with a digital IIR-filter implementation for the Roex(p) frequency response (without the tails) as proposed in [1]? The gammatone implementation by Slaney in [2] includes the tails, thus implementing the Roex(p,w,t) type response rather than the simpler Roex(p) model in [1].

DISCUSSION: It seems to me that steeper auditory filters (without the tails) should be used for computational analysis of audio. This is because the components at the neighbouring bands are sinusoids and not flat noise as in the experiments where the auditory filter shape was originally measured and derived.

From the theoretical perspective (my goals being quite application-oriented), I am quoting one paragraph from [1] which indicates that the tails of the auditory filter are simply due to the approach to the absolute threshold! Please let me know if I am completely on the wrong track or if the conclusion by Patterson and Moore below is no longer valid.

Quotation from [1, p.144]:

"The dynamic range of the filters shown in Fig. 3.13 decreases with decreasing [notched] noise level.
This suggests that the tails of the auditory filter are simply a consequence of masked threshold approaching absolute threshold at wide notch bandwidths. A similar conclusion was reached by Glasberg et al. (1984b) who measured the characteristic of the auditory filter over a greater dynamic range than had been done previously, using a relatively high noise spectrum level, 45 dB, and maskers with very wide notches. They also used listeners with a wide range of ages and, correspondingly, a wide range of absolute thresholds. The tails that typically flank the passband of the auditory filter had previously been interpreted as being an inherent part of the filter shape. If this were true, the filters of Glasberg et al. should have had typical passbands with extended tails because of the greater dynamic range of the measurements. Instead their filters showed passbands with markedly increased dynamic ranges and the same small tails as found previously. Although the decrease in the slope of the curve relating threshold to notch width often occurs well before the signal reaches absolute threshold, the approach to absolute threshold does seem to be the crucial factor in producing the tails of the filter."

REFERENCES

[1] Patterson, Moore, "Auditory filters and excitation patterns as representations of frequency resolution," in Frequency Selectivity in Hearing, Moore (Ed.), Academic Press, 1986.
[2] Slaney, "An Efficient Implementation of the Patterson-Holdsworth Cochlear Filter Bank," Apple TR #35, 1993.

Any comments and advice are greatly appreciated.

With best regards,
--Anssi
___________________________________________________________________________
Anssi Klapuri
klap(at)cs.tut.fi
http://www.cs.tut.fi/~klap
work: Tampere University of Tech., P.O.Box 553, FIN-33101 Tampere, Finland
tel.: +358 3 3115 2124, fax: +358 3 3115 4954, gsm: +358 50 364 8208
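
For readers who want to see the tail-free target shape referred to in the message, here is a minimal sketch that evaluates a symmetric roex(p) power weighting, W(g) = (1 + p*g) * exp(-p*g), in dB. Several things here are assumptions not taken from the message: the function names `erb_hz` and `roex_p_db`, the use of the Glasberg and Moore ERB fit to choose p, and the symmetry of the filter about its centre frequency. It only evaluates the desired magnitude curve; it is not an IIR implementation and does not answer the question posed above.

```python
import math


def erb_hz(fc_hz):
    """ERB in Hz from the Glasberg and Moore (1990) fit (an assumption, not from the message)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def roex_p_db(f_hz, fc_hz):
    """Level in dB of a symmetric roex(p) power weighting at f_hz for a filter centred on fc_hz."""
    # g is the deviation from the centre frequency, normalized by the centre frequency.
    g = abs(f_hz - fc_hz) / fc_hz
    # The area under W(g) over both skirts is 4/p, so ERB = 4*fc/p fixes the slope p.
    p = 4.0 * fc_hz / erb_hz(fc_hz)
    w = (1.0 + p * g) * math.exp(-p * g)
    # W is a weighting on power, hence 10*log10 rather than 20*log10.
    return 10.0 * math.log10(w)


if __name__ == "__main__":
    # Tabulate the roex(p) level below a 1 kHz centre frequency.
    for f in (1000.0, 900.0, 750.0, 500.0, 250.0):
        print(f"{f:7.1f} Hz: {roex_p_db(f, 1000.0):8.1f} dB")
```

Because the weighting has no floor term, the level keeps falling as the frequency moves away from the centre, which is the property the message asks for; a curve like this could serve as the target when fitting or checking a candidate digital filter design.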