Re: Intermediate representation for music analysis (Hugh McDERMOTT)


Subject: Re: Intermediate representation for music analysis
From:    Hugh McDERMOTT  <hughm@xxxxxxxx>
Date:    Tue, 18 Jul 2006 17:18:38 +1000

I would add to this that, using an FFT, it is quite easy to measure the component frequencies of a complex signal with a precision finer than the bin spacing. One just needs to estimate the rate of change of the phase of a component within a bin. This technique, which has been described in the context of the so-called phase vocoder algorithm, permits the frequency of each signal component resolved by the FFT to be estimated more precisely than the limit apparently imposed by the FFT bin spacing in the frequency domain.

Best regards,

Hugh McDermott, PhD
Principal Research Fellow
Department of Otolaryngology
The University of Melbourne
384 - 388 Albert Street, East Melbourne. 3002 Australia.
Phone: +61 3 9929 8665
Fax: +61 3 9663 6086
E-mail: hughm@xxxxxxxx
Web page: http://www.medoto.unimelb.edu.au/people/mcdermoh/

-----Original Message-----
From: AUDITORY Research in Auditory Perception [mailto:AUDITORY@xxxxxxxx] On Behalf Of Bob Masta
Sent: Monday, 17 July 2006 11:01 PM
To: AUDITORY@xxxxxxxx
Subject: Re: Intermediate representation for music analysis

Note that no matter what sort of analysis you do, the frequency resolution is determined by the reciprocal of the analysis window duration. So if you want fine resolution for the low frequencies, you need a long sample set, even if you only need much coarser resolution at the high frequencies (due to the log nature of hearing).

So, why not just take a long FFT? Even though they have linear frequency spacing, FFTs have been heavily optimized for efficient computation. I wonder if it might be better to use a conventional FFT and lump some upper bins together to form quasi-log bands, rather than use a less-efficient log-spaced filter bank.

There is one weakness to that approach, however: if you set the overall FFT length so that the lowest band you want to handle is just exactly matched by the lowest FFT spectral line width, then the next spectral line will be at *twice* that... there will be no nice fractional-octave alignment. If you really need that, a log filter bank may be best.

However, the way I have seen this handled is to assume (hope?) that there will be plenty of upper harmonics in the signal, many of which will fall into regions of the FFT where the resolution (considered on an octave basis) is much higher. By looking at a few of these upper harmonics, it was possible to figure out what the actual fundamental frequency was to similarly high resolution.

Best regards,

Bob Masta
audioATdaqartaDOTcom
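
The phase-based refinement described in the reply at the top of this message can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not code from either poster: the function name refine_peak_frequency, the Hann window, the frame length n_fft and the hop size are all hypothetical choices, and only the single strongest bin is refined. Two overlapping frames are transformed, and the phase advance of the peak bin between them is compared with the advance expected for a component sitting exactly at the bin centre.

```python
import numpy as np


def refine_peak_frequency(x, fs, n_fft=1024, hop=256):
    """Return (coarse, refined) frequency estimates in Hz for the strongest
    component of x, using the phase advance between two FFT frames.

    Hypothetical helper for illustration; x must hold at least
    n_fft + hop samples.
    """
    window = np.hanning(n_fft)

    # Two analysis frames, the second starting `hop` samples after the first.
    frame_a = np.fft.rfft(window * x[:n_fft])
    frame_b = np.fft.rfft(window * x[hop:hop + n_fft])

    # Coarse estimate: centre frequency of the bin with the largest magnitude.
    k = int(np.argmax(np.abs(frame_a)))
    coarse = k * fs / n_fft

    # Phase advance expected over `hop` samples for a component exactly
    # at the centre of bin k.
    expected = 2.0 * np.pi * k * hop / n_fft

    # Phase advance actually measured in bin k between the two frames.
    measured = np.angle(frame_b[k]) - np.angle(frame_a[k])

    # Wrap the difference into [-pi, pi). The result is unambiguous only
    # while the true offset from the bin centre is below fs / (2 * hop),
    # which is two bin widths for hop = n_fft / 4.
    deviation = (measured - expected + np.pi) % (2.0 * np.pi) - np.pi

    # Convert the phase deviation per hop into a frequency offset in Hz.
    refined = (k / n_fft + deviation / (2.0 * np.pi * hop)) * fs
    return coarse, refined


if __name__ == "__main__":
    fs = 8000.0
    t = np.arange(2048) / fs
    tone = np.sin(2.0 * np.pi * 440.0 * t)
    coarse, refined = refine_peak_frequency(tone, fs)
    print("coarse:", coarse, "Hz  refined:", refined, "Hz")
```

With an 8 kHz sampling rate and a 1024-point FFT the bin spacing is 7.8125 Hz, so for a 440 Hz test tone the coarse estimate is the bin centre at 437.5 Hz, while the refined value should land close to 440 Hz. A short hop keeps the phase deviation inside the unambiguous range; a longer hop would improve noise averaging at the cost of that range.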


This message came from the mail archive
http://www.auditory.org/postings/2006/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University