Re: STFT vs Power Spectral in Musical recognition system ? (Arturo Camacho )


Subject: Re: STFT vs Power Spectral in Musical recognition system ?
From:    Arturo Camacho  <acamacho@xxxxxxxx>
Date:    Thu, 31 Aug 2006 18:32:56 -0400

One problem of the square-root compression is that its slope approaches
infinity as the magnitude M approaches zero. A more appropriate approach
may be to use log(1+KM), where K is a constant to be determined. The
response of this function is almost logarithmic for high magnitudes and
almost linear for low magnitudes. Of course, the determination of the
optimal value for K given an input is not trivial.

Arturo

--
__________________________________________________
 Arturo Camacho
 PhD Candidate
 Computer and Information Science and Engineering
 University of Florida

 E-mail: acamacho@xxxxxxxx
 Web page: www.cise.ufl.edu/~acamacho
__________________________________________________

On Fri, 25 Aug 2006, Richard F. Lyon wrote:

> Edwin,
>
> A power spectral density is only defined for stationary signals, not
> music. The STFT generalizes it to short segments, if you use the
> squared magnitude.
>
> The difference between the absolute value, square, log, etc. are just
> point nonlinearities that do not change the information content, but
> do change the metric structure of the space a bit. Log is too
> compressed, leading to too much emphasis on near-silent segments,
> while the square (the power you ask about) is too expanded, leading
> to too much emphasis on the louder parts. A good compromise is
> around a square root or cube root of magnitude (roughly matching
> perceptual magnitude via Stevens's law), but the magnitude itself is
> also sometimes acceptable, depending on what you're doing.
>
> Dick
>
> At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:
> >Hello,
> >
> >I am just a master student, doing my internship. Right now, I am
> >building a musical instrument recognition system. I have read
> >several papers on it, and I am just curious:
> >
> >All the papers/journals that I have read use the STFT, a.k.a the
> >|X(t,f)| of a signal x(t), in order to extract several (spectral)
> >features to be used as the input to the recognition system.
> >
> >What are the reasons behind using the |X(t,f)| instead of using the
> >"power spectral" |X(t,f)|^2 ?
> >(technically speaking, a power spectral density is the expectation
> >of |X(f)|^2, i.e. E(|X(f)|^2) )
> >
> >Thanks in advance,
> >
> >Edwin SIANTURI
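
For readers who want to see the point nonlinearities from this thread
side by side, here is a minimal Python sketch. It is only an
illustration, not code from any of the posters: the function names and
the value K = 100 are assumptions chosen for the example (the thread
leaves K "to be determined"). It compares the slope of the square root,
1/(2*sqrt(M)), with the slope of log(1+KM), K/(1+KM), as M goes to
zero, and checks the near-linear and near-logarithmic regimes of
log(1+KM).

```python
import math

K = 100.0  # hypothetical constant; the thread does not give a value


def sqrt_compress(m):
    """Square-root compression of a non-negative magnitude m."""
    return math.sqrt(m)


def log_compress(m, k=K):
    """log(1 + k*m) compression of a non-negative magnitude m."""
    return math.log1p(k * m)


def sqrt_slope(m):
    """Analytic derivative of sqrt(m); unbounded as m -> 0 (m > 0 required)."""
    return 1.0 / (2.0 * math.sqrt(m))


def log_slope(m, k=K):
    """Analytic derivative of log(1 + k*m); equals k at m = 0."""
    return k / (1.0 + k * m)


if __name__ == "__main__":
    print("M          sqrt(M)      log(1+KM)    d sqrt/dM    d log/dM")
    for m in (1e-8, 1e-4, 1e-2, 1.0, 100.0):
        print(f"{m:<10.0e} {sqrt_compress(m):<12.5g} "
              f"{log_compress(m):<12.5g} {sqrt_slope(m):<12.5g} "
              f"{log_slope(m):<12.5g}")
```

With these definitions, the slope of log(1+KM) stays bounded by K near
silence, while the square-root slope keeps growing as M shrinks. A
larger K pushes the transition from linear to logarithmic behaviour
toward lower magnitudes.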


This message came from the mail archive
http://www.auditory.org/postings/2006/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University