
Re: STFT vs Power Spectral in Musical recognition system ?



One problem with square-root compression is that its slope approaches
infinity as the magnitude M approaches zero. A more appropriate choice
may be log(1+KM), where K is a constant to be determined. This function
is almost linear for low magnitudes (KM << 1) and almost logarithmic for
high magnitudes (KM >> 1). Of course, determining the optimal value of K
for a given input is not trivial.
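A minimal sketch of the slope argument, assuming NumPy. The helper names and the value K = 100 are illustrative only; as noted above, the best K depends on the input. At M = 0 the slope of log(1+KM) is K, while the slope of sqrt(M) diverges.

```python
import numpy as np


def sqrt_compress(m):
    """Square-root compression of a nonnegative magnitude M."""
    return np.sqrt(m)


def log1p_compress(m, k):
    """log(1 + K*M) compression; k is the constant K (hypothetical choice)."""
    return np.log1p(k * m)


def sqrt_slope(m):
    # d/dM sqrt(M) = 1 / (2 sqrt(M)), which diverges as M -> 0
    return 0.5 / np.sqrt(m)


def log1p_slope(m, k):
    # d/dM log(1 + K M) = K / (1 + K M), which is finite (= K) at M = 0
    return k / (1.0 + k * m)


if __name__ == "__main__":
    k = 100.0  # illustrative value, not an optimized one
    for m in [1e-6, 1e-3, 1.0, 1e3]:
        print(f"M={m:g}  sqrt slope={sqrt_slope(m):.3g}  "
              f"log(1+KM) slope={log1p_slope(m, k):.3g}")
    # For K*M >> 1, log(1 + K M) is close to log(K) + log(M)
    m = 1e3
    print(log1p_compress(m, k), np.log(k) + np.log(m))
```

For small M the function behaves like K*M (linear), and for large M it differs from log(K) + log(M) only by log(1 + 1/(KM)), which is why K effectively sets the crossover between the two regimes.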

Arturo
-- 
__________________________________________________

 Arturo Camacho
 PhD Candidate
 Computer and Information Science and Engineering
 University of Florida

 E-mail: acamacho@xxxxxxxxxxxx
 Web page: www.cise.ufl.edu/~acamacho
__________________________________________________

On Fri, 25 Aug 2006, Richard F. Lyon wrote:

> Edwin,
>
> A power spectral density is only defined for stationary signals, not
> music.  The STFT generalizes it to short segments, if you use the
> squared magnitude.
>
> The absolute value, square, log, etc. are just point nonlinearities
> that do not change the information content, but do change the metric
> structure of the space a bit.  Log is too
> compressed, leading to too much emphasis on near-silent segments,
> while the square (the power you ask about) is too expanded, leading
> to too much emphasis on the louder parts.  A good compromise is
> around a square root or cube root of magnitude (roughly matching
> perceptual magnitude via Stevens's law), but the magnitude itself is
> also sometimes acceptable, depending on what you're doing.
>
> Dick
>
> At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:
> >
> >Hello,
> >
> >I am a master's student doing my internship. Right now, I am
> >building a musical instrument recognition system. I have read
> >several papers on it, and I am just curious:
> >
> >All the papers that I have read use the magnitude of the STFT,
> >|X(t,f)|, of a signal x(t) to extract several (spectral) features
> >to be used as the input to the recognition system.
> >
> >What are the reasons for using |X(t,f)| instead of the power
> >spectrum |X(t,f)|^2?
> >(Technically speaking, a power spectral density is the expectation
> >of |X(f)|^2, i.e. E(|X(f)|^2).)
> >
> >Thanks in advance,
> >
> >Edwin SIANTURI
> >
>
>
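The trade-off described in the quoted reply can be illustrated with a minimal Python sketch, assuming NumPy. The function names, frame length, hop size, test tone and levels are illustrative choices, not taken from the thread. It computes |X(t,f)| for a test tone, applies the point nonlinearities discussed (power, magnitude, square root, cube root, log), and measures how each one weights a 10% spectral change in a loud pair of frames relative to the same change 60 dB lower. It also shows frame-averaged |X(t,f)|^2, which for a stationary signal approximates the PSD up to a scale factor.

```python
import numpy as np


def stft_magnitude(x, frame_len=1024, hop=256):
    """Return |X(t,f)|: magnitude of a Hann-windowed short-time Fourier transform.

    Rows are frames (t), columns are frequency bins (f). frame_len and hop
    are illustrative defaults.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def averaged_power(mag):
    """Average |X(t,f)|^2 over frames.

    Only meaningful for a stationary signal: there it approximates, up to
    scale, the expectation E(|X(f)|^2) that defines the power spectral
    density (the idea behind Welch's method).
    """
    return np.mean(mag ** 2, axis=0)


# Elementwise ("point") nonlinearities applied to the magnitude M = |X(t,f)|.
# Each is strictly increasing on M >= 0, hence invertible: no information lost.
COMPRESSIONS = {
    "power": lambda m: m ** 2,
    "magnitude": lambda m: m,
    "sqrt": np.sqrt,
    "cube_root": np.cbrt,
    # The 1e-10 floor is an assumption added to avoid log(0) on silent bins.
    "log": lambda m: np.log(m + 1e-10),
}


def pair_distance_ratio(compress, loud_level=1.0, quiet_level=1e-3, change=1.1):
    """Euclidean distance of a loud pair divided by that of a quiet pair.

    Each pair is a flat toy spectrum and the same spectrum scaled by
    `change`; the quiet pair is 60 dB below the loud pair by default.
    """
    base = np.ones(8)
    loud = np.linalg.norm(compress(change * loud_level * base)
                          - compress(loud_level * base))
    quiet = np.linalg.norm(compress(change * quiet_level * base)
                           - compress(quiet_level * base))
    return loud / quiet


if __name__ == "__main__":
    fs = 16000                              # sample rate in Hz (illustrative)
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440.0 * t)       # one second of a 440 Hz tone
    mag = stft_magnitude(x)
    print("frames x bins:", mag.shape, "peak bin:", np.argmax(mag[0]))
    print("peak bin of averaged power:", np.argmax(averaged_power(mag)))
    for name, f in COMPRESSIONS.items():
        print(f"{name:>9}: loud/quiet distance ratio = {pair_distance_ratio(f):.4g}")
```

In this toy comparison the power weights the loud pair about 10^6 times more than the quiet pair, the magnitude about 10^3 times, the square root about 32 times, the cube root about 10 times, and the log about equally, which is the "too expanded" versus "too compressed" trade-off in numbers.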