One problem with square-root compression is that its slope
approaches infinity as the magnitude M approaches zero. A more
appropriate approach may be to use log(1+KM), where K is a constant to
be determined. This function is almost logarithmic for high
magnitudes (KM >> 1) and almost linear for low magnitudes (KM << 1).
Of course, determining the optimal value of K for a given input is
not trivial.
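
A small numerical sketch of the point above, in Python with NumPy. The function names and the value K = 10 are only illustrative assumptions, not a recommended setting:

```python
import numpy as np

def log_compress(M, K=10.0):
    """Compress magnitude M with log(1 + K*M). K=10 is an arbitrary example value."""
    return np.log1p(K * np.asarray(M, dtype=float))

def log_compress_slope(M, K=10.0):
    """Derivative of log(1 + K*M) with respect to M: K / (1 + K*M)."""
    return K / (1.0 + K * np.asarray(M, dtype=float))

def sqrt_slope(M):
    """Derivative of sqrt(M) with respect to M: 1 / (2*sqrt(M)). Unbounded as M -> 0."""
    return 0.5 / np.sqrt(np.asarray(M, dtype=float))

if __name__ == "__main__":
    # Compare the two slopes as the magnitude gets small.
    for M in (1e-8, 1e-4, 1e-2, 1.0, 100.0):
        print(M, float(sqrt_slope(M)), float(log_compress_slope(M)))
```

The slope of log(1+KM) at M = 0 is exactly K, so it stays bounded, while the square-root slope keeps growing as M shrinks.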
Arturo
--
__________________________________________________
Arturo Camacho
PhD Candidate
Computer and Information Science and Engineering
University of Florida
E-mail: acamacho@xxxxxxxxxxxx
Web page: www.cise.ufl.edu/~acamacho
__________________________________________________
On Fri, 25 Aug 2006, Richard F. Lyon wrote:
Edwin,
A power spectral density is only defined for stationary signals, not
music. The STFT generalizes it to short segments, if you use the
squared magnitude.
The absolute value, square, log, etc. differ only by point
nonlinearities, which do not change the information content but
do change the metric structure of the space a bit. Log is too
compressed, leading to too much emphasis on near-silent segments,
while the square (the power you ask about) is too expanded, leading
to too much emphasis on the louder parts. A good compromise is
around a square root or cube root of magnitude (roughly matching
perceptual magnitude via Stevens's law), but the magnitude itself is
also sometimes acceptable, depending on what you're doing.
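
To make the comparison concrete, here is a rough sketch in Python with NumPy. The helper names, the Hann window, the frame and hop sizes, and the small floor added before the log are all assumptions chosen for illustration:

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=256):
    """Return |X(t,f)| as an array of shape (frames, frame_len // 2 + 1).

    Assumes len(x) >= frame_len. Frame length and hop are example values.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def compress(mag, kind="sqrt", floor=1e-10):
    """Apply a point nonlinearity to a magnitude spectrogram."""
    if kind == "power":
        return mag ** 2
    if kind == "magnitude":
        return mag
    if kind == "sqrt":
        return np.sqrt(mag)
    if kind == "cbrt":
        return np.cbrt(mag)
    if kind == "log":
        # The floor avoids log(0); its value is an arbitrary assumption.
        return np.log(mag + floor)
    raise ValueError(f"unknown compression: {kind}")

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440.0 * t)
    mag = stft_magnitude(x)
    for kind in ("power", "magnitude", "sqrt", "cbrt", "log"):
        c = compress(mag, kind)
        print(kind, float(c.min()), float(c.max()))
```

All of these are monotonic, so the location of a spectral peak is unchanged; what changes is how much weight a distance measure gives to loud versus quiet bins.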
Dick
At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:
>
>Hello,
>
>I am just a master's student doing an internship. Right now, I am
>building a musical instrument recognition system. I have read
>several papers on it, and I am just curious:
>
>All the papers/journals that I have read use the STFT magnitude,
>|X(t,f)|, of a signal x(t) in order to extract several (spectral)
>features to be used as the input to the recognition system.
>
>What are the reasons for using |X(t,f)| instead of the
>"power spectrum" |X(t,f)|^2 ?
>(technically speaking, a power spectral density is the expectation
>of |X(f)|^2, i.e. E(|X(f)|^2) )
>
>Thanks in advance,
>
>Edwin SIANTURI
>