
Re: STFT vs Power Spectral in Musical recognition system ?

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: STFT vs Power Spectral in Musical recognition system ?
From: "Richard F. Lyon" <DickLyon@xxxxxxx>
Date: Fri, 25 Aug 2006 08:32:11 -0700
Comments: To: Edwin Sianturi <sianturiauditory@yahoo.com>
Delivery-date: Fri Aug 25 11:44:54 2006
In-reply-to: <20060825141259.13811.qmail@web38503.mail.mud.yahoo.com>
References: <20060825141259.13811.qmail@web38503.mail.mud.yahoo.com>
Reply-to: "Richard F. Lyon" <DickLyon@xxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Edwin,

A power spectral density is only defined for stationary signals, not for music. The squared magnitude of the STFT generalizes it to short segments, over which the signal can be treated as roughly stationary.
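A minimal sketch of the short-time power spectrum described above, in Python with NumPy. The Hann window, the 1024-sample frame, the 256-sample hop, the 16 kHz test tone and the function name `short_time_power_spectrum` are illustrative assumptions, not anything specified in the thread. Each row of the result is |X(t,f)|^2 for one frame, so a signal whose level changes over time still gets a sensible per-frame spectrum.

```python
import numpy as np

def short_time_power_spectrum(x, frame_len=1024, hop=256):
    """Squared magnitude of a Hann-windowed STFT: one power spectrum per frame.

    frame_len and hop are illustrative defaults, not values from the thread.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)   # X(t, f): complex STFT, one row per frame
    return np.abs(X) ** 2             # |X(t, f)|^2, shape (n_frames, frame_len // 2 + 1)

fs = 16000
t = np.arange(fs) / fs
# A 440 Hz tone whose amplitude steps from 0.1 to 1.0 halfway through:
# not stationary overall, but roughly stationary within each 64 ms frame.
x = np.where(t < 0.5, 0.1, 1.0) * np.sin(2 * np.pi * 440 * t)
P = short_time_power_spectrum(x)
```

Here the first and last frames both peak at bin 28 (440 Hz × 1024 / 16000 ≈ 28.2), and the peak power in the last frame is about 100 times that of the first, matching the tenfold amplitude step.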

The differences among the absolute value, the square, the log, etc. are just point nonlinearities: they do not change the information content, but they do change the metric structure of the space a bit. The log is too compressive, putting too much emphasis on near-silent segments, while the square (the power you ask about) is too expansive, putting too much emphasis on the louder parts. A good compromise is something around the square root or cube root of the magnitude (roughly matching perceptual magnitude via Stevens's power law), but the magnitude itself is also sometimes acceptable, depending on what you're doing.
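To make the point about metric structure concrete, here is a small Python illustration; the function name `quiet_vs_loud_weight`, the log floor and the particular magnitudes are hypothetical choices for the example. It takes three magnitudes, each 60 dB below the previous one, and asks how large the difference between the two quietest is, relative to the difference between the two loudest, after each nonlinearity.

```python
import numpy as np

# Point nonlinearities applied to an STFT magnitude |X(t,f)|.
# The 1e-12 floor keeps the log finite at exact zeros (an illustrative assumption).
NONLINEARITIES = {
    "power (square)": lambda m: m ** 2,
    "magnitude":      lambda m: m,
    "square root":    lambda m: m ** 0.5,
    "cube root":      lambda m: m ** (1 / 3),
    "log":            lambda m: np.log(np.maximum(m, 1e-12)),
}

def quiet_vs_loud_weight(f, loud=1.0, quiet=1e-3, near_silent=1e-6):
    """Distance between a quiet and a near-silent magnitude, relative to the
    distance between a loud and a quiet one, after applying f.
    Each pair differs by the same factor of 1000 (60 dB in magnitude)."""
    return abs(f(quiet) - f(near_silent)) / abs(f(loud) - f(quiet))

weights = {name: quiet_vs_loud_weight(f) for name, f in NONLINEARITIES.items()}
for name, w in weights.items():
    print(f"{name:15s} {w:.2e}")
```

The weights come out at about 1e-6, 1e-3, 3.2e-2, 0.1 and 1.0 in the order listed: under the square the quiet difference nearly vanishes, under the log it counts exactly as much as the loud one, and the square and cube roots sit in between.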

Dick

At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:

Hello,
I am just a master's student doing my internship. Right now I am building a musical instrument recognition system. I have read several papers on the topic, and I am curious:

All the papers and journal articles I have read use the STFT magnitude |X(t,f)| of a signal x(t) to extract several (spectral) features that serve as input to the recognition system.

What are the reasons for using |X(t,f)| instead of the "power spectrum" |X(t,f)|^2? (Technically speaking, a power spectral density is the expectation of |X(f)|^2, i.e. E(|X(f)|^2).)
Thanks in advance,
Edwin SIANTURI
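Edwin's parenthetical definition can be illustrated numerically. For a stationary signal, the expectation E(|X(f)|^2) can be estimated by averaging |X(f)|^2 over many segments (Bartlett's method, named here as one standard estimator, not one proposed in the thread). The sketch below uses unit-variance white noise, for which the true value is 1 at every frequency under the 1/N scaling chosen here; the function name `averaged_periodogram`, the segment length and the random seed are assumptions.

```python
import numpy as np

def averaged_periodogram(x, seg_len=256):
    """Bartlett-style estimate of E(|X(f)|^2): average |X(f)|^2 over
    non-overlapping segments (rectangular window, scaled by 1/seg_len).

    Returns the first segment's periodogram and the average over all segments.
    """
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    P = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / seg_len  # one periodogram per segment
    return P[0], P.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(256 * 400)  # stationary white noise, variance 1
single, averaged = averaged_periodogram(x)
```

A single segment's |X(f)|^2 scatters widely around 1, while the average over 400 segments is close to flat. Averaging like this only makes sense when the statistics do not change over time, which is the stationarity assumption discussed in the reply.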

Follow-Ups:
- Re: STFT vs Power Spectral in Musical recognition system ?
  - From: Arturo Camacho

References:
- STFT vs Power Spectral in Musical recognition system ?
  - From: Edwin Sianturi
