Subject: Re: The natural spectrogram
From: Eckard Blumschein <Eckard.Blumschein(at)E-TECHNIK.UNI-MAGDEBURG.DE>
Date: Fri, 30 Jan 2004 11:22:06 +0100

At 11:09 29.01.2004 -0800, Julius Smith wrote:
>At 01:48 AM 1/28/2004, Eckard Blumschein wrote:
>>...So far I can neither imagine the
>>STFT itself to be natural nor a spectrogram based on it. Wouldn't this
>>require naturally choosing the size of the window?
>
>Yes -- and as a function of frequency. We normally call it a
>"multiresolution" STFT.

In this sense, and because there are no arbitrary windows, both the actual cochlear function and the suggested natural spectrogram are distinguished by a steplessly sliding rather than a merely stepping 'multi' resolution. Use of the STFT at least requires an arbitrary decision about how many windows to choose at every moment. Perhaps there is no natural preference for any particular set of choices.

>
>>Wouldn't one have to decide further arbitrary parameters like the degree
>>of overlap?
>
>This is just a sampling-rate issue. If computational cost is no object,
>one can simply choose maximum overlap (i.e., a "sliding FFT" instead of a
>"hopping FFT"). On the other hand, FFT filter banks can usually be
>downsampled quite a lot and still give equivalent end results. In this
>context, your window is your anti-aliasing filter for
>downsampling. Reference: Jont B. Allen, "Short Term Spectral Analysis,
>Synthesis, and Modification by Discrete Fourier Transform", IEEE ASSP-25(3).

The cochlea is not subject to a sampling rate, and the natural spectrogram adapts to the given sampled input in a manner similar to the 'sliding' FFT. Neither the cochlea nor the natural spectrogram requires an anti-aliasing filter. You are quite right: the wider the frequency range, the more sense it makes to downsample the input to the low-frequency window. Isn't it a calamity? Perhaps there is no way of sliding downsampling.
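A minimal sketch of the sliding-versus-hopping distinction in the quoted reply, assuming NumPy, a Hann window, a 256-sample window and an 8 kHz test tone (all arbitrary choices). `stft` is a hypothetical helper, not the formulation in Allen's paper. Only the hop size changes: hop = 1 gives the maximum-overlap "sliding FFT", while a larger hop keeps every hop-th frame, which amounts to downsampling each filter-bank channel.

```python
import numpy as np

def stft(x, win_len, hop):
    """Hann-windowed short-time Fourier transform (hypothetical helper).

    hop == 1 gives the maximum-overlap "sliding FFT"; a larger hop gives a
    "hopping FFT", i.e. every channel of the filter bank downsampled by hop.
    Returns an array of shape (num_frames, win_len // 2 + 1).
    """
    window = np.hanning(win_len)
    num_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[m * hop : m * hop + win_len] * window
                       for m in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
x = np.cos(2 * np.pi * 440 * t)            # arbitrary 440 Hz test tone

sliding = stft(x, win_len=256, hop=1)      # maximum overlap
hopping = stft(x, win_len=256, hop=128)    # 50 % overlap
print(sliding.shape, hopping.shape)        # (7745, 129) (61, 129)

# The hopping STFT is exactly the sliding STFT kept at every 128th frame.
print(np.allclose(hopping, sliding[::128]))  # True
```

Whether the retained frames are enough to reconstruct the signal depends on how well the window suppresses aliasing, which is the sense in which the window acts as the anti-aliasing filter.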
Even with the most powerful computers, a multidimensional field of FFTs, possibly further complicated by an elaborate scheme of downsampling, looks anything but practical and parsimonious. Its first dimension is given by the frequency steps of multiresolution, its second by the temporal steps of overlap, and its third by the frequency-dependent ratios of downsampling. Jont's paper dates back to 1977, when one could not yet imagine PCs doing that job. So I consider it of historical value.

>
>> Doesn't any usual spectrogram incompletely represent the information?
>
>STFTs are normally invertible, in my experience, even in the presence of
>aliasing due to downsampling (it gets canceled in the reconstruction). The
>classic spectrogram discards phase, so it is not exactly invertible. Of
>course, it is well known that phase can be reconstructed from STFT
>magnitude to a large extent for typical signals and analysis conditions.

Neither the cochlea nor the natural spectrogram is subject to such consequences of inappropriate theory.

>>Isn't the usual spectrogram subject to the notorious trade-off between
>>spectral and temporal resolution?
>
>Well sure, but we can let the human ear tell us where to be on that trade-off.

This might not be quite correct, for several reasons. The smallest product delta t times delta f achieved by hearing is much better than the uncertainty principle would allow. Aren't about 10 microseconds and 1 Hz realistic? Their product is 10^-5 << 1. In principle, the frequency resolution of the natural spectrogram is not restricted at all.

>>Was there any physiological justification for STFT which could include the
>>rectification?
>>Is there close similarity to measurement of BM motion and neural pattern?
>
>I don't understand the first question. My understanding of rectification
>is that this is the nature of how the hair cells respond to basilar membrane
>vibration. Firing increases when the membrane pushes one way, but not the
>other.
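To make the rectification described here concrete, a minimal sketch assuming NumPy and a single real-valued filter-bank channel, modelled purely for illustration as a Hann-windowed 1 kHz cosine kernel at a 16 kHz sampling rate (a hypothetical stand-in, not a cochlear model). Half-wave rectification passes excursions in one direction only. A positive and a negative click through the same channel give outputs whose magnitude spectra are identical, while their rectified waveforms differ.

```python
import numpy as np

fs = 16000                                   # assumed sampling rate in Hz
n_taps = 257
k = np.arange(n_taps) - n_taps // 2
# Hypothetical real band-pass channel centred at 1 kHz (illustration only).
h = np.hanning(n_taps) * np.cos(2 * np.pi * 1000 * k / fs)

click = np.zeros(2048)
click[512] = 1.0                             # positive click
y_pos = np.convolve(click, h, mode="same")   # channel output, positive click
y_neg = np.convolve(-click, h, mode="same")  # channel output, negative click

def half_wave_rectify(y):
    """Keep excursions in one direction, zero the others."""
    return np.maximum(y, 0.0)

r_pos = half_wave_rectify(y_pos)
r_neg = half_wave_rectify(y_neg)

# Magnitude spectra cannot tell the two polarities apart ...
print(np.allclose(np.abs(np.fft.rfft(y_pos)), np.abs(np.fft.rfft(y_neg))))  # True
# ... but the rectified real outputs can.
print(np.allclose(r_pos, r_neg))             # False
# Both rectified halves together still carry the full real signal.
print(np.allclose(r_pos - r_neg, y_pos))     # True
```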
That is my understanding of rectification, too.

>The STFT implements a filter bank, and the output of that filter
>bank can be rectified accordingly (applied to real time-domain signals at
>the STFT filter-bank output, of course).

The latter is the point. The complex FT, including the STFT, does not deliver a real time-domain signal at all but magnitude and phase. The usual spectrogram shows magnitude vs. time. A magnitude cannot be rectified. Therefore, the usual spectrogram fails to convey the information responsible for audible effects of polarity, e.g. the difference between positive and negative clicks. The natural spectrogram is more natural in any comparison with the usual one because it is based on the Fourier cosine transform (FCT), which directly provides the input to rectification.

Proponents of the complex FT might claim that the FCT is just the real part of the FT. This is formally largely correct. However, it ignores several important flaws arising from arbitrary preconditions of complex calculus. Let's skip the two most basic arbitrary choices (the origin and the sign of the imaginary part). The complex FT always presumes a tacit introduction of redundancy. The most 'correct' input to the FT, in the case of what I suggest calling an 'effectual signal', fills a window that is located symmetrically with respect to zero but padded with zeros for the time to come, in which the signal is unknown. The word 'effectual' indicates correspondence to the so-called causal signal. The effectual signal differs from a simply time-mirrored (anti-causal) one. The zeros introduce two mutually cancelling fictive components, each of which alone would violate causality. That's why the traditional spectrogram exhibits non-causality. Such non-causal output before any input is obvious nonsense, and it becomes the more conspicuous the wider the window has been chosen. Any Fourier transform of a causal signal or an effectual signal shows Hermitian symmetry. In other words, its real part is symmetrical over frequency.
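The two formal facts mentioned here can be checked numerically in a minimal sketch. It assumes NumPy and uses the discrete Fourier transform as a stand-in for the continuous FT. For any real-valued input, the real part of the DFT equals the plain cosine sum over the same samples (a discrete analogue of the cosine transform, not the DCT-II), and X[N-k] = conj(X[k]) (Hermitian symmetry), so the real part is even over frequency. The random test signal and N = 32 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
x = rng.standard_normal(N)          # any real-valued signal
X = np.fft.fft(x)

n = np.arange(N)
k = n.reshape(-1, 1)                # one row per frequency bin k
# Cosine sum per bin: sum over n of x[n] * cos(2*pi*k*n/N)
cosine_sum = (x * np.cos(2 * np.pi * k * n / N)).sum(axis=1)

print(np.allclose(X.real, cosine_sum))          # True: Re(DFT) is the cosine sum

mirror = (-n) % N                   # bin index N-k, with bin 0 mapped to itself
print(np.allclose(X[mirror], np.conj(X)))       # True: Hermitian symmetry
print(np.allclose(X.real[mirror], X.real))      # True: real part even over frequency
```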
Negative frequency does not have any particular physical meaning. It is just an artefact of complex calculus.

>
>I suppose you're posting to the right list!

In 1844, Ohm dismissed Seebeck's observation. He supposed that a missing fundamental cannot be heard. Let's be open to the insight that the cochlea performs a real-valued rather than a complex-valued frequency analysis. Then we are in a position to deal with the further steps of physiological signal processing on a sounder basis. As a first result, I offered a 'joint autocorrelation' hypothesis. Hopefully, it will reconcile Peter and Christian. It could also, for the first time, plausibly explain to Ohm why Seebeck was right. To this list, Chen-Gia Tsai posted his observation of a pitch at 9f0/4 (if I recall correctly). I imagine hundreds of experts here lurking for something new. Of course, corrections are always and anywhere unwelcome. I do not intend to hurt anybody. If someone feels offended, I apologize for that. I will sincerely try to go on responding privately to all objections and requests.

Eckard