Re: Speech(or Phase) Reconstruction from Magnitude Spectrum (Mark Hasegawa-Johnson)

Dan Ellis <dpwe@ee.columbia.edu> Electrical Engineering Dept., Columbia University

Subject: Re: Speech(or Phase) Reconstruction from Magnitude Spectrum
From:    Mark Hasegawa-Johnson  <jhasegaw(at)UIUC.EDU>
Date:    Tue, 1 Feb 2005 15:45:28 -0600

Hi,

I'm not sure if it's already been mentioned, but this article
demonstrates the (relatively easy) conditions under which exact signal
reconstruction from the magnitude STFT is possible.  I think they also
give an algorithm:

@article{NawQua83,
  abstract="any signal can be reconstructed from magnitude of STFT if
      overlap >= half window length using linear equations based
      on |X|^2=FFT(autocorrelation). Interesting, but
      pedagogically dangerous because it obscures the more
      general but less efficient DCT reconstruction theorem.",
  author="S.~Hamid Nawab and Thomas F. Quatieri and Jae S. Lim",
  journal=tassp,
  keywords="speech coding, digital signal processing",
  number="4",
  pages="986-998",
  title="Signal Reconstruction from Short-Time Fourier Transform
      Magnitude",
  volume="31",
  year="1983"
}
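
For anyone who wants to see the identity the abstract leans on,
|X|^2 = FFT(autocorrelation), here is a minimal Python sketch.  It only
checks that identity on a single frame; it is not the reconstruction
algorithm from the paper.  The helper names (dft,
circular_autocorrelation) and the test frame are made up for
illustration, and I assume a real-valued frame and a circular
autocorrelation to keep things short.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_autocorrelation(x):
    """Circular autocorrelation r[lag] = sum_t x[t] * x[(t + lag) mod n], real x."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]

# Hypothetical real-valued frame, chosen only for illustration.
frame = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0]

power_spectrum = [abs(v) ** 2 for v in dft(frame)]       # |X[k]|^2
acf_spectrum = dft(circular_autocorrelation(frame))      # DFT of the autocorrelation

# Largest disagreement between the two; should be at rounding-error level.
max_error = max(abs(p - q) for p, q in zip(power_spectrum, acf_spectrum))
print(max_error)
```

The point is that the squared magnitude carries exactly the
autocorrelation of each frame, which is what makes it possible to set up
equations linking overlapping frames.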

Matt Flax wrote:
> Hi,
>
> This topic is very much signal processing (DSP). You will find efficient
> solutions by discussing this on the music-dsp e-mail list:
> http://ceait.calarts.edu/mailman/listinfo/music-dsp
>
> Yes, you are correct: you do want to 'complexify' the magnitude-only
> signal. You are now going down a well-trodden road, so let me
> propose another approach ...
>
> Rather than think about the instantaneous phase of the signal, consider
> how the signal will be processed in sequential blocks .... how do you
> combine blocks (windows) of processed signal?
> You may want to look into the standard overlap-add technique and combine
> it with your current direction.
>
> Back to your topic .... and in a slightly different approach ...
> This complexification can come in many standard forms. They
> include minimum phase, maximum phase, zero phase and also mixed phase.
> The 'phase' relates to how the signal energy is centered in the time domain.
>
> Say you do a zero phase realisation, then the overall signal power will
> fluctuate according to the STFT power in each Fourier block of data. So
> if you keep your block resolution small enough, you should be able to
> get a pretty good signal in the end .... this is in some way connected
> to the question ... "What is the best sized window required to represent
> speech ... ". The answer to that question must be, well, what do you
> want to represent best ?!@# and can be quite a complex issue ...
>
> I attach the opposite of what you want to do ... if you invert this one
> line algorithm then you will find your answer !!! Pretend the signal in
> the script is not in the time domain, but the frequency domain ...
> in other words whatever domain you put into the signal, you get out of
> the algorithm ... time -> time, frequency -> frequency, f(freq) ->
> f(freq) and so on....
>
> Be careful and remember some signals are energy and some signals are
> power ... these are non-linearly related ... so step your
> algorithm carefully from reading in the data to writing it out ...
>
> Matt
> --
> http://www.flatmax.org
>
> MFFM Bit Stream :
> http://sourceforge.net/projects/mffmbitstream/
> Other Projects :
> http://sourceforge.net/search/?type_of_search=soft&words=mffm
>
>
>
> ------------------------------------------------------------------------
>
> %# Copyright 2004 Matt Flax <flatmax(at)ieee.org>
> %# This file is a stand alone tool for generating a zero phase
> %# signal from a complex time signal
> %#
> %# It is free software; you can
> %# redistribute it and/or modify
> %# it under the terms of the GNU General Public License as published by
> %# the Free Software Foundation; either version 2 of the License, or
> %# (at your option) any later version.
> %#
> %# This file is distributed in the hope that it will be useful,
> %# but WITHOUT ANY WARRANTY; without even the implied warranty of
> %# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> %# GNU General Public License for more details.
> %#
> %# You have received a copy of the GNU General Public License
> %# along with this file, if not then please refer to www.gnu.org
> %# to gain access to the GNU GPL license.
>
> function [rSig,cSig]=complexSigToRealSig(complexSig)
>
>   %# converts complexSig to rSig, with zero vector cSig returned.
>   %# This function is a zero phase implementation.
>
>   signal=ifft(sqrt(2*(abs(fft(imag(complexSig))).^2+abs(fft(real(complexSig))).^2)));
>   rSig=real(signal);
>   cSig=imag(signal);
>
>
> endfunction
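
To make the zero phase realisation in the quoted message concrete, here
is a rough Python sketch.  It is not a port of Matt's script: dft, idft
and zero_phase_from_magnitude are hypothetical helper names, and the
frame is invented.  It assumes the magnitude comes from a real-valued
frame, so the magnitude is symmetric and the inverse transform is real.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def zero_phase_from_magnitude(magnitude):
    """Invert a magnitude spectrum with every phase set to zero.

    For a symmetric magnitude the result is real up to rounding,
    so only the real part is kept.
    """
    return [v.real for v in idft(magnitude)]

# Hypothetical real-valued frame, used only to obtain a magnitude spectrum.
frame = [0.0, 1.0, 0.5, -0.25, 2.0, -1.0, 0.0, 0.75]
magnitude = [abs(v) for v in dft(frame)]

zero_phase = zero_phase_from_magnitude(magnitude)

# The zero phase signal has the same magnitude spectrum as the frame ...
recovered = [abs(v) for v in dft(zero_phase)]
# ... but its energy sits symmetrically around time index 0 (circularly).
n = len(zero_phase)
is_even = all(abs(zero_phase[t] - zero_phase[-t % n]) < 1e-9 for t in range(n))
print(is_even)
```

The waveform differs from the original frame even though the magnitudes
agree, which is the sense in which the choice of phase decides where the
energy sits in time.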