
Spectrogram Inversion Toolbox



I’m happy to announce that the Spectrogram Inversion Toolbox is now available.  This is Matlab code that finds a waveform that best fits a given spectrogram.

 

You might ask why this is being announced on the auditory mailing list.  The first time I needed this was when I was doing our correlogram inversion work.  It’s also a function included in the NSL (Neural Systems Laboratory) toolbox from the University of Maryland, although this new implementation is faster and better.  And it’s useful if you are doing anything like audio morphing.

 

The source code is online now at

                http://research.microsoft.com/en-US/downloads/5ee40a69-6bf1-43df-8ef4-3fb125815856/default.aspx

 

And more details are below.  Enjoy.

 

--- Malcolm


The Spectrogram Inversion Toolbox allows one to create spectrograms from audio and, more importantly, to estimate the audio that generates any given spectrogram.  This is useful because one often wants to think about and modify sounds in the spectrogram domain.

 

There are two big problems with spectrogram inversion.  First, and most important, the phase is (generally) dropped when computing a spectrogram.  Second, not every spectrogram image corresponds to a valid waveform.  This code finds the waveform whose magnitude spectrogram is most like the input spectrogram.
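To make the phase problem concrete, here is a minimal sketch of the forward direction in Python, using only the standard library and a naive DFT for clarity.  The toolbox itself is Matlab; the function names, the periodic Hann window, the 16-point frames and the 4-sample hop are illustrative assumptions, not the toolbox's settings.  Negating a waveform shifts the phase of every bin by π but leaves every magnitude untouched, so two different waveforms give the same magnitude spectrogram.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    """Naive O(n^2) discrete Fourier transform, kept simple for illustration."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def magnitude_spectrogram(x, n_fft=16, hop=4):
    """Window each frame, transform it, and keep only |X|: the phase is discarded here."""
    w = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        segment = [x[start + t] * w[t] for t in range(n_fft)]
        frames.append([abs(c) for c in dft(segment)])
    return frames


# A toy two-tone test signal (an assumption for illustration).
x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
S = magnitude_spectrogram(x)
S_neg = magnitude_spectrogram([-v for v in x])  # a different waveform...
print(len(S), len(S[0]))  # 13 16
# ...with the same magnitude spectrogram: the largest difference is (numerically) zero.
print(max(abs(a - b) for ra, rb in zip(S, S_neg) for a, b in zip(ra, rb)))
```

Going the other way, from `S` alone, there is no way to tell `x` from `-x`; the inversion has to pick some phase.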

 

The easy solution is to just do the inversion assuming some phase (like 0).  Back in the time domain you get an answer, but there is a lot of destructive interference because the segments from adjacent frames do not have consistent phases.  Some people advocate starting with a random phase instead.
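The one-shot approach can be sketched the same way.  This is an illustrative Python version, not the toolbox's code: `stft`, `istft`, `invert_with_phase` and `spectral_error` are hypothetical names, the inverse uses weighted overlap-add (re-window each frame, then divide by the summed squared window), and the error is the relative distance between the target magnitudes and the magnitudes of the reconstruction.  With the true phase the round trip is essentially exact; with zero or random phase, adjacent frames disagree and the error is large.

```python
import cmath
import math
import random


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT; keeping the real part is the least-squares choice for a real signal."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft(x, n_fft=16, hop=4):
    w = hann(n_fft)
    return [dft([x[s + t] * w[t] for t in range(n_fft)])
            for s in range(0, len(x) - n_fft + 1, hop)]


def istft(frames, length, hop=4):
    """Weighted overlap-add: re-window each inverse frame, divide by the summed squared window."""
    n_fft = len(frames[0])
    w = hann(n_fft)
    num, den = [0.0] * length, [0.0] * length
    for m, spec in enumerate(frames):
        seg = idft(spec)
        for t in range(n_fft):
            num[m * hop + t] += w[t] * seg[t]
            den[m * hop + t] += w[t] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def invert_with_phase(mags, length, phase, hop=4):
    """Attach a guessed phase(m, k) to every magnitude, then overlap-add."""
    frames = [[a * cmath.exp(1j * phase(m, k)) for k, a in enumerate(row)]
              for m, row in enumerate(mags)]
    return istft(frames, length, hop)


def spectral_error(x_hat, mags, hop=4):
    """Relative L2 distance between the target magnitudes and those of x_hat."""
    est = [[abs(c) for c in row] for row in stft(x_hat, len(mags[0]), hop)]
    num = sum((a - b) ** 2 for ra, rb in zip(est, mags) for a, b in zip(ra, rb))
    return math.sqrt(num / sum(b * b for rb in mags for b in rb))


x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
X = stft(x)
mags = [[abs(c) for c in row] for row in X]

true_phase = invert_with_phase(mags, len(x), lambda m, k: cmath.phase(X[m][k]))
# Zero phase puts the peak of every inverse frame at sample 0, the edge of the window.
zero_phase = invert_with_phase(mags, len(x), lambda m, k: 0.0)
rng = random.Random(0)
random_phase = invert_with_phase(mags, len(x), lambda m, k: rng.uniform(-math.pi, math.pi))

for name, y in [("true", true_phase), ("zero", zero_phase), ("random", random_phase)]:
    print(name, round(spectral_error(y, mags), 3))
```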

 

A better solution to this problem is the iterative algorithm proposed by Griffin and Lim in 1984.  It does converge, but slowly.
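Griffin and Lim's iteration alternates between imposing the target magnitudes and re-analysing the resulting waveform.  The Python sketch below is a textbook version (random initial phase, a fixed number of iterations), not the toolbox's implementation; the helper names, frame size, hop and iteration count are illustrative assumptions.  Because the overlap-add step is the least-squares inverse, the magnitude error never increases from one iteration to the next, but each step may improve it only a little.

```python
import cmath
import math
import random


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft(x, n_fft=16, hop=4):
    w = hann(n_fft)
    return [dft([x[s + t] * w[t] for t in range(n_fft)])
            for s in range(0, len(x) - n_fft + 1, hop)]


def istft(frames, length, hop=4):
    """Least-squares inverse: weighted overlap-add normalised by the summed squared window."""
    n_fft = len(frames[0])
    w = hann(n_fft)
    num, den = [0.0] * length, [0.0] * length
    for m, spec in enumerate(frames):
        seg = idft(spec)
        for t in range(n_fft):
            num[m * hop + t] += w[t] * seg[t]
            den[m * hop + t] += w[t] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def spectral_error(x_hat, mags, hop=4):
    est = [[abs(c) for c in row] for row in stft(x_hat, len(mags[0]), hop)]
    num = sum((a - b) ** 2 for ra, rb in zip(est, mags) for a, b in zip(ra, rb))
    return math.sqrt(num / sum(b * b for rb in mags for b in rb))


def griffin_lim(mags, length, iters=30, hop=4, seed=0):
    """Textbook Griffin-Lim: start from random phases, then alternate projections."""
    rng = random.Random(seed)
    n_fft = len(mags[0])
    phases = [[rng.uniform(-math.pi, math.pi) for _ in row] for row in mags]
    x = [0.0] * length
    for _ in range(iters):
        # Impose the target magnitudes on the current phase estimate and invert.
        frames = [[a * cmath.exp(1j * p) for a, p in zip(ra, rp)]
                  for ra, rp in zip(mags, phases)]
        x = istft(frames, length, hop)
        # Keep only the phase of the re-analysed signal for the next round.
        phases = [[cmath.phase(c) for c in row] for row in stft(x, n_fft, hop)]
    return x


x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
mags = [[abs(c) for c in row] for row in stft(x)]
errors = [spectral_error(griffin_lim(mags, len(x), iters=i), mags) for i in (1, 5, 30)]
print([round(e, 3) for e in errors])  # non-increasing
```

A fixed iteration count is the simplest stopping rule; watching `spectral_error` and stopping once it stops improving is a common alternative.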

 

An even better solution is to do the inversion while explicitly looking for a good set of phases.  After the inverse Fourier transform of each slice, this toolbox finds the best time delay so that the new frame is consistent with the frames summed so far.  This is equivalent to starting with some arbitrary linear phase.  The effect is to reduce the reconstruction error by an order of magnitude.  Hurray.
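The frame-alignment idea can be sketched as follows.  This is one reading of the description above, not the toolbox's code, and `invert_with_delays` is a hypothetical name: each slice is inverted with zero phase, then every circular shift of it is tried and the one whose windowed samples correlate best with the partial overlap-add sum is kept.  A circular shift by d samples multiplies the spectrum by exp(-2πikd/N), so each chosen delay is a linear phase for that frame.  Using correlation as the consistency measure, centring the first frame, and the frame and hop sizes are all assumptions.

```python
import cmath
import math


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft(x, n_fft=16, hop=4):
    w = hann(n_fft)
    return [dft([x[s + t] * w[t] for t in range(n_fft)])
            for s in range(0, len(x) - n_fft + 1, hop)]


def spectral_error(x_hat, mags, hop=4):
    est = [[abs(c) for c in row] for row in stft(x_hat, len(mags[0]), hop)]
    num = sum((a - b) ** 2 for ra, rb in zip(est, mags) for a, b in zip(ra, rb))
    return math.sqrt(num / sum(b * b for rb in mags for b in rb))


def invert_with_delays(mags, length, hop=4):
    """Hypothetical sketch: overlap-add zero-phase frames, each shifted by its best delay."""
    n_fft = len(mags[0])
    w = hann(n_fft)
    num, den = [0.0] * length, [0.0] * length
    delays = []
    for m, row in enumerate(mags):
        base = idft(row)       # zero-phase inverse: its peak is at sample 0
        start = m * hop
        best_d = n_fft // 2    # first frame: nothing to match yet, so centre it
        if m > 0:
            best_score = -math.inf
            for d in range(n_fft):
                # Shift by d samples, i.e. multiply the spectrum by exp(-2j*pi*k*d/n_fft),
                # and correlate with what has been summed so far over this frame's span.
                score = sum(w[t] * base[(t - d) % n_fft] * num[start + t]
                            for t in range(n_fft))
                if score > best_score:
                    best_d, best_score = d, score
        delays.append(best_d)
        for t in range(n_fft):
            num[start + t] += w[t] * base[(t - best_d) % n_fft]
            den[start + t] += w[t] ** 2
    y = [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]
    return y, delays


x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
mags = [[abs(c) for c in row] for row in stft(x)]
y, delays = invert_with_delays(mags, len(x))
print(delays[0], len(delays))  # 8 13
print(round(spectral_error(y, mags), 3))
```

Comparing this `spectral_error` with that of a plain zero-phase inversion of the same `mags` is a quick way to see how much the alignment step helps on a given signal.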