Sound Analysis Tools (Thierry Rochebois)


Subject: Sound Analysis Tools
From:    Thierry Rochebois  <thierry(at)MAURY.IEF-PARIS-SUD.FR>
Date:    Fri, 17 Nov 1995 14:39:15 +0100

Lonce LaMar Wyse wrote:
>> Last year, I began a discussion on the list about analysis tools for
>> sound processing and resynthesis that proved to be interesting for many
>> people on the auditory list.
> Quite interested.
> I can't say I recall the previous discussion - do you have it archived
> so that you could email me a copy?

I think it is archived somewhere on the Auditory mailing list web pages.

>> My research is about the analysis and resynthesis of musical sounds.
> What distinction do you mean to make between musical and nonmusical sounds?

I meant tuned and untuned sounds.

I will start the discussion by describing my goals and the algorithm I use.
My primary goal is to separate the harmonic part and the stochastic part of
the sound. There are many existing algorithms for analysing the harmonic
part of a sound:
- heterodyne filtering
- phase vocoder
- FFT interpolation (PARSHL, MQ)
- wavelet transforms (Toshio Irino)
- wavelet transform interpolations (Dan Ellis)

I chose FFT interpolation: for every time frame I compute a windowed FFT,
using a Gaussian window. Multiplying the signal by the window convolves the
spectrum with the transform of the window (and in this particular case the
transform of a Gaussian is... a Gaussian). So when the signal is a sine
wave, I obtain a Gaussian in my spectrum. What I want is the exact
continuous frequency of my sine wave; what I have is a frequency-sampled
Gaussian. On a log scale a Gaussian is a parabola, so quadratic
interpolation of the log-magnitude spectrum gives accurate values of the
frequency and amplitude (see the first sketch below). For each frame I
store the frequency, phase and amplitude of each peak of the spectrum
(i.e. of every sine wave).

Before resynthesis, I select the peaks that correspond to harmonics (using
a harmonicity criterion). On resynthesis, I link the peaks from each frame
to the next and synthesize by cubic interpolation of the phase (MQ): the
synthesized harmonic part has the same phases as the original. I then
subtract, in the time domain, the resynthesized harmonic part from the
original sound; the difference is the noisy part of the sound (see the
second sketch below). By the way, I know that Xavier Serra developed a very
similar algorithm, but does he subtract the harmonic sound in the time
domain or in the spectral domain? (From his paper, I think it is in the
spectral domain.)

This analysis is very useful: since it gives the difference between the
original signal and the synthetic one, it gives a good idea of the quality
of the analysis itself, so it is possible to adapt many parameters (such as
the window length) to obtain the best results. This kind of algorithm is
very efficient for tuned sounds; it lets you analyse sounds with strong
vibrati and glissandi. It can also be useful for time stretching and
frequency shifting, or for sound editing and morphing (there you use only
the frequency and amplitude information, and distortions can appear for
noisy sounds because the phase relationships can't be used). It is far less
efficient for noisy sounds such as cymbal sounds. In that case I think the
problem is that the ear works on a roughly logarithmic frequency scale
rather than a linear one like the FFT's. The solution may be constant-Q
analysis (such as Dan Ellis's or Toshio Irino's). The problem is that
high-speed continuous wavelet transforms are still under development... so
wait and see? Is anybody on the list working on such an algorithm?
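
First sketch: to make the interpolation step concrete, here is a minimal
version in Python with NumPy (an anachronism here, used purely for
illustration; the function names are made up, not taken from any actual
implementation). It fits a parabola through the three log-magnitude bins
around each local maximum of a Gaussian-windowed FFT:

    import numpy as np

    def gaussian_window(n, sigma_frac=0.125):
        """Sampled Gaussian; sigma is a fraction of the window length.

        A finite window necessarily truncates the Gaussian, so the
        log-magnitude spectrum is only approximately parabolic."""
        t = np.arange(n) - (n - 1) / 2.0
        return np.exp(-0.5 * (t / (sigma_frac * n)) ** 2)

    def find_peaks(frame, fs):
        """Estimate (frequency, amplitude, phase) of each spectral peak."""
        n = len(frame)
        spectrum = np.fft.rfft(frame * gaussian_window(n))
        mag = np.abs(spectrum)
        logmag = np.log(np.maximum(mag, 1e-12))
        peaks = []
        for k in range(1, len(mag) - 1):
            if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]:
                a, b, c = logmag[k - 1], logmag[k], logmag[k + 1]
                # Vertex of the parabola through the three points;
                # delta is the offset from bin k, in (-0.5, +0.5) bins.
                delta = 0.5 * (a - c) / (a - 2 * b + c)
                freq = (k + delta) * fs / n              # Hz
                amp = np.exp(b - 0.25 * (a - c) * delta) # peak height
                phase = np.angle(spectrum[k])            # nearest-bin phase
                peaks.append((freq, amp, phase))
        return peaks

The amplitude here is only correct up to a window-dependent scale factor,
and a real analyser would also threshold out small noise peaks before
tracking them.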
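
Second sketch: the resynthesis and subtraction steps, in the same spirit
(again an illustrative assumption, not actual code from this work). Given
a peak tracked between two consecutive frame centres n samples apart, the
McAulay-Quatieri cubic phase polynomial matches the measured phases and
frequencies at both ends, which is what makes the time-domain subtraction
meaningful:

    import numpy as np

    def mq_synth_segment(f1, a1, ph1, f2, a2, ph2, n, fs):
        """One partial between two matched frames (McAulay-Quatieri).

        Frequencies in Hz, phases in radians, n samples between frame
        centres. Amplitude is interpolated linearly; phase follows a
        cubic whose endpoint values and slopes match the measurements."""
        w1 = 2 * np.pi * f1 / fs          # rad/sample
        w2 = 2 * np.pi * f2 / fs
        T = float(n)
        # Integer number of extra phase cycles, chosen so the cubic is
        # maximally smooth.
        M = np.round((ph1 + w1 * T - ph2 + (w2 - w1) * T / 2) / (2 * np.pi))
        d = ph2 + 2 * np.pi * M - ph1 - w1 * T
        alpha = 3 * d / T**2 - (w2 - w1) / T
        beta = -2 * d / T**3 + (w2 - w1) / T**2
        t = np.arange(n)
        phase = ph1 + w1 * t + alpha * t**2 + beta * t**3
        amp = a1 + (a2 - a1) * t / T
        return amp * np.cos(phase)

Summing such segments over all tracked partials and all frames gives the
harmonic part; subtracting it from the original signal, sample by sample,
leaves the stochastic part. Because the synthetic phases match the measured
ones, the harmonic energy actually cancels in the difference.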
Perfecto Herrera-Boyer wrote:
> I am a Doctorate student too and I expect to develop my thesis on sound
> synthesis, directed by Xavier Serra. I am working at the Institut
> Universitari de l'Audiovisual (IUA). It is a research/production center
> belonging to the Universitat Pompeu Fabra (see URL: http://www.iua.upf.es
> for more information). My academic/professional profile merges Cognitive
> Science and Sound Technology, although nowadays I am more interested in
> the latter than in the former.
> Here at the IUA we are working mainly with SMS (Spectral Modeling
> Synthesis), but there is somebody who works with Lemur.

Could you describe SMS on the auditory mailing list? Maybe Kelly Fitz (the
Lemur author) is on the list and can tell us about his software. And maybe
somebody at IRCAM can describe their SuperPhaseVocoder (SVP) and
Audiosculpt (an impressive piece of software).

> Maybe, as a starting point, it would be interesting to review or abstract
> the discussion held last year, because I am a new member of HEARING.

For those who are interested, I think you can find an archive of the
previous discussions (and paper refs) on the auditory mailing list web
pages... If you really want, it may be possible to make an abstract of the
previous discussion and post it to the list.

+-------------------------------------------------------------+
|                                                             |\
| Thierry Rochebois                                 Doctorant | +
| IEF ARCEMA/TSI Bat 220 UPS 91405 ORSAY Cedex                | |
| thierry(at)ief-paris-sud.fr                                 | |
| http://maury.ief-paris-sud.fr:8001/~thierry/welcome.html    | |
|                                                             | |
+-------------------------------------------------------------+ |
 \                                                             \|
  +-------------------------------------------------------------+


This message came from the mail archive
http://www.auditory.org/postings/1995/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University