
Re: About importance of "phase" in sound recognition



Hi,

At the top of the following page:
http://recherche.ircam.fr/anasyn/degottex/index.php?n=Main.ExDegottex2010Appendix
Please find three sounds with the same amplitude spectra but different phase 
spectra. Personally, I notice a difference.

These examples are made of synthetic vowels generated with the same 
vocal-tract filter amplitude spectrum, switching the filter between the 
minimum-phase, zero-phase and maximum-phase properties.
On the same topic, you can also find very interesting work on the 
group delay and its perception in [1].
In my opinion, these properties do not seem relevant to speech 
intelligibility, but they are more crucial for voice-quality manipulation, 
as in voice transformation and speech synthesis.
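
To make "same amplitude spectrum, different phase" concrete, here is a minimal 
sketch in Python (standard library only). It is NOT the construction used for 
the sound examples on the page above; it assumes a hypothetical FIR filter 
`h_min` with both zeros inside the unit circle (minimum phase), obtains a 
maximum-phase filter by time reversal and a zero-phase one by inverse-transforming 
the magnitude spectrum alone, then checks that the amplitude spectra agree while 
the group delays differ. The helper names and the DFT size are illustrative 
assumptions.

```python
import cmath
import math

N_FFT = 64  # DFT size (assumption: enough bins for illustration)


def dft(x, n_fft=N_FFT):
    """Naive DFT of x, zero-padded to n_fft points."""
    x = list(x) + [0.0] * (n_fft - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)]


def idft_real(spectrum):
    """Inverse DFT, keeping only the real part."""
    n_fft = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                for k in range(n_fft)).real / n_fft
            for n in range(n_fft)]


def group_delay(h, n_fft=N_FFT):
    """Group delay in samples, via tau(w) = Re{DFT(n*h[n]) / DFT(h[n])}."""
    num = dft([n * v for n, v in enumerate(h)], n_fft)
    den = dft(h, n_fft)
    return [(a / b).real for a, b in zip(num, den)]


# Hypothetical minimum-phase filter: (1 - 0.5 z^-1)(1 + 0.3 z^-1),
# i.e. both zeros (0.5 and -0.3) lie inside the unit circle.
h_min = [1.0, -0.2, -0.15]
# Time reversal moves the zeros to their reciprocals (2 and about -3.33),
# outside the unit circle: a maximum-phase filter with the same |H|.
h_max = h_min[::-1]
# Zero-phase version: inverse DFT of the magnitude spectrum alone
# (real and circularly symmetric around n = 0).
mag_min = [abs(v) for v in dft(h_min)]
h_zero = idft_real(mag_min)

mag_max = [abs(v) for v in dft(h_max)]
mag_zero = [abs(v) for v in dft(h_zero)]

gd_min = group_delay(h_min)
gd_max = group_delay(h_max)

if __name__ == "__main__":
    print("max |H| difference, min vs max:",
          max(abs(a - b) for a, b in zip(mag_min, mag_max)))
    print("max |H| difference, min vs zero:",
          max(abs(a - b) for a, b in zip(mag_min, mag_zero)))
    print("group delay at DC (min, max):", gd_min[0], gd_max[0])
```

The magnitude differences should come out at rounding-error level, while the 
group delays differ at every frequency: for a real filter of length L, time 
reversal maps the group delay tau(w) to (L - 1) - tau(w), so here the two curves 
always sum to 2 samples.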

Whereas the above examples concern only the phase properties of filters, it would 
be great to improve the terminology around "phase" so that we know which 
subject is concerned (as mentioned in this thread), to understand exactly where 
this wonderful, well-accepted a priori "phase insensitivity" comes from, and to 
improve our knowledge of the subject.

Regards,
Gilles

[1] Banno, Hideki and Takeda, Kazuya and Itakura, Fumitada, "The effect of 
group delay spectrum on timbre", Acoustical Science and Technology, vol. 23, 
no. 1, pp. 1--9, 2002.

On Wednesday 06 October 2010 15:08, Bob Masta wrote:
> On 5 Oct 2010 at 19:03, emad burke wrote:
> >  I'm talking about exactly what is called
> > "insensitivity to phase". I'm talking about the phase information that
> > is discarded in the process of MFCC feature extraction, which has
> > proven to be a successful feature set for speech recognition. The
> > "insensitivity to phase" implicitly implies that if you change the
> > order (precedence) of the travelling waves in the cochlear channels
> > relative to each other, it will not affect perception, and that you can
> > add random phases to different channels without affecting perception(?).
>
> One classical way to demonstrate this insensitivity is to
> build up a wave from several component frequencies, and
> listen to the sum.  Then change only the relative phases
> and see if you can detect a difference.  It turns out that
> you can't, most of the time.  (This assumes that you turn
> the sound off while you are making the changes... it is
> easy to hear dynamic changes.)
>
> You do need to use a bit of caution:  Different phase
> relations can cause large differences in waveform peak
> heights, and the larger peaks can produce distortion due to
> nonlinearities in the speaker, the ear, or even the air
> itself. So you might hear a difference that isn't really
> due to phase as such, just added components due to
> distortion.  But this is not a problem for "reasonable"
> listening levels.
>
> You can use my Daqarta software to demonstrate the
> insensitivity for yourself with any Windows system.
> Click the Generator button to get a default 440 Hz sine,
> and adjust the controls for a comfortable level.
>
> In the Generator dialog, click on the left Waveform
> Controls button (midway down the dialog) and you will get
> the control dialog for the left Stream 0.  (There are four
> streams per channel, labeled 0-3, which are summed together
> by default.)
>
> Set the Level for Stream 0 to (say) 50%, since the total
> for all streams must be no more than 100%.  (If you want to
> use four equal-amplitude components, then set each to 25%.
> Here I assume you will set up the first four components of
> a square wave.)
>
> Now at the top of this dialog click on '1' to change to
> Stream 1, and set its Tone Freq to 3 * 440 = 1320. Set its
> Level to 1/3 * 50 = 16.67%.  Now toggle Stream On near the
> top of the dialog to add it to the output.
>
> Repeat as needed for Streams 2 and 3.
>
> At this point all components are in phase.  To set any
> component to an arbitrary phase, click on its Tone Freq
> button to bring up the control dialog, and adjust Main
> Phase as desired.
>
> Please let me know if there are any questions or problems.
>
> Best regards,
>
> Bob Masta
>
>             D A Q A R T A
> Data AcQuisition And Real-Time Analysis
>            www.daqarta.com
> Scope, Spectrum, Spectrogram, Signal Generator
>     Science with your sound card!
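
For readers without Windows, the Daqarta procedure above can be approximated 
offline. The minimal Python sketch below (standard library only) builds the first 
four components of a 440 Hz square wave at the levels described (50%, 16.67%, 10%, 
7.14% of full scale), once in sine phase and once with random phases, and writes 
each to its own WAV file, so the comparison is made without hearing a transition. 
It also checks that the magnitude at each harmonic is unchanged (which is all a 
magnitude-only front end, such as the power spectrum that MFCC extraction starts 
from, gets to see) while the waveform peak is not, which is the origin of the 
distortion caveat. The sample rate, duration, seed, file names and the helpers 
`synth`, `bin_magnitude` and `write_wav` are hypothetical choices, not part of 
Daqarta.

```python
import math
import random
import struct
import wave

FS = 44100                            # sample rate in Hz (assumption)
N_SAMPLES = FS // 10                  # 0.1 s: 10 Hz bins, so each harmonic is on an exact bin
F0 = 440.0
HARMONICS = [1, 3, 5, 7]              # first four square-wave components
LEVELS = [0.5 / h for h in HARMONICS]  # 50%, 16.67%, 10%, 7.14% of full scale


def synth(phases):
    """Sum of the four components, each with its own starting phase (radians)."""
    return [sum(a * math.sin(2 * math.pi * h * F0 * n / FS + p)
                for h, a, p in zip(HARMONICS, LEVELS, phases))
            for n in range(N_SAMPLES)]


def bin_magnitude(x, freq):
    """Magnitude of the single DFT bin at freq (freq must fall on an exact bin)."""
    w = 2 * math.pi * freq / FS
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = sum(v * math.sin(w * n) for n, v in enumerate(x))
    return math.hypot(re, im)


def write_wav(path, x):
    """Write mono 16-bit PCM; the LEVELS sum to less than 1, so nothing clips."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(FS)
        f.writeframes(b"".join(struct.pack("<h", int(round(v * 32767))) for v in x))


in_phase = synth([0.0] * len(HARMONICS))
rng = random.Random(1)  # fixed seed so the "random" version is reproducible
scrambled = synth([rng.uniform(0, 2 * math.pi) for _ in HARMONICS])
# All components peak together at t = 0: the worst case for peak height.
cos_phase = synth([math.pi / 2] * len(HARMONICS))

if __name__ == "__main__":
    write_wav("square4_in_phase.wav", in_phase)
    write_wav("square4_random_phase.wav", scrambled)
    for h in HARMONICS:
        print(h * F0, bin_magnitude(in_phase, h * F0), bin_magnitude(scrambled, h * F0))
    print("peaks:", max(map(abs, in_phase)), max(map(abs, scrambled)),
          max(map(abs, cos_phase)))
```

Play the two files one after the other and compare. The cosine-phase variant is 
not written out; it only shows how far the peak can grow (to the sum of the 
levels, about 0.84 of full scale) when all components align, compared with 
roughly half that for the sine-phase sum.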

-- 
IRCAM - CNRS-UMR9912-STMS, 1 place Stravinsky, 75004 Paris
Phone: 0033 6 37 51 98 31; Work: 0033 1 44 78 48 62
Fax: 0033 1 44 78 15 40 Team: www.ircam.fr/anasyn
Homepage: recherche.ircam.fr/anasyn/degottex