Re: a layman's reply speech/music (RealTime)

Subject: Re: a layman's reply speech/music
From: RealTime <mandra(at)peabody.jhu.edu>
Date: Tue, 31 Mar 1998 14:23:28 -0500

Okay. Now I'm just a humble composer who's done a lot of work w/ computers, but don't you think that we can recognize speech (even in languages we don't know) in the context of other sounds simply because our brains have evolved a hardwiring of sorts to detect the types of sounds that come out of other people's mouths? We've been hardwired and trained our whole lives to respond to human voices. Is there really so much mystery here? Isn't this stuff rather obviously related to Fletcher-Munson curves, etc.? I would think that, simply as a matter of survival, people can pick speech out of other sounds.

Paul Lansky's 'Idle Chatter' pieces do a beautiful job of simulating speech through rhythm and pitch, even though there's not a single definable word in ANY language coming out of the pieces.

Cheers,
chris

chris mandra
/////  Deus Ex Machina
x x    Sonitus Ex Sonitus
 0
(Your music; is that noise from sound, or sound from noise?)
http://www.peabody.jhu.edu/~mandra/
http://soundprint.org

On Tue, 31 Mar 1998, BRUNO H. Repp wrote:

> Sue Johnson wrote:
>
> >I'm sure you must be able to detect the presence of speech independent of
> >being able to recognise it. If someone spoke to me in Finnish say, I would
> >be able to tell they were speaking (even in the presence of background
> >music/noise), even though I couldn't even segment the words, never mind
> >syntactically or semantically parse them.
> >I think there must be some way the brain splits up (deconvolves) the
> >signal before applying a speech recogniser.
> >(I have no proof of this of course, it's just a gut feeling)
>
> I am not sure the brain really deconvolves the signal completely.
> However, I agree that there must be a bottom-up way of recognizing the
> presence of speech in noise or music.
> One characteristic of speech that is not shared by music is the
> presence of smooth and fairly rapid changes in both fundamental
> frequency and formant frequencies. This is quite rare in music, which
> tends to proceed in stepwise changes. Therefore, some measure of the
> rate and/or continuity of spectral change should be relevant to
> detecting speech automatically. Another relevant feature is the
> amplitude envelope. Speech is organized syllabically and therefore
> alternates between periods of high and low amplitude at an average rate
> of about 4 Hz. Moreover, this alternation is not strictly periodic and
> is often interrupted by pauses. Music tends to be more strictly periodic
> and has a much wider range of tempi than speech. Therefore, some measure
> of the distance and regularity of amplitude peaks in the signal would
> also seem to be a relevant measure.
>
> An interesting problem would be to try to automatically distinguish
> song from instrumental music. But perhaps the "easier" problem of separating
> music from unrelated speech should be tackled first (though not by me!).
>
> Bruno H. Repp
> Haskins Laboratories
> 270 Crown Street
> New Haven, CT 06511-6695
>
> Phone: (203) 865-6163 (10:00 a.m. - 6:30 p.m.)
> FAX: (203) 865-8963
> e-mail: repp(at)haskins.yale.edu
> WWW: http://www.haskins.yale.edu/Haskins/STAFF/repp.html
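
The two bottom-up cues suggested in the quoted message (continuity of pitch change, and the rate and regularity of amplitude peaks) can be turned into simple numeric features. What follows is only a minimal Python sketch under stated assumptions, not Repp's method and not a tested speech detector. It assumes an amplitude envelope (one value per analysis frame, here 100 frames/s) and an F0 track in Hz (0 for unvoiced frames) were already extracted by some other means. The function names, the 1 to 50 cent "glide" band, and the toy signals are hypothetical choices for illustration.

```python
# Sketch of two speech-vs-music cues; NOT Repp's own method.
# Assumes an amplitude envelope (one value per frame) and an F0 track in Hz
# (0 = unvoiced frame) were already extracted elsewhere.
# Function names, thresholds and toy signals are hypothetical. Python 3.8+.
import math
import statistics


def envelope_peak_stats(envelope, frame_rate):
    """Return (mean peak rate in Hz, coefficient of variation of peak spacing)."""
    level = statistics.fmean(envelope)
    # Local maxima above the mean level stand in for syllable (or beat) peaks.
    peaks = [
        i for i in range(1, len(envelope) - 1)
        if envelope[i] > level
        and envelope[i] >= envelope[i - 1]
        and envelope[i] > envelope[i + 1]
    ]
    if len(peaks) < 3:
        return 0.0, 0.0
    intervals = [(b - a) / frame_rate for a, b in zip(peaks, peaks[1:])]
    mean_interval = statistics.fmean(intervals)
    # CV near 0 means strictly periodic peaks; larger values mean irregular
    # spacing, e.g. syllables interrupted by pauses.
    cv = statistics.pstdev(intervals) / mean_interval
    return 1.0 / mean_interval, cv


def glide_fraction(f0_track, max_glide_cents=50.0):
    """Fraction of voiced frame-to-frame F0 changes that look like smooth glides.

    Held notes change by about 0 cents per frame and stepwise melodies jump
    by roughly a semitone (100 cents) or more; gliding intonation moves a
    few cents per frame.
    """
    changes = [
        abs(1200.0 * math.log2(b / a))
        for a, b in zip(f0_track, f0_track[1:])
        if a > 0 and b > 0  # compare adjacent voiced frames only
    ]
    if not changes:
        return 0.0
    gliding = [c for c in changes if 1.0 < c < max_glide_cents]
    return len(gliding) / len(changes)


def toy_envelope(peak_frames, length):
    """Hypothetical test signal: a short triangular bump at each given frame."""
    env = [0.0] * length
    for p in peak_frames:
        env[p] = 1.0
        env[p - 1] = max(env[p - 1], 0.5)
        env[p + 1] = max(env[p + 1], 0.5)
    return env


if __name__ == "__main__":
    frame_rate = 100  # assumed 10 ms analysis frames
    # Strictly periodic peaks every 250 ms vs. irregular peaks with a pause.
    music_env = toy_envelope([25, 50, 75, 100, 125, 150, 175], 200)
    speech_env = toy_envelope([20, 38, 66, 85, 150, 170, 193], 200)
    print(envelope_peak_stats(music_env, frame_rate))  # (4.0, 0.0)
    print(envelope_peak_stats(speech_env, frame_rate))
    print(glide_fraction([200 - 2.5 * i for i in range(21)]))  # 1.0 (falling glide)
    print(glide_fraction([220.0] * 10 + [246.94] * 10 + [261.63] * 10))  # 0.0 (steps)
```

In this toy setup a low coefficient of variation with a steady rate points toward metrical music, while irregular peak spacing near the syllable rate plus a high glide fraction points toward speech. How to weight or threshold the features is left open here, just as the quoted message leaves the choice of measure open.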

This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:

DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University