
Re: a layman's reply speech/music



Okay. Now, I'm just a humble composer who's done a lot of work w/
computers, but don't you think that we can recognize speech (even in
languages we don't know) amid other sounds simply because our brains have
evolved a hardwiring of sorts to detect the types of sounds that come out
of other people's mouths? We've been hardwired, and trained our whole
lives, to respond to human voices. Is there really so much mystery here?
Isn't this rather obviously related to Fletcher-Munson curves, etc.? I
would think that, simply as a matter of survival, people can pick speech
out of sound. Paul Lansky's 'Idle Chatter' pieces do a beautiful job of
simulating speech based on rhythm and pitch, even though there's not a
single definable word in ANY language coming out of the piece.
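
On the pitch side, the cue Bruno describes in the quoted message below
(smooth F0 glides in speech versus stepwise changes in music) is easy to
sketch. A minimal Python/NumPy example, assuming an F0 contour in Hz has
already been extracted by some pitch tracker (not shown) and contains voiced
frames only. `glide_fraction` and its 0.5-semitone-per-frame threshold are
hypothetical choices, and a sensible threshold would depend on the frame rate:

```python
import numpy as np


def glide_fraction(f0_hz, step_threshold_st=0.5):
    """Share of total pitch movement carried by small frame-to-frame changes.

    Hypothetical measure: values near 1.0 mean the contour moves in smooth
    glides (speech-like); values near 0.0 mean it jumps between held
    pitches (more typical of music). The threshold is an assumption.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    # Work in semitones so the threshold means the same thing in any register.
    semitones = 12.0 * np.log2(f0)
    deltas = np.abs(np.diff(semitones))
    total = deltas.sum()
    if total == 0.0:
        return 0.0  # a perfectly flat contour has no movement to classify
    small = deltas[deltas < step_threshold_st].sum()
    return float(small / total)


# Speech-like: a smooth one-octave glide over 100 frames.
glide = np.geomspace(100.0, 200.0, 100)
# Music-like: five held notes, 20 frames each, rising by whole tones.
steps = np.repeat(220.0 * 2.0 ** (2.0 * np.arange(5) / 12.0), 20)

print(glide_fraction(glide))
print(glide_fraction(steps))
```

The glide scores at the top of the range and the held notes at the bottom.
Vibrato and portamento in real music would pull scores toward the middle,
which is why this can only ever be one cue among several.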

Cheers,
chris

        chris mandra                                      /////

        Deus Ex Machina                                    x x

        Sonitus Ex Sonitus                                  0

        (Your music; is that noise from sound, or sound from noise?)

        http://www.peabody.jhu.edu/~mandra/        http://soundprint.org



On Tue, 31 Mar 1998, Bruno H. Repp wrote:

> Sue Johnson wrote:
>
> >I'm sure you must be able to detect the presence of speech independent of
> >being able to recognise it. If someone spoke to me in Finnish, say, I would
> >be able to tell they were speaking (even in the presence of background
> >music/noise), even though I couldn't even segment the words, never mind
> >syntactically or semantically parse them.
> >I think there must be some way the brain splits up (deconvolves) the
> >signal before applying a speech recogniser.
> >(I have no proof of this of course, it's just a gut feeling)
>
>         I am not sure the brain really deconvolves the signal completely.
> However, I agree that there must be a bottom-up way of recognizing the
> presence of speech in noise or music. One characteristic of speech that
> is not shared by music is the presence of smooth and fairly rapid
> changes in both fundamental frequency and formant frequencies. This is
> quite rare in music, which tends to proceed in stepwise changes. Therefore,
> some measure of the rate and/or continuity of spectral change should be
> relevant to detecting speech automatically. Another relevant feature is
> the amplitude envelope. Speech is organized syllabically and therefore
> alternates between periods of high and low amplitude at an average rate
> of about 4 Hz. Moreover, this alternation is not strictly periodic and
> often interrupted by pauses. Music tends to be more strictly periodic
> and has a much wider range of tempi than speech. Therefore, some measure
> of the spacing and regularity of amplitude peaks in the signal would
> also seem relevant.
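
The amplitude-envelope cue can be tried out with nothing more than a
short-time RMS envelope and some peak picking. A minimal Python/NumPy sketch,
assuming mono samples at a known sample rate. `frame_rms`,
`amplitude_peak_stats`, the 20 ms window, the 10 ms hop and the
"local maximum above the mean level" peak rule are all hypothetical choices
for illustration, not a published detector:

```python
import numpy as np


def frame_rms(x, sr, win_s=0.02, hop_s=0.01):
    """Short-time RMS envelope; returns (envelope, envelope frame rate in Hz).

    Hypothetical parameters: 20 ms window, 10 ms hop.
    """
    win = int(round(win_s * sr))
    hop = int(round(hop_s * sr))
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1)), sr / hop


def amplitude_peak_stats(x, sr):
    """Rate (Hz) and irregularity (coefficient of variation) of envelope peaks."""
    env, env_rate = frame_rms(x, sr)
    mid = env[1:-1]
    # Assumed peak rule: local maxima that also rise above the mean level,
    # so small ripples in quiet stretches are not counted as peaks.
    is_peak = (mid > env[:-2]) & (mid >= env[2:]) & (mid > env.mean())
    peak_idx = np.nonzero(is_peak)[0] + 1
    if len(peak_idx) < 3:
        return float("nan"), float("nan")
    intervals = np.diff(peak_idx) / env_rate  # seconds between peaks
    rate_hz = 1.0 / intervals.mean()
    irregularity = intervals.std() / intervals.mean()
    return float(rate_hz), float(irregularity)


# Usage: a 1 kHz tone gated into syllable-like bursts at 4 Hz.
sr = 16000
t = np.arange(2 * sr) / sr
carrier = np.sin(2 * np.pi * 1000.0 * t)
bursts = carrier * 0.5 * (1.0 - np.cos(2 * np.pi * 4.0 * t))
rate, cv = amplitude_peak_stats(bursts, sr)
print(f"peak rate {rate:.2f} Hz, irregularity {cv:.3f}")
```

This test signal should report a rate close to 4 Hz, but because it is
perfectly periodic its irregularity comes out near zero, i.e. "music-like" on
that axis. By the argument above, real speech near the same rate should show
clearly non-zero irregularity from uneven syllable spacing and pauses.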
>
>         An interesting problem would be to try to automatically distinguish
> song from instrumental music. But perhaps the "easier" problem of separating
> music from unrelated speech should be tackled first (though not by me!).
>
> Bruno H. Repp
> Haskins Laboratories
> 270 Crown Street
> New Haven, CT 06511-6695
>
> Phone:   (203) 865-6163 (10:00 a.m. - 6:30 p.m.)
> FAX:     (203) 865-8963
> e-mail:  repp@haskins.yale.edu
> WWW:     http://www.haskins.yale.edu/Haskins/STAFF/repp.html
>