Re: MDS-distances
Dear List,
I think Malcolm is on the right track with his idea of an auditory
version of three-color vision. I think it can be done. The reason is
that I've encountered this sort of perception while doing experiments
in classifying timbre of pulsed sequences. I noticed this effect when
listening to groups of all possible combinations of sequences of
1 to 4 successive impulse waveshapes. The impulse shapes I used were
either rectangular or overdamped sines with constant amplitude and
with ordered time spacings. I found that equal spacings give a tonal
timbre (or pitch) while aperiodic spacings give an atonal timbre. I
noticed that I heard atonal timbre in terms of the vowels (/ee/,
/ih/, /eh/, ... /oo/). By a series of tests with different
combinations of shapes and intervals I could locate the combinations
that produced the closest match to vowel centers. I found that the
non-matching timbres would generally lie between adjacent centers.
For example, if a sample were not an /ah/ it would fall between /ah/
and either /aa/ or /aw/, or it might move toward the back vowel /oo/
as in an umlaut.
These results led me to suppose that a set of vowels could represent
cardinal points in a timbre space that applies to human speech as
well as to environmental sounds. (Why not? The vowels of the human
vocal cavity can be heard in a wide variety of non-speech sounds.)
This indicates that a vowel space could be defined perhaps in terms
of one like the "RGB" space of color TV. For example, consider an
"FMB" space with F for front-closed, M for middle-open, and B for
back-closed. Couldn't this space include all of the variants of vowel
sounds? Couldn't this also provide a more quantitative calibration of
timbre than one based on things like spectral brightness and "bite?"
If anyone feels like trying these tests, the key is to listen to the
atonal waveshape groups either individually or in a stream, each one
separated by more than 25 milliseconds. The 25 ms separation reduces
the possibility of mixing the sound of group repetition with group
timbre. Note that the group of the vowel, /ee/, as the only tonal
vowel, must contain at least three equally spaced impulses having a
repetition rate that defines the third formant. Note also that
whispered speech, or all kinds of environmental sounds including
transients could be simulated by randomizing the separation and/or
mixing of timbre and pitch.
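
For anyone who wants to try this, here is a minimal sketch of how such pulse groups might be generated. It is not the setup I used, only an illustration: the sample rate, pulse widths, time constants, onset times and all function names are assumptions, and the "overdamped" shape is modeled as a difference of two decaying exponentials.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed value, not specified in the post


def rect_pulse(width_s, amplitude=1.0):
    """Rectangular impulse of the given width in seconds."""
    n = max(1, round(width_s * SAMPLE_RATE))
    return [amplitude] * n


def overdamped_pulse(tau_fast_s, tau_slow_s, length_s, amplitude=1.0):
    """Overdamped pulse modeled as a difference of two exponentials.

    Requires tau_slow_s > tau_fast_s. The result is scaled so that its
    peak equals `amplitude` (constant amplitude across pulses).
    """
    n = max(2, round(length_s * SAMPLE_RATE))
    raw = [
        math.exp(-i / (tau_slow_s * SAMPLE_RATE))
        - math.exp(-i / (tau_fast_s * SAMPLE_RATE))
        for i in range(n)
    ]
    peak = max(raw)
    return [amplitude * v / peak for v in raw]


def pulse_group(pulse, onsets_s):
    """Sum copies of `pulse` placed at the given onset times (seconds)."""
    starts = [round(t * SAMPLE_RATE) for t in onsets_s]
    out = [0.0] * (max(starts) + len(pulse))
    for s in starts:
        for i, v in enumerate(pulse):
            out[s + i] += v
    return out


def stream(groups, gap_s=0.030):
    """Concatenate groups, each followed by a silent gap longer than 25 ms."""
    gap = [0.0] * round(gap_s * SAMPLE_RATE)
    out = []
    for g in groups:
        out.extend(g)
        out.extend(gap)
    return out


# Hypothetical example: one equally spaced (tonal) group and one
# unequally spaced (atonal) group of three rectangular pulses.
pulse = rect_pulse(0.0001)
periodic = pulse_group(pulse, [0.0, 0.001, 0.002])
aperiodic = pulse_group(pulse, [0.0, 0.0007, 0.002])
signal = stream([periodic, aperiodic])
damped = overdamped_pulse(0.0001, 0.0005, 0.003)
```

The resulting sample list can be written to a sound file with any audio tool for listening. The 30 ms default gap is just one value above the 25 ms minimum mentioned above.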
An example of the dichotomy of timbre and pitch can be shown in the
experiment by E. Terhardt and H. Fastl, "Zum Einfluss von Störtönen
und Störgeräuschen auf die Tonhöhe von Sinustönen," Acustica, vol. 25,
pp. 53-61, 1971. This is a study of phase masking vs. amplitude where a
200 Hz tone masks a 400 Hz tone as its phase is varied in steps from 0 to
360 degrees. I've been testing this experiment and have found that
the timbre varies with corresponding waveshape changes although the
two pitches are constant. My experiments are based on Manfred
Schroeder's description in "Models of hearing," Proceedings of the
IEEE, vol. 63, no. 9, September 1975. I'm looking for more information
on how the experiment was run, since Schroeder's paper was only a
summary. So far I haven't found much about it on the Internet.
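
To make the signal side of this concrete, here is a rough sketch of a 200 Hz plus 400 Hz pair with the phase of the upper tone stepped. It is not the Terhardt and Fastl procedure, whose details I don't have. Equal amplitudes, the 48 kHz sample rate, the 10 ms duration and the function names are all assumptions. It only illustrates that the waveshape changes with phase while the amplitude of each component stays the same.

```python
import math


def two_tone(phase_deg, f1=200.0, f2=400.0, a1=1.0, a2=1.0,
             sample_rate=48000, duration_s=0.01):
    """Sum of two sines; the upper tone is shifted by phase_deg degrees."""
    phi = math.radians(phase_deg)
    n = round(duration_s * sample_rate)
    return [
        a1 * math.sin(2 * math.pi * f1 * i / sample_rate)
        + a2 * math.sin(2 * math.pi * f2 * i / sample_rate + phi)
        for i in range(n)
    ]


def component_amplitude(x, freq, sample_rate=48000):
    """Amplitude of one sinusoidal component from a single DFT bin.

    Exact only when x spans a whole number of periods of `freq`,
    which holds for 200 Hz and 400 Hz over 10 ms at 48 kHz.
    """
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, v in enumerate(x))
    return 2 * math.hypot(re, im) / n


# Step the phase and compare waveshape (positive peak) against the
# component amplitudes, which do not depend on phase.
for phase in range(0, 360, 90):
    x = two_tone(phase)
    print(phase,
          round(max(x), 3),
          round(component_amplitude(x, 200.0), 3),
          round(component_amplitude(x, 400.0), 3))
```

The positive peak moves with phase while both component amplitudes stay fixed, which matches the point that the two pitches are constant and only the waveshape differs.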
John Bates
snip-
Failed at what? Malcolm, I think you have missed the point.
Fair enough. We have different goals. I want a model of timbre
perception (for speech and music sounds) that rivals the three-color
model of color vision science. Spectral brightness and attack time
are not enough of an answer for me.
I don't think the timbre interpolation work I've seen (the
vibrabone?) shows that we understand timbre space yet. As I
remember the data, the synthesized instrument was not on a
perceptual line directly between the source sounds.