
Human Audio Perception : Mini-FAQ



Dear AUDITORY members,

The following was originally posted to these USENET groups:
comp.dsp, comp.speech, alt.sci.physics.acoustics, sci.psychology, comp.music.

Feel free to e-mail me with comments, corrections and of course more
information. This text is still *VERY* incomplete.

-- Argiris A. Kranidiotis
____________________________________________________________________________

           *****************************************************
           * HUMAN AUDIO PERCEPTION FREQUENTLY ASKED QUESTIONS *
           *****************************************************


                               INTRODUCTION
                               ------------

Well ... it all started from 2 questions I posted to USENET.

From the volume of mail I received, it seems to be a very interesting
subject for a mini-FAQ (Frequently Asked Questions).

With your help I'll try to make this FAQ as complete as possible.
Please read on to see what additional information is needed...

The main question remains the same: given two spectra (STFTs, Short-Time
Fourier Transforms, for example), we try to estimate a psychoacoustic
distance between them (i.e. a timbral metric). This involves some additional
data:

1) Equal loudness curves (Fletcher-Munson).
   Originally published in J.A.S.A. (Journal of the Acoustical Society of
   America) in 1933. Please send me your data/approximations/formulas.
   More information is still needed on this subject.

2) Bark frequency scale (Critical Bands). I have found some approximations
   in the range 0..5 kHz. Again, more precise information is needed.

3) "Masking" effects. Useful information can be found in the MPEG
   Audio compression FAQ (available via anonymous FTP at sunsite.unc.edu,
   IUMA archive). Also see Bladon and Lindblom, JASA (69) 1981 for a
   formula.

--------------------------------------------------------------------------

-Many thanks to all those kind people who contributed to this text
 (they are too many to list).

-My comments are put in square brackets [ ... ].




                          Argiris A. Kranidiotis

                           University Of Athens
                          Informatics Department
                       akra@zeus.di.uoa.ariadne-t.gr


------------------------------------------------------------------------
   Question #1: How does the human ear respond to various frequencies?
------------------------------------------------------------------------


From: Various people
------------------------------------------------------------------------
-Fletcher-Munson curves (the most popular answer).

Peak sensitivity at 3,300 Hz, falling off below 40 Hz and above 10 kHz.

-"An Introduction to the Psychology of Hearing" by Moore, 3rd edition
(the most popular reference).


From: Vincent Pagel <Vincent.Pagel@loria.fr>
------------------------------------------------------------------------

[...]

It's a family of curves [Fletcher Munson curves --AK] a bit like this:


     dB ^|
        ||                            |
        | \                          |
        | |                         |
        |  \                       /
        |   |                     /
        |    \________     ______/
        |             \___/
        |
        |
        |_________________________________________________>  Frequency (Hz)
           400      2500   6000    10000  20000


PERCEPTUALLY, all the sounds corresponding to the points on a curve have
the same intensity (loudness). This means that the ear has a wide range
where its response is nearly flat (1000 to 8000 Hz), and it does best in a
small region (around 3000 Hz, if my memory serves).

[ the curve has a minimum at 3,300 Hz -- AK ]

The sensitivity drops dramatically above 10000 Hz and below 500 Hz.

You can draw different isosonic curves depending on the level you start
with (e.g. if the level at 2500 Hz is 50 dB you get one curve, but if you
start at 2500 Hz with 70 dB you get another isosonic curve). Generally,
isosonic curves have nearly the same shape, and the shape does not depend
much on the starting level.

To my knowledge there is no mathematical formula given to approximate
isosonic curves, but with the data in the book by Moore it should not be
very difficult to find an approximation.


From: Angelo Campanella <acampane@magnus.acs.ohio-state.edu>
------------------------------------------------------------------------

Obtain the ISO "Zero Phons" standard threshold of human hearing.

-The standard was ISO 389-1975 "Audiometer Standard Reference Zero".
-The US Equivalent is ANSI S3.6 - 1969.

The following numbers apply. They are in dB re 20 micropascals, for a pure
tone or very narrow band noise:

--------------------------------------------------------------------------
Frequency (Hz)      125   250   500   1000  2000  3000  4000  6000  8000
==========================================================================
Threshold (dB)      45.5  24.5  11    6.5   8.5   7.5   9     8     9.5
--------------------------------------------------------------------------
Human (monaural) threshold of hearing for a normal young adult with
undisturbed hearing, in dB re 20 micropascals.


Binaural hearing is 10 to 15 dB better, since the brain has a magnificent
capability to correlate the simultaneous listening of both ears.
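
As a rough illustration, the monaural threshold values tabulated above can
be turned into a small lookup that interpolates between the listed
frequencies. This is only a sketch: the function names are mine, and
interpolating linearly on a log-frequency axis is an assumption, not part of
the standard. Frequencies outside the tabulated 125..8000 Hz range are
rejected rather than extrapolated.

```python
import math

# Monaural threshold of hearing in dB re 20 micropascals,
# copied from the table above as (frequency in Hz, threshold in dB).
THRESHOLD_DB = [
    (125, 45.5), (250, 24.5), (500, 11.0), (1000, 6.5), (2000, 8.5),
    (3000, 7.5), (4000, 9.0), (6000, 8.0), (8000, 9.5),
]


def threshold_db(freq_hz):
    """Interpolate the tabulated threshold at freq_hz.

    Assumption of this sketch: linear interpolation in log-frequency.
    """
    if not THRESHOLD_DB[0][0] <= freq_hz <= THRESHOLD_DB[-1][0]:
        raise ValueError("frequency outside the tabulated range")
    for (f0, t0), (f1, t1) in zip(THRESHOLD_DB, THRESHOLD_DB[1:]):
        if f0 <= freq_hz <= f1:
            frac = math.log(freq_hz / f0) / math.log(f1 / f0)
            return t0 + frac * (t1 - t0)


def db_to_pascals(level_db):
    """Convert a level in dB re 20 micropascals to RMS pressure in pascals."""
    return 20e-6 * 10 ** (level_db / 20)


if __name__ == "__main__":
    for f in (125, 700, 1000, 3500):
        t = threshold_db(f)
        print(f, "Hz:", round(t, 2), "dB,", db_to_pascals(t), "Pa")
```

The conversion to pascals simply inverts the definition of "dB re 20
micropascals", L = 20*log10(p / 20e-6).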


From: walkow@compsci.bristol.ac.uk (Tomasz Walkowiak)
------------------------------------------------------------------------
The equal loudness curve can be approximated by:

E(w)=1.151*SQRT( (w^2+144*10^4)*w^2/((w^2+16*10^4)*(w^2+961*10^4)) )

From: Robinson et al.: Br.J.A.Phys. 7, 166-181, 1956.

This approximation is for a Nyquist frequency equal to 5 kHz, so
w = 2*Pi*f/5kHz   , for 0<f<5kHz. Therefore E(w) is defined for 0<w<Pi.
E(w) is a linear quantity (not in dB) and is usually applied to the power
spectrum.
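
A minimal sketch of the expression above, evaluated exactly as written. The
function names are mine. Two things here are assumptions and not part of the
quoted text: the helper treats w as angular frequency in rad/s (w = 2*Pi*f),
which is my guess based on the size of the constants, and "applied to the
power spectrum" is taken to mean a pointwise multiplication.

```python
import math


def equal_loudness_weight(w):
    """Evaluate E(w) exactly as in the formula quoted above."""
    w2 = w * w
    return 1.151 * math.sqrt(
        (w2 + 144e4) * w2 / ((w2 + 16e4) * (w2 + 961e4))
    )


def weight_power_spectrum(freqs_hz, power):
    """Multiply each power-spectrum bin by E(w).

    Assumptions of this sketch: w = 2*pi*f in rad/s, and the weighting
    is a pointwise product with the power spectrum.
    """
    return [
        p * equal_loudness_weight(2.0 * math.pi * f)
        for f, p in zip(freqs_hz, power)
    ]


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000):
        print(f, "Hz ->", round(equal_loudness_weight(2 * math.pi * f), 4))
```

With that reading, the weight is 0 at w = 0, close to 1 around 1 kHz, and
tends to 1.151 for large w.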


-------------------------------------------------------------------------
           Question #2: Psychoacoustic norm / Timbral Metric?
-------------------------------------------------------------------------

From: Christopher John Rolfe <rolfe@sfu.ca>
-------------------------------------------------------------------------

Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.

Metrics have a long tradition in the literature, beginning
with Fechner in the 19th Century. Cognitive science, however, points
out that perceptual space may be non-Euclidean. In other words, there
is NO simple metric.

Repp, B.H (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic
Research and Practice. Vol. 10. 1249-1257.


From: Fahey@psyvax.psy.utexas.edu (Richard Fahey)
--------------------------------------------------------------------------

These curves [Fletcher-Munson again...--AK] may be used to normalise
spectra for loudness at different frequencies (changing dB into phons),
and with a further change into sones one obtains a loudness density plot.
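
The second step (phons into sones) can be sketched with the usual rule that
loudness doubles for every 10 phons, with 40 phons defined as 1 sone. The
function name is mine, and the rule is generally taken to hold only from
about 40 phons upwards. The first step (dB into phons) needs the equal
loudness curves themselves and is not shown.

```python
def phons_to_sones(phons):
    """Loudness in sones for a loudness level in phons.

    Uses sones = 2 ** ((phons - 40) / 10); treat results below about
    40 phons with caution, since the rule is not meant for that range.
    """
    return 2.0 ** ((phons - 40.0) / 10.0)


if __name__ == "__main__":
    for level in (40, 50, 60, 80):
        print(level, "phons ->", phons_to_sones(level), "sones")
```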

The plot can be made more psychologically real by changing the frequency
scale to the Bark scale, and using an auditory filter to smear the spectrum.
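
One way to sketch the change of frequency scale is the Zwicker and Terhardt
approximation of the Bark scale, followed by summing spectrum bins into
bands one Bark wide. That choice of approximation, the function names and
the 24-band layout are my assumptions. The pooling below is only a crude
stand-in: it does not implement the auditory filter that smears the
spectrum.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker and Terhardt's approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def pool_into_bark_bands(freqs_hz, power, n_bands=24):
    """Sum power-spectrum bins into bands one Bark wide.

    Bins that map above the last band are clamped into it.
    """
    bands = [0.0] * n_bands
    for f, p in zip(freqs_hz, power):
        index = min(int(hz_to_bark(f)), n_bands - 1)
        bands[index] += p
    return bands


if __name__ == "__main__":
    for f in (100, 1000, 5000, 15000):
        print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")
```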

The distance between two spectra represented in ways similar to this can be
calculated as a Euclidean distance, and compared with psychoacoustic data.
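
The distance itself is the easy part. A minimal sketch, assuming both
spectra have already been reduced to loudness values on the same set of
bands (the function name and the example numbers are made up for
illustration):

```python
import math


def euclidean_distance(loudness_a, loudness_b):
    """Euclidean distance between two equal-length loudness vectors."""
    if len(loudness_a) != len(loudness_b):
        raise ValueError("both spectra must use the same bands")
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(loudness_a, loudness_b))
    )


if __name__ == "__main__":
    # Hypothetical loudness values (sones per band) for two sounds.
    sound_1 = [1.0, 2.5, 4.0, 2.0]
    sound_2 = [1.0, 2.0, 3.0, 2.5]
    print(euclidean_distance(sound_1, sound_2))
```

Whether such a number tracks perceived timbral difference is exactly what
has to be checked against psychoacoustic data.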

From: Vincent Pagel <Vincent.Pagel@loria.fr>
--------------------------------------------------------------------------

[...]

About curves corresponding to the masking effect:

These curves show the minimal intensity that a sound of a given frequency
must have to be perceived when played simultaneously with a masking sound
whose frequency is held constant during the experiment. E.g. let's say that
you want to find out the masking effect of a 500 Hz tone: you play it at,
for example, 50 dB, and at the same time you play another frequency and
adjust the level of this second tone to find the limen where it is
perceived. For example, a sound played at 1000 Hz has to be louder than a
sound at 700 Hz, because it is a harmonic of the masking frequency of
500 Hz.

---------------------------------------------------------------------------
                            REFERENCES / BOOKS
---------------------------------------------------------------------------


Fletcher and Munson, "Loudness: its definition, measurement, and
calculation", Journal of the Acoustical Society of America, 1933, vol 5, p 9.

Author: Fry R.B.
Title: Measurement of Specific Sequence Effects in Loudness Perception
PhD Dissertation, Duke University
Date: 1981

Author: Lane H.L., Catania A.C., Stevens S.S.
Title: Voice Level: Autophonic Scale, Perceived Loudness, and Effects of
Sidetone
Journal: JASA
Volume: 33
Number: 2
Page(s): 160-167
Date: 1961

Author: Peterson G E, McKinney N P
Title: The measurement of speech power
Journal: Phonetica
Volume: 7
Page(s): 65-84
Date: 1961

Author: Schlauch R.S., Wier C.C.
Title: A Method for Relating Loudness-Matching and Intensity-Discrimination
Data
Journal: Journal of Speech and Hearing Research
Volume: 30
Page(s): 13-20
Date: 1987

Author: Small AM, Brandt JF, Cox PG
Title: [...?] function of signal duration
Journal: JASA
Volume: 34
Page(s): 513-514
Date: 1962

Author: Stevens S.S.
Title: Calculation of the Loudness of Complex Noise
Journal: JASA
Volume: 28
Number: 5
Page(s): 807-832
Date: 1956

A.S.Bregman, Auditory Scene Analysis, MIT Press, 1990
Stephen Handel, Listening, [sorry, no citation]

Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.

Repp, B.H (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic
Research and Practice. Vol. 10. 1249-1257.

Moore and Glasberg, JASA 74(3) 1983.

Bladon and Lindblom, JASA 69(5) 1981.

J. R. Pierce, The Science of Musical Sound (Freeman, New York, 1983).

J. G. Roederer, Introduction to the Physics and Psychophysics of Music
(Springer-Verlag, New York, 1975).

S. S. Stevens, "Measurement of Loudness", JASA 27 (1955): 815

S. S. Stevens, "Neural Events and the Psychophysical Law", _Science 170_
(1970): 1043

E. Zwicker, G. Flottorp, and S. S. Stevens, "Critical Bandwidth in Loudness
Summation",  JASA 29 (1957): 548

Author: Hynek Hermansky
Institution: Speech Technology Laboratory, Division of Panasonic
Technologies, Inc., 3888 State Street, Santa Barbara, CA 93105, USA
Title: Perceptual linear predictive (PLP) analysis of speech
Journal: JASA
Volume: 87
Number: 4
Page(s): 1738-1752
Date: 1990

Gersho et al. (Bark Spectral Distance).
IEEE Journal on Selected Areas in Communications, Sept. (?) 1992


Name:      "An Introduction to the Physiology of Hearing"
Author:    James O. Pickles, Dept. of Physiology, Univ. of Birmingham, England.
Publisher: Academic Press, 1982.
ISBN 0-12-554750-1 (hardback)
ISBN 0-12-554752-8 (paperback).


"An Introduction to the Psychology of Hearing" by B. Moore, 3rd edition.


--
      ____________________________      __________________________________
     /                           /\    /                                 /\
    /   Argiris A. Kranidiotis _/ /\  /       E-mail (Internet):       _/ /\
   /  University Of Athens    / \/   /                                / \/
  / Informatics Department    /\    /  akra@zeus.di.uoa.ariadne-t.gr  /\
 /___________________________/ /   /_________________________________/ /
 \___________________________\/    \_________________________________\/
  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \     \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \