Subject: Human Audio Perception : Mini-FAQ
From: Argiris Kranidiotis <akra(at)URANUS.DI.UOA.ARIADNE-T.GR>
Date: Thu, 26 May 1994 20:28:54 +0300

Dear AUDITORY members,

The following was originally posted to these USENET groups:
comp.dsp, comp.speech, alt.sci.physics.acoustics, sci.psychology,
comp.music.

Feel free to e-mail me with comments, corrections and, of course, more
information. This text is still *VERY* incomplete.

-- Argiris A. Kranidiotis
____________________________________________________________________________

        *****************************************************
        * HUMAN AUDIO PERCEPTION FREQUENTLY ASKED QUESTIONS *
        *****************************************************

INTRODUCTION
------------

Well ... it all started with 2 questions I posted to USENET. From the
volume of mail I received, it seems to be a very interesting topic for
a mini-FAQ (Frequently Asked Questions). With your help I'll try to
make this FAQ as complete as possible. Please read on to see what
additional information is needed...

The main question remains the same:

Given two spectra (STFFTs, Short-Time Fast Fourier Transforms, for
example), we try to estimate a psychoacoustic distance between them
(i.e. a timbral metric). This involves some additional data:

1) Equal loudness curves (Fletcher-Munson). Originally published in
   J.A.S.A. (Journal of the Acoustical Society of America) in 1933.
   Please send me your data/approximations/formulas. More information
   is still needed on this subject.

2) Bark frequency scale (critical bands). I have found some
   approximations in the range 0..5 kHz. Again, more precise
   information is needed.

3) "Masking" effects. Useful information can be found in the MPEG
   Audio compression FAQ (available via anonymous FTP at
   sunsite.unc.edu, IUMA archive). Also see Bladon and Lindblom,
   JASA (69) 1981 for a formula.
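
On item 2, two closed-form Hz-to-Bark approximations are widely quoted
and may serve as a starting point. They do not come from the replies
collected in this FAQ: the sketch below assumes the Zwicker and
Terhardt (1980) fit and the Traunmueller (1990) fit, and the function
names are invented for illustration only.

```python
import math


def hz_to_bark_zwicker(f_hz: float) -> float:
    """Critical-band rate in Bark, Zwicker & Terhardt (1980) fit (assumed)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def hz_to_bark_traunmuller(f_hz: float) -> float:
    """Critical-band rate in Bark, Traunmueller (1990) fit (assumed)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


if __name__ == "__main__":
    # Print both approximations side by side over the 0..5 kHz range.
    for f in (125, 250, 500, 1000, 2000, 3000, 4000, 5000):
        z = hz_to_bark_zwicker(f)
        t = hz_to_bark_traunmuller(f)
        print(f"{f:5d} Hz  {z:6.2f} Bark  {t:6.2f} Bark")
```

Both fits give roughly 8.5 Bark at 1 kHz. Either one could be used to
warp the frequency axis of a spectrum before comparing two spectra, but
which fit suits a given application is left open here.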
--------------------------------------------------------------------------
- Many thanks to all those kind people who contributed to this text
  (they are too many to list).
- My comments are put in square brackets [ ... ].

Argiris A. Kranidiotis
University Of Athens
Informatics Department
akra(at)zeus.di.uoa.ariadne-t.gr

------------------------------------------------------------------------
Question #1: How does the human ear respond to various frequencies?
------------------------------------------------------------------------

From: Various people
------------------------------------------------------------------------
- Fletcher-Munson curves (the most popular answer). Peak sensitivity at
  3,300 Hz, falling off below 40 Hz and above 10 kHz.
- "An Introduction to the Psychology of Hearing" by Moore, 3rd edition
  (the most popular reference).

From: Vincent Pagel <Vincent.Pagel(at)loria.fr>
------------------------------------------------------------------------
[...] It's a family of curves [Fletcher-Munson curves --AK] a bit like
this:

(ASCII sketch of one equal-loudness curve: level in dB on the vertical
axis against frequency in Hz on the horizontal axis, with marks at 400,
2500, 6000, 10000 and 20000 Hz. The curve falls from the low
frequencies, runs nearly flat through the mid range with a shallow dip,
and rises again toward the high frequencies.)

PERCEPTUALLY, all the sounds corresponding to the points on the curve
have the same intensity: this means that the ear has a large range
where it is nearly linear (1000 to 8000 Hz), achieving better results
in a small region (around 3000 Hz, if my memory serves). [ the curve
has a minimum at 3,300 Hz -- AK ] The sensitivity drops dramatically
above 10000 Hz and below 500 Hz.

You can draw different isosonic curves depending on the first intensity
you begin with (e.g. if the intensity at 2500 Hz is 50 dB you get one
curve, but if you start at 2500 Hz with 70 dB you get another isosonic
curve ....
generally, isosonic curves have nearly the same shape, and the shape
does not depend much on the starting point).

To my knowledge there is no mathematical formula given to approximate
isosonic curves, but with the data in the book by Moore it should not
be very difficult to find an approximation.

From: Angelo Campanella <acampane(at)magnus.acs.ohio-state.edu>
------------------------------------------------------------------------
Obtain the ISO "Zero Phons" standard threshold of human hearing.

- The standard was ISO 389-1975 "Audiometer Standard Reference Zero".
- The US equivalent is ANSI S3.6-1969.

The following numbers apply. These are dB re 20 micropascals for a pure
tone or very narrow band noise:

--------------------------------------------------------------------------
Audio frequency (Hz)   125   250   500  1000  2000  3000  4000  6000  8000
==========================================================================
Threshold (dB)        45.5  24.5    11   6.5   8.5   7.5     9     8   9.5
--------------------------------------------------------------------------
Human (monaural) threshold of hearing, normal young adult with
undisturbed hearing. dB re 20 micropascals.

Binaural hearing is 10 to 15 dB better, since the brain has a
magnificent capability to correlate the simultaneous listening of both
ears.

From: walkow(at)compsci.bristol.ac.uk (Tomasz Walkowiak)
------------------------------------------------------------------------
The equal loudness curve can be approximated by:

  E(w) = 1.151*SQRT( (w^2+144*10^4)*w^2 / ((w^2+16*10^4)*(w^2+961*10^4)) )

From: Robinson et al., Br. J. Appl. Phys. 7, 166-181, 1956.

This approximation is for a Nyquist frequency of 5 kHz, so
w = 2*Pi*f/5kHz for 0<f<5kHz. Therefore E(w) is defined for 0<w<Pi.
E(w) is linear and is usually applied to the power spectrum.

-------------------------------------------------------------------------
Question #2: Psychoacoustic norm / Timbral Metric?
-------------------------------------------------------------------------

From: Christopher John Rolfe <rolfe(at)sfu.ca>
-------------------------------------------------------------------------
Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.

Metrics have a long tradition in the literature, beginning with Fechner
in the 19th century. Cognitive science, however, points out that
perceptual space may be non-Euclidean. In other words, there is NO
simple metric.

Repp, B.H. (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic Research and
Practice. Vol. 10. 1249-1257.

From: Fahey(at)psyvax.psy.utexas.edu (Richard Fahey)
--------------------------------------------------------------------------
These curves [Fletcher-Munson again...--AK] may be used to normalise
spectra for loudness at different frequencies (changing dB into phons),
and with a further change into sones one obtains a loudness density
plot. The plot can be made more psychologically real by changing the
frequency scale to the Bark scale, and using an auditory filter to
smear the spectrum. The distance between two spectra represented in
ways similar to this can be calculated as a Euclidean distance, and
compared with psychoacoustic data.

From: Vincent Pagel <Vincent.Pagel(at)loria.fr>
--------------------------------------------------------------------------
[...] About curves corresponding to the masking effect: those curves
show the minimal intensity a sound of a given frequency must have to be
perceived when it is played simultaneously with a masking sound whose
frequency is held constant during the experiment (e.g. let's say that
you want to find out the masking effect of a 500 Hz frequency .... you
play it at, for example, 50 dB .... and at the same time you play
another frequency, and you adjust the level of the second frequency to
find the limen where it is perceived.
For example, a sound played at 1000 Hz has to be louder than a sound at
700 Hz, because it is a harmonic of the 500 Hz masking frequency).

---------------------------------------------------------------------------
REFERENCES / BOOKS
---------------------------------------------------------------------------

Fletcher and Munson, "Loudness: its definition, measurement, and
calculation", Journal of the Acoustical Society of America, 1933,
vol 5, p 9.

Author: Fry R.B.
PhD Dissertation, Duke University
Title: Measurement of Specific Sequence Effects in Loudness Perception
Date: 1981

Author: Lane H.L., Catania A.C., Stevens S.S.
Title: Voice Level: Autophonic Scale, Perceived Loudness, and Effects
       of Sidetone
Journal: JASA
Volume: 33
Number: 2
Page(s): 160-167
Date: 1961

Author: Peterson G.E., McKinney N.P.
Title: The measurement of speech power
Journal: Phonetica
Volume: 7
Page(s): 65-84
Date: 1961

Author: Schlauch R.S., Wier C.C.
Title: A Method for Relating Loudness-Matching and
       Intensity-Discrimination Data
Journal: Journal of Speech and Hearing Research
Volume: 30
Page(s): 13-20
Date: 1987

Author: Small A.M., Brandt J.F., Cox P.G.
Title: [...?] function of signal duration
Journal: JASA
Volume: 34
Page(s): 513-514
Date: 1962

Author: Stevens S.S.
Title: Calculation of the Loudness of Complex Noise
Journal: JASA
Volume: 28
Number: 5
Page(s): 807-832
Date: 1956

A.S. Bregman, Auditory Scene Analysis, MIT Press, 1990

Stephen Handel, Listening, [sorry, no citation]

Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.

Repp, B.H. (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic Research and
Practice. Vol. 10. 1249-1257.

Moore and Glasberg, JASA 74(3) 1983.

Bladon and Lindblom, JASA 69(5) 1981.

J. R. Pierce, The Science of Musical Sound (Freeman, New York, 1983).

J. G.
Roederer, Introduction to the Physics and Psychophysics of Music
(Springer-Verlag, New York, 1975).

S. S. Stevens, "Measurement of Loudness", JASA 27 (1955): 815

S. S. Stevens, "Neural Events and Psychophysical Law", _Science 170_
(1970): 1043

E. Zwicker, G. Flottorp, and S. S. Stevens, "Critical Bandwidth in
Loudness Summation", JASA 29 (1957): 548

Author: Hynek Hermansky
Institution: Speech Technology Laboratory, Division of Panasonic
             Technologies, Inc., 3888 State Street, Santa Barbara,
             CA 93105, USA
Title: Perceptual linear predictive (PLP) analysis of speech
Journal: JASA
Volume: 87
Number: 4
Page(s): 1738-1752
Date: 1990

Gersho et al. (Bark Spectral Distance). IEEE Journal on Selected Areas
in Communications, Sept. (?) 1992

Name: "An Introduction to the Physiology of Hearing"
Author: James O. Pickles, Dept. of Physiology, Univ. of Birmingham,
        England.
Publisher: Academic Press, 1982.
ISBN 0-12-554750-1 (hardback)
ISBN 0-12-554752-8 (paperback)

"An Introduction to the Psychology of Hearing" by B. Moore, 3rd edition.

--
Argiris A. Kranidiotis
University Of Athens
Informatics Department
E-mail (Internet): akra(at)zeus.di.uoa.ariadne-t.gr