mp3 and the perceptual coding (summary)
Dear all,
Below I have summarized (i.e., cut and pasted) all the replies I
received. But I'm not going to say who the winner is!
Thanks to everybody,
m
-----------------------------
--------- QUESTION ----------
-----------------------------
I would like to convince my students that psychophysics can (seldom)
be useful. For this reason, I want to talk about mp3 and perceptual coding.
Is there a book/chapter/paper that you would particularly recommend
about it? There are many things out there, but so far I haven't much
liked any of them. In particular, the characteristics of perceptual
coding are always described only roughly.
Any suggestion is more than welcome.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%% REPLIES %%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Laszlo Toth:
I usually do the following demonstration: take a speech sample, compress
it with an mp3 codec at the highest possible compression rate, then
decompress it and display the spectrogram of the original and the
processed signal. The spectral valleys are wiped out,
while there is minimal perceptual difference. I think this quite
convincingly demonstrates that masking indeed works, and that the industry
can make use of the results of psychophysics.
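
A minimal sketch of the comparison step of this demonstration, assuming
the speech sample has already been encoded and decoded with an external
mp3 tool and that both versions are available as lists of float samples
at the same sampling rate. The function names, frame length and hop size
are illustrative choices, not part of the original demo. In practice a
numerical library and a plotting package would do the same job faster.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram_db(samples, frame_len=256, hop=128, floor_db=-120.0):
    """Return one list of levels in dB (frame_len // 2 + 1 bins) per frame."""
    # Periodic Hann window to reduce leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(frame_len)]
        spec = fft(seg)[: frame_len // 2 + 1]
        # Clamp very small magnitudes so silent bins stay finite.
        frames.append([max(20 * math.log10(abs(c) + 1e-12), floor_db)
                       for c in spec])
    return frames


def mean_level_difference(spec_a, spec_b):
    """Per-bin mean of (spec_b - spec_a) in dB over the frames both share."""
    n_frames = min(len(spec_a), len(spec_b))
    n_bins = len(spec_a[0])
    return [sum(spec_b[t][k] - spec_a[t][k] for t in range(n_frames)) / n_frames
            for k in range(n_bins)]
```

Calling spectrogram_db on the original and on the decoded signal gives
two arrays that can be displayed side by side, and mean_level_difference
shows in which frequency bins the level changed most on average.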
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hi,
you can have a look at:
http://www.mp3-tech.org/
and then 'Technical papers' which not only contains a collection of
papers but also theses covering basic psychoacoustics. I also like the
book "Audio Signal Processing and Coding" by Spanias.
Hope this helps!
Matthias
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The Bosi & Goldberg book is a good resource for beginners:
http://www.springer.com/engineering/signals/book/978-1-4020-7357-1
For a somewhat more concise tutorial, see the 1995 paper by Davis Pan:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.232&rep=rep1&type=pdf
Also, the CD published by the AES is a great resource to learn about
the basic audio artifacts:
http://www.aes.org/publications/AudioCoding.cfm
If you have specific questions/needs, let me know and I'll do my best
to point you in the right direction.
- Jon
Jon Boley
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hello Massimo :)
A very good review paper is by T. Painter et al.
(http://www.mp3-tech.org/programmer/docs/audiopaper1.pdf)
:) stefan
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
By the way, there are also a couple of Matlab implementations of the two
psychoacoustic models specified in the standard:
Model 1: http://www.petitcolas.net/fabien/software/mpeg/
Model 2: http://perceptualentropy.com/Model2.zip
I wrote the Model 2 code while working on my Master's. It doesn't
quite follow the standard (since I had trouble getting it to work),
but it should be good enough for demo purposes.
Ken Pohlmann uses figures from both of these models in his book:
http://www.amazon.com/Principles-Digital-Audio-Ken-Pohlmann/dp/0071441565
- Jon
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I would suggest "Perceptual coding of digital audio" by Painter and
Spanias in Proc. IEEE (you can find it easily with google).
Happy spring to you all!
Mark Kahrs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dear Massimo,
I used a book by Zwicker and Fastl a while ago. It did address some
perceptual issues very nicely. Also, you should talk to Juergen Herre. I
forget which German university he is with but he is extremely
knowledgeable on this. The people at the Fraunhofer Institute should
also be very helpful.
Regards,
ajay
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dear Massimo,
There is a brief discussion of this in the last chapter of "An
introduction to the psychology of hearing". It should be suitable for
psychologists, but you may find it to be over-simplified.
Best wishes,
Brian
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Houtsma, Adrian (2008). Perceptually based audio coding. Chapter 42 in
Handbook of Signal Processing in Acoustics, Edited by David Havelock, Sonoko
Kuwano and Michael Vorländer, Springer, New York.
(Search on: adrian houtsma perceptual coding)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dear Massimo,
I faced the same problem a few years ago while teaching signal
processing and psychoacoustics to psychology students.
I read the "psychoacoustic" part of the MP3 description, and found it
quite non-psychoacoustic. So I produced a demo that illustrates the
effect visually and auditorily. Simply take a notched noise with a pure
tone, like the one used to measure auditory filters, and change the width of
the notch. For each sound, encode in MP3 with very low quality, and
you'll see what happens to the notch.
I've attached a slide with this demo. I think it illustrates the fact
that the MP3 encoder uses the concept of auditory filters, and when you
listen to the encoded version, it also illustrates that it doesn't do a
very good job... Feel free to re-use these slides if you like.
You can also produce other demos on the same concept with other coders
(ogg, etc...).
Hope that helps.
-Etienne
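
A minimal sketch of how the stimuli for such a demo could be generated,
assuming a noise built from equally spaced random-phase sinusoids (so it
is periodic, not true noise) with a gap centred on the tone. All function
names, parameter names and default values here are hypothetical choices
for illustration. The mp3 encoding step itself is left to an external
encoder and is not shown.

```python
import math
import random
import struct
import wave


def noise_frequencies(tone_hz, notch_hz, band_hz, step_hz):
    """Component frequencies covering band_hz, skipping the notch around the tone."""
    lo, hi = band_hz
    count = int((hi - lo) / step_hz) + 1
    freqs = (lo + k * step_hz for k in range(count))
    return [f for f in freqs if abs(f - tone_hz) > notch_hz / 2]


def notched_noise_with_tone(fs=16000, dur=0.5, tone_hz=1000.0, notch_hz=400.0,
                            band_hz=(100.0, 4000.0), step_hz=10.0,
                            tone_amp=0.5, seed=0):
    """Return float samples (peak 0.9) of notched noise plus a pure tone."""
    rng = random.Random(seed)
    comps = [(f, rng.uniform(0.0, 2 * math.pi))
             for f in noise_frequencies(tone_hz, notch_hz, band_hz, step_hz)]
    # Assumes at least one noise component remains outside the notch.
    scale = 1.0 / math.sqrt(len(comps))
    out = []
    for i in range(int(round(fs * dur))):
        t = i / fs
        noise = scale * sum(math.sin(2 * math.pi * f * t + p) for f, p in comps)
        out.append(noise + tone_amp * math.sin(2 * math.pi * tone_hz * t))
    peak = max(abs(s) for s in out)
    return [0.9 * s / peak for s in out]


def write_wav(path, samples, fs):
    """Write mono 16-bit PCM; samples are expected to lie in [-1, 1]."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(b"".join(struct.pack("<h", int(round(s * 32767)))
                               for s in samples))
```

Looping over a few values of notch_hz and writing one file per width
gives the set of sounds to encode at low quality and compare.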
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Well, I'm sorry that Engineers rate so poorly, but you might try the
following (written by this Engineer, I fear), one of the pioneers in the
audio perceptual coding field, perhaps:
Johnston, James D, "Perceptual Audio Coding - A History and
Timeline", 41st Asilomar Conference on Signals, Systems and Computers, 2007.
Jayant, N. S., Johnston, J. D. and Safranek, R. J., "Signal compression
based on models of human perception," Proc. IEEE, Oct. 1993, pp. 1385-1422.
Brandenburg, K., Herre, J., Johnston, J. D., Mahieux, Y. and Schroeder,
E. F., "ASPEC: Adaptive spectral entropy coding of high quality music
signals," 90th Convention of the AES, Feb. 1991, Preprint 3011 A-4.
Brandenburg, K., Stoll, G., Johnston, J. D. et al., "Coding of moving
pictures and associated audio for digital storage media at up to about
1.5 Mbit/s audio," ISO/IEC JTC1/SC29/WG11 MPEG: International Standard
ISO 11172-3, 1991.
Johnston, J. D. and Brandenburg, K., "Wideband coding - Perceptual
considerations for speech and music" in Advances in Speech Signal
Processing, Furui and Sondhi (Ed.), Marcel Dekker, 1991, Preprint 3011 A-4.
Brandenburg, K. and Johnston, J. D., "Second generation perceptual audio
coding: The hybrid coder," AES 88th Conv. Preprint, March 1990.
Safranek, R. J., Johnston, J. D. and Rosenholtz, R. E., "A perceptually
tuned sub-band image coder," Proc. SPIE Symp. Human Vision & Electronic
Imaging: Models, Methods & Applications, Santa Clara, CA, Feb. 1990.
Johnston, J. D., "Digital audio - Future trends in quantization,
storage, and compression," AES 7th Intn'l. Conf. Audio in Digital Times,
May 1989.
Johnston, J. D., "Perceptual transform coding of wideband stereo
signals," ICASSP '89, May 1989, pp. 1993-1996.
Safranek, R. J. and Johnston, J. D., "A perceptually tuned sub-band
image coder with image dependent quantization and post quantization data
compression," ICASSP '89, May 1989, pp. 1945-1948.
Johnston, J. D., "Transform coding of audio signals using perceptual
noise criteria," IEEE Jour. Selected Areas in Commun., vol. 6, no. 2,
Feb 1988, pp. 314-323.
Johnston, J. D., "Estimation of perceptual entropy using noise masking
criteria," ICASSP '88 Record, 1988, pp. 2524-2527.
Cox, R. V., Bock, D. E., Bauer, K. B., Johnston, J. D. and Snyder, J.
H., "The analog voice privacy system," AT&T Tech. Jour., vol. 66, no. 1,
Jan-Feb 1987, pp. 119-131.
Or look at
http://home.comcast.net/~retired_old_jj/bibliography.html
James D. Johnston
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I would recommend "Applications of DSP to audio and acoustics". Best, Ayce
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
A related demo is to subtract the spectrum of the mp3 coded sound from
the original, and do an inverse fft, to show all the sounds that they
are not "hearing." Werner Deutsch used to do this with symphonic
recordings and joke that this showed that most of the musicians in the
symphony were not needed.
Brian Gygi, Ph.D.
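
A minimal sketch of this "what was thrown away" demo, under a few
assumptions. Because the Fourier transform is linear, subtracting the
complex spectra and inverting gives the same waveform as subtracting the
two signals sample by sample, provided they are time-aligned, so the
sketch works directly in the time domain. Decoded mp3 files are typically
delayed relative to the original, so the delay is estimated first by a
brute-force cross-correlation. The function names and the max_lag
parameter are hypothetical, and gain differences are not compensated.

```python
def best_lag(reference, other, max_lag):
    """Delay (in samples) of `other` that maximises its correlation with `reference`."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(other) - lag)
        score = sum(reference[i] * other[i + lag] for i in range(n))
        if score > best_score:
            best, best_score = lag, score
    return best


def coding_residual(original, decoded, max_lag=2000):
    """Return original minus time-aligned decoded signal, sample by sample."""
    lag = best_lag(original, decoded, max_lag)
    n = min(len(original), len(decoded) - lag)
    return [original[i] - decoded[i + lag] for i in range(n)]
```

The returned list can be written to a sound file and played back. For
long recordings the brute-force search is slow in plain Python, so it may
be worth estimating the delay on a short excerpt only.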
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hi Massimo,
For more specific questions on mp3 (and other lossless and lossy audio
formats) I recommend hydrogenaudio.org forums:
http://www.hydrogenaudio.org/forums/index.php?showforum=55
Cheers!
Danijel Domazet
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dear Massimo Grassi,
As Matthias said, you can have a look at the book "Audio Signal
Processing and Coding" by Spanias. Chapter 5 gives an outline of
psychoacoustics and its application to mp3. The application is in
section 5.7 (EXAMPLE CODEC PERCEPTUAL MODEL: ISO/IEC 11172-3 (MPEG - 1)
PSYCHOACOUSTIC MODEL 1).
There are many tutorials about mp3, but one that is very easy to read is
Davis Pan's "A tutorial on MPEG/Audio Compression". It also shows
graphically the advantage of the mp3 technique.
Please note that MP3 takes advantage of both the spectral and the
temporal masking of the input data. The former is used for reducing the
output bit rate. Temporal masking is used for minimizing the error
introduced by the filter bank.
For further questions, please do not hesitate to ask me.
Regards,
Fernando
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dear Massimo, all,
I would like to add the following websites to the range of recommended
places to look:
http://www.iis.fraunhofer.de/bf/amm/products/mp3/
http://www.all4mp3.com/
They contain, of course, only a consumer level of detail, but they might
be useful nonetheless.
As a student I personally found the presentation of the "13 dB miracle"
the most impressive and straightforward demonstration of psychoacoustics
in audio coding. Of course that would require your students to have an
understanding of the principle of SNR.
Regards,
Max Neuendorf
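
For students who need the SNR background first, here is a minimal sketch
of the arithmetic, SNR in dB = 10 * log10(signal power / noise power),
together with a helper that scales white Gaussian noise to a chosen SNR.
The function names are hypothetical. The sketch only produces the plain
white-noise condition at a given SNR. It does not attempt to shape the
noise according to a masking threshold.

```python
import math
import random


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the mean power of each sequence."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)


def white_noise_at_snr(signal, target_snr_db, seed=0):
    """Return white Gaussian noise scaled so snr_db(signal, noise) hits the target."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scaling the noise amplitude by g lowers the SNR by 20 * log10(g) dB.
    gain = 10.0 ** ((snr_db(signal, raw) - target_snr_db) / 20.0)
    return [gain * n for n in raw]
```

Adding white_noise_at_snr(signal, 13.0) to the signal sample by sample
gives a 13 dB SNR reference condition to listen to.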
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%