Dear Daniel & Alexander:
Thanks a lot for your helpful replies, and apologies for my slow response; they were really useful. In the meantime, I have a further question for the list:
I was wondering if it is possible to predict the “overall level of the masking threshold” based on some statistics of the input signal (e.g. power, power spectral density, etc.). By “overall level of the masking threshold” I mean the average of the masking threshold over frequency. For example, the threshold in quiet has an overall level of ~32 dB.
It would be great to have some kind of relation between the overall level of the masking threshold, the level of the input signal, and the level of the threshold in quiet.
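To make the definition concrete, here is a rough sketch (Python/NumPy) of what I mean by “overall level”, computed for the threshold in quiet using Terhardt's well-known approximation; the frequency grid and the plain arithmetic mean over frequency are just my own assumptions:

import numpy as np

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = f_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

# "overall level" = average of the threshold over frequency
# (arithmetic mean over a linear 20 Hz .. 16 kHz grid - an assumption on my part)
freqs = np.linspace(20.0, 16000.0, 1024)
overall = np.mean(threshold_in_quiet_db(freqs))
print("overall level of the threshold in quiet: %.1f dB SPL" % overall)

The same averaging applied to a signal-dependent masking threshold is what I would like to relate to the input signal statistics.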
Any suggestions and/or comments in this regard will be highly appreciated.
Thanks and regards,
Hi Arijit,
Try avoiding MPEG psychoacoustic model 2 - I think it is too complex.
There are a few things that are important when designing a SIMPLE psycho model:
- Try to avoid a separate time/frequency transformation (usually an FFT). Use the result of the one that is already present in the encoder (most likely the MDCT). It isn't as good, but it spares you an FFT computation, and the results are more than acceptable.
- Don't define separate critical bands like in Psycho Model 2 (which fit human hearing better); use the ones already defined in your encoder as scalefactor bands. It will be much simpler.
- Tonality estimation might also be unnecessary. Just assume constant masking for tonal and non-tonal signals; it will do the job for most signals (you might lose some quality on strongly tonal samples, but it might not be too critical).
- If you have to include tonality detection, don't calculate it based on prediction across frames; lookahead buffers increase the delay and the complexity as well. MPEG psycho model 2 has some really unnecessary lookaheads. Use some other method for tonality estimation (the Spectral Flatness Measure, for example; see the sketch right after this list).
- Don't complicate things with the spreading function; a simple triangular function will do the job.
- Detect transients in the TIME domain.
- Estimate scalefactors directly from the masking thresholds; don't use the inner-and-outer-loop method that Psycho 2 recommends (the many iterations slow you down drastically).
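For the Spectral Flatness Measure mentioned above, a minimal per-frame tonality estimate could look like this (Python/NumPy sketch; the -60 dB normalisation is the constant from Johnston's classic tonality index, everything else is just an assumption):

import numpy as np

def tonality_from_sfm(power_spectrum, sfm_db_max=-60.0):
    """Tonality index in [0, 1] from the Spectral Flatness Measure.
    SFM = geometric mean / arithmetic mean of the power spectrum:
    close to 0 dB for noise-like frames, strongly negative for tonal frames."""
    p = np.maximum(power_spectrum, 1e-12)           # avoid log(0)
    geo_mean = np.exp(np.mean(np.log(p)))
    arith_mean = np.mean(p)
    sfm_db = 10.0 * np.log10(geo_mean / arith_mean)
    return min(sfm_db / sfm_db_max, 1.0)            # 0 = noise-like, 1 = tonal

The result can then be used to interpolate between a noise-masking and a tone-masking offset instead of the constant masking mentioned above.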
What I would do is something like:
- calculate the time/frequency transformation
- calculate the energy across critical (scalefactor) bands
- calculate the masking (or use a constant)
- calculate the masking threshold as energy * masking
- apply the spreading function
- apply the threshold in quiet (this will give you the main result of the psycho analysis - the masking threshold)
- convert the masking thresholds directly to scalefactors
If your quantized spectrum doesn't fit the bitrate, just increment ALL scalefactors at the same time and repeat the quantization.
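Roughly, in code, the whole flow could look like the sketch below (Python/NumPy). It only illustrates the steps above; the masking offset, the spreading slopes and the threshold-to-scalefactor mapping are placeholders you would have to tune, and quantise()/bits_used() stand for your encoder's own (hypothetical) routines:

import numpy as np

def simple_psy_model(mdct_power, band_offsets, thr_quiet_power,
                     masking_offset_db=-18.0):
    """Minimal psychoacoustic analysis working on the encoder's own MDCT.
    mdct_power        : power of the MDCT coefficients of one frame
    band_offsets      : scalefactor-band boundaries (indices into mdct_power)
    thr_quiet_power   : threshold in quiet per band, in the same power domain
    masking_offset_db : constant masking offset (no tonality estimation);
                        -18 dB is only a placeholder value."""
    n_bands = len(band_offsets) - 1

    # energy per scalefactor band (no separate FFT, no separate critical bands)
    energy = np.array([mdct_power[band_offsets[b]:band_offsets[b + 1]].sum()
                       for b in range(n_bands)])

    # masking threshold = band energy * constant masking factor
    thr = energy * 10.0 ** (masking_offset_db / 10.0)

    # simple triangular spreading: fixed attenuation per band towards
    # higher and lower bands (slope values are placeholders, not tuned)
    to_higher = 10.0 ** (-15.0 / 10.0)
    to_lower = 10.0 ** (-30.0 / 10.0)
    for b in range(1, n_bands):                       # spread towards higher bands
        thr[b] = max(thr[b], thr[b - 1] * to_higher)
    for b in range(n_bands - 2, -1, -1):              # spread towards lower bands
        thr[b] = max(thr[b], thr[b + 1] * to_lower)

    # apply the threshold in quiet
    return np.maximum(thr, thr_quiet_power)

def thresholds_to_scalefactors(thr_power):
    """Crude direct mapping, assuming roughly 1.5 dB per scalefactor step
    (AAC-style); the exact mapping depends on your quantiser."""
    thr_db = 10.0 * np.log10(np.maximum(thr_power, 1e-12))
    return np.round(thr_db / 1.5).astype(int)

# rate control: if the quantised spectrum does not fit the bit budget,
# raise ALL scalefactors together and quantise again
# (quantise() and bits_used() are hypothetical encoder functions)
#
# sf = thresholds_to_scalefactors(thr)
# while bits_used(quantise(mdct_coeffs, sf)) > bit_budget:
#     sf += 1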
I hope this helped. If you don't understand all of this now, don't worry - you will when you get more involved with psychoacoustics.
Also, take a look at the psychoacoustic model of the Enhanced aacPlus general audio codec from 3GPP - TS 26.403.
Regards,
Daniel
----- Original Message -----
From: "alexander lerch" <lerch@xxxxxxxxx>
To: <AUDITORY@xxxxxxxxxxxxxxx>
Sent: Wednesday, February 08, 2006 1:49 PM
Subject: Re: [AUDITORY] computational complexity of psychoacoustic models
The choice is, at least for all MPEG codecs, completely up to the developer. You can decide not to use a psychoacoustic model at all, or you can decide to use a complex model to gain as much quality as possible.
Commonly used steps are:
FFT
Critical band grouping
Conversion to dB
(Analysis of tonality of possible maskers)
Calculation of the masking threshold via a masking model
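A bare-bones version of the first three steps might look like this (Python/NumPy sketch; the Hann window and the classic Zwicker Bark-band edges are my own choices, not prescribed by any standard):

import numpy as np

# classic Zwicker critical-band (Bark) edges in Hz
BARK_EDGES = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                       1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                       4400, 5300, 6400, 7700, 9500, 12000, 15500])

def band_energies_db(frame, fs):
    """FFT -> group bin energies into critical bands -> convert to dB."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    band = np.digitize(freqs, BARK_EDGES) - 1         # band index of each bin
    n_bands = len(BARK_EDGES) - 1
    energies = np.array([power[band == b].sum() for b in range(n_bands)])
    return 10.0 * np.log10(np.maximum(energies, 1e-12))

The tonality analysis and the masking model would then operate on these per-band values.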
Have a look at psychoacoustic model 2 in the informative part of the MPEG-1 standard.
Kind regards,
Alexander
#ARIJIT BISWAS# wrote:
> Hi List:
>
> I’m interested in the computational complexity (number of additions
> and multiplications) of the psychoacoustic models used in audio coding.
>
> Well, to be more specific, let’s say I’m aiming to build a “fast”
> psychoacoustic model: which existing model and/or what kind of
> computational complexity should I try to beat?
>
> Any help/suggestions/references in this direction will be highly
> appreciated.
>
> Best Regards,
> ~Arijit
>
--
dipl. ing. alexander lerch
zplane.development
www.zplane.de
katzbachstr. 21
d-10965 berlin
fon: +49.30.854 09 15.0
fax: +49.30.854 09 15.5