Re: computational complexity of psychoacoustic models
Hi Arijit,
Try to avoid MPEG psychoacoustic model 2; I think it is too complex.
A few things are important when designing a SIMPLE psycho
model:
- try to avoid a separate time/freq. transformation (usually an FFT). Use the
output of the one that is already present in the encoder (most likely the MDCT).
It isn't as good, but it spares you an FFT computation. The results are more than
acceptable.
- don't define separate critical bands as Psycho 2 does (they fit human
hearing better); use the scalefactor bands already defined in your encoder,
it will be much simpler.
- tonality estimation might also be unnecessary. Just assume constant
masking for tonal and non-tonal signals; it will do the job for most
signals (you might lose some quality on strongly tonal samples, but it might
not be too critical).
- if you have to include tonality detection, don't calculate it from
prediction across frames; lookahead buffers increase both the delay and the
complexity. MPEG psycho model 2 has some really unnecessary lookaheads.
Use some other method for tonality estimation (the Spectral Flatness Measure,
for example).
- don't overcomplicate the spreading function; a simple triangular function
will do the job.
- detect transients in the TIME domain.
- estimate scalefactors directly from the masking thresholds; don't use the
inner-and-outer loop method that Psycho 2 recommends (many iterations slow
you down drastically).
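
To illustrate two of the points above (Spectral Flatness Measure for tonality, transient detection in the time domain), here is a minimal Python sketch. The function names, the number of sub-blocks and the energy-jump ratio are assumptions chosen for illustration; they are not taken from any MPEG or 3GPP text.

```python
import math

def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of power-spectrum values.

    Close to 1 for noise-like (flat) spectra, close to 0 for tonal (peaky) ones.
    """
    eps = 1e-12  # assumed guard against log(0) and division by zero
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    arith_mean = sum(power) / n + eps
    return math.exp(log_mean) / arith_mean

def is_transient(frame, num_blocks=8, ratio=8.0):
    """Flag a frame when one short block has far more energy than the block before it.

    num_blocks and ratio are illustrative values, not tuned ones.
    """
    size = len(frame) // num_blocks
    energies = [sum(x * x for x in frame[i * size:(i + 1) * size])
                for i in range(num_blocks)]
    for prev, cur in zip(energies, energies[1:]):
        if cur > ratio * (prev + 1e-9):
            return True
    return False
```

Both functions work on one frame only, so neither needs a lookahead buffer.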
What I would do is something like:
- calculate time/freq transformation
- calculate energy across critical bands
- calculate masking (or use constant)
- calculate masking threshold as energy * masking
- apply spreading function
- apply threshold in quiet (this will give you the main result of the
psycho analysis - the masking threshold)
- convert masking thresholds directly to scalefactors
If your quantized spectrum doesn't fit the bitrate, just increment ALL
scalefactors at the same time and repeat the quantization.
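
The steps above, including the "increment all scalefactors" rate loop, can be sketched roughly as follows in Python. This is only an outline under stated assumptions: the masking and spreading constants are made-up values, the quantizer is a plain uniform one with step 2^(sf/4) (so the noise per coefficient is taken as step^2/12), and estimate_bits is a hypothetical stand-in for a real entropy coder.

```python
import math

MASKING_RATIO = 0.1   # assumed constant masking: threshold 10 dB below band energy
SPREAD_UP = 0.03      # assumed attenuation per band toward higher bands
SPREAD_DOWN = 0.001   # assumed attenuation per band toward lower bands

def band_energies(coeffs, band_edges):
    """Sum of squared transform coefficients in each scalefactor band."""
    return [sum(c * c for c in coeffs[lo:hi])
            for lo, hi in zip(band_edges, band_edges[1:])]

def masking_thresholds(energies, quiet):
    """threshold = energy * masking, then spreading, then threshold in quiet."""
    thr = [e * MASKING_RATIO for e in energies]
    # Triangular spreading (straight slopes in dB) done as two running-max passes.
    for b in range(1, len(thr)):
        thr[b] = max(thr[b], thr[b - 1] * SPREAD_UP)
    for b in range(len(thr) - 2, -1, -1):
        thr[b] = max(thr[b], thr[b + 1] * SPREAD_DOWN)
    return [max(t, q) for t, q in zip(thr, quiet)]

def thresholds_to_scalefactors(thr, band_edges):
    """Pick the step whose uniform-quantizer noise (step^2/12 per line) meets thr."""
    sfs = []
    for t, lo, hi in zip(thr, band_edges, band_edges[1:]):
        step = math.sqrt(12.0 * max(t, 1e-12) / (hi - lo))
        sfs.append(math.floor(4.0 * math.log2(step)))
    return sfs

def quantize(coeffs, band_edges, sfs):
    q = []
    for sf, lo, hi in zip(sfs, band_edges, band_edges[1:]):
        step = 2.0 ** (sf / 4.0)
        q.extend(int(round(c / step)) for c in coeffs[lo:hi])
    return q

def estimate_bits(q):
    """Hypothetical cost model: 1 bit per line plus 2 bits per magnitude bit."""
    return sum(1 + 2 * abs(v).bit_length() for v in q)

def encode_frame(coeffs, band_edges, quiet, bit_budget):
    thr = masking_thresholds(band_energies(coeffs, band_edges), quiet)
    sfs = thresholds_to_scalefactors(thr, band_edges)
    q = quantize(coeffs, band_edges, sfs)
    # Single rate loop: raise ALL scalefactors together until the frame fits.
    # Stops anyway once every line is zero, since coarser steps cannot help then.
    while estimate_bits(q) > bit_budget and any(q):
        sfs = [sf + 1 for sf in sfs]
        q = quantize(coeffs, band_edges, sfs)
    return sfs, q
```

For example, encode_frame(coeffs, [0, 4, 8, 16], [1e-3] * 3, 40) takes 16 coefficients in three bands and returns one scalefactor per band plus the quantized lines. Because all scalefactors move together, the noise stays shaped like the masking threshold while the overall level rises.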
I hope this helped. If you don't understand all this now, don't worry - you
will when you get involved with psychoacoustics some more.
Also, take a look at the psychoacoustic model of the Enhanced aacPlus
general audio codec from 3GPP - TS 26.403.
Regards,
Daniel
----- Original Message -----
From: "alexander lerch" <lerch@xxxxxxxxx>
To: <AUDITORY@xxxxxxxxxxxxxxx>
Sent: Wednesday, February 08, 2006 1:49 PM
Subject: Re: [AUDITORY] computational complexity of psychoacoustic models
The choice is, at least for all MPEG codecs, completely up to the
developer. You can decide not to use a psychoacoustic model at all, or
you can decide to use a complex model to gain as much quality as possible.
Commonly used steps are:
FFT
Critical Band grouping
Conversion to dB
(Analysis of tonality of possible maskers)
calculation of masking threshold via masking model
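
The critical band grouping and dB conversion steps might look like the Python sketch below. It assumes the power spectrum has already been computed by an FFT, uses the common Zwicker/Terhardt approximation of the Bark scale, and groups bins into one-Bark-wide bands; the function names are illustrative only.

```python
import math

def hz_to_bark(f):
    """Zwicker/Terhardt approximation of the Bark (critical band) scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def band_levels_db(power, sample_rate):
    """Group a one-sided power spectrum (bins 0..N/2) into Bark bands, in dB."""
    nyquist = sample_rate / 2.0
    num_bins = len(power)
    bands = [0.0] * (int(hz_to_bark(nyquist)) + 1)
    for k, p in enumerate(power):
        f = k * nyquist / (num_bins - 1)   # centre frequency of bin k
        bands[int(hz_to_bark(f))] += p
    # Small offset (assumed) keeps log10 defined for empty bands.
    return [10.0 * math.log10(e + 1e-12) for e in bands]
```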
Have a look at the psychoacoustic model 2 in the informative part of the
MPEG-1 standard.
Kind regards,
Alexander
#ARIJIT BISWAS# wrote:
> Hi List:
>
>
>
> I’m interested to know the computational complexity (number of additions
> and multiplications) of psychoacoustic models used in audio coding.
>
> Well, to be more specific, let’s say if I’m targeting to build a “fast”
> psychoacoustic model, which existing model and/or what kind of
> computational complexity should I try to beat?
>
>
>
> Any help/suggestions/references in this direction will be highly
> appreciated.
>
>
>
> Best Regards,
>
> ~Arijit
>
--
dipl. ing.
alexander lerch
zplane.development
:www.zplane.de
katzbachstr.21
d-10965 berlin
fon: +49.30.854 09 15.0
fax: +49.30.854 09 15.5