On 25/06/2018 17:00, Samer Hijazi wrote:
Thanks Laszlo and Phil,
I am not speaking about doing ASR in two steps; I am
speaking about doing the ASR and speech enhancement jointly in
a multi-objective learning process.
Ah, you mean multitask learning. That didn't come over at all in
your first mail.
There are many papers showing that if you use related
objectives to train your network, you will get better results on
both objectives than you would get if you trained for each
one separately.
An early paper on this, probably the first application to ASR, was
Parveen & Green, Multitask Learning in Connectionist
Robust ASR using Recurrent Neural Networks, Eurospeech 2003.
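
To make the idea concrete, here is a minimal sketch of a joint objective of this kind: a recognition loss and an enhancement loss computed for the same frame and combined in a weighted sum. It is only an illustration, not the method of any particular paper. The function names and the trade-off parameter `weight` are assumptions made up for this example.

```python
import math

def cross_entropy(class_probs, label):
    """Recognition loss: negative log-probability of the correct class."""
    return -math.log(class_probs[label])

def mean_squared_error(enhanced, clean):
    """Enhancement loss: mean squared difference from the clean features."""
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)

def multitask_loss(class_probs, label, enhanced, clean, weight=0.5):
    """Weighted sum of the two task losses.

    `weight` is a hypothetical trade-off parameter: 1.0 trains for
    recognition only, 0.0 for enhancement only.
    """
    recognition = cross_entropy(class_probs, label)
    enhancement = mean_squared_error(enhanced, clean)
    return weight * recognition + (1.0 - weight) * enhancement

# One frame: three-class posterior from the recognition head and
# two enhanced feature values from the enhancement head.
loss = multitask_loss([0.7, 0.2, 0.1], 0, [0.9, 0.1], [1.0, 0.0], weight=0.5)
```

In a real system both heads would share the lower layers of one network, so the gradient of this single loss updates the shared representation for both tasks.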
And it seems obvious that if we use speech content (i.e.
text) and the perfect speech waveform as two independent but
correlated targets, we will end up with better text
recognition and better speech enhancement; am I missing
something?
It would be wrong to start with clean speech, add noise, use that as
input and clean speech + text as training targets, because in real
life speech & other sound sources don't combine like that.
That's why the spectacular results in the Parveen/Green paper are
misleading.
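
For concreteness, the simulated setup described above might look like the sketch below: noise is scaled to a chosen signal-to-noise ratio and simply added, sample by sample, to the clean speech. The function name `mix_at_snr` and the toy signals are assumptions for illustration only. Note that the mixing is purely additive, which is exactly the simplification that real recordings do not obey.

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech at a target SNR (in dB).

    Assumes both signals have the same length and non-zero power.
    """
    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that clean_power / scaled_noise_power
    # equals 10 ** (snr_db / 10).
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]

# Toy signals standing in for real waveforms.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(clean, noise, snr_db=0.0)
```

The training pair would then be `noisy` as input and `clean` (plus the transcript) as targets.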
HTH
--
*** note email is now p.green@xxxxxxxxxx ***
Professor Phil Green
SPandH
Dept of Computer Science
University of Sheffield