This is an updated posting: the deadline is approaching.
Understanding the learning dynamics of Neural Audio models using Linear Algebra
Since around 2016, most research in Digital Music and Digital Audio has adopted Deep Learning techniques. These have brought important advances in performance in applications such as Music Source Separation, Automatic Music Transcription and Timbre Transfer. On the downside, however, the models keep growing larger: they consume increasingly large amounts of power for training and inference, require more data, and become less understandable and explainable. These issues underpin the research in this PhD.
A fundamental building block of Deep Learning models is Matrix (or Linear) Algebra. During training, the matrix that represents each layer is progressively modified to reduce the error between the model's predictions and the training data. By examining what happens to these matrices during training, it is possible to engineer them to learn faster and more efficiently, and to build DL models that are more compact. Here we turn to Low Rank matrices: we wish to explore what happens when Low Rank is imposed as a training constraint in Neural Audio models. Does the model train better, or not? Is the model easier and cheaper to train, or not? Early results in non-audio/music applications suggest both: the models train better and cost less to train. This work needs developing further in this PhD.
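To make the idea of a Low Rank constraint concrete, here is a minimal illustrative sketch (not taken from the posting; the matrix sizes and rank are arbitrary assumptions) showing how a dense layer's weight matrix W can be replaced by a factorisation U·V of rank r, shrinking the parameter count from m·n to r·(m+n):

```python
import numpy as np

# Hypothetical layer sizes for illustration only.
m, n, r = 512, 512, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))   # full-rank layer: m*n parameters
U = rng.standard_normal((m, r))   # low-rank factors: U (m x r) and V (r x n),
V = rng.standard_normal((r, n))   # together r*(m+n) parameters

full_params = m * n               # 512*512 = 262144
low_params = r * (m + n)          # 16*(512+512) = 16384, a 16x reduction

# A forward pass through the factorised layer: y = U @ (V @ x),
# two small matrix-vector products instead of one large one.
x = rng.standard_normal(n)
y = U @ (V @ x)
```

Training U and V directly (rather than W) is one way such a rank constraint can be imposed; the sizes above are chosen only to make the parameter saving easy to see.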
Research will start with Music Source Separation, exploring the learning dynamics of established models like Demucs. It will then use the knowledge of these dynamics to intelligently prune the models using the Low Rank approach above [1]. This will speed up learning and inference and improve performance. Next, the work could shift to other Neural Audio models and applications, or could become more immersed in the field of Mechanistic Interpretability [2], which seeks to reveal the hidden, innermost structures that emerge within trained Neural Networks. Other lines of enquiry could include the trade-off between training data set size and the ideal rank of the various layers in the model. Again, early results surprisingly suggest that Low Rank layers can be trained with less data!
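The pruning idea in [1] builds on truncating the Singular Value Decomposition of a trained weight matrix. As a rough sketch only (the matrix below is synthetic and the retained rank k is an arbitrary assumption, not a value from the posting), keeping the k largest singular values yields the best rank-k approximation of the layer:

```python
import numpy as np

# Synthetic "trained" weight matrix with plenty of redundancy
# (its rank is at most 32 by construction).
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32)) @ rng.standard_normal((32, 128))

# SVD-based pruning: keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 8
W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation of W

# Relative error of the pruned layer in the Frobenius norm.
rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
```

The pruned layer W_k can then be stored as its two thin factors, which is where the savings in memory and inference cost come from.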
Candidates will have an excellent background in Linear Algebra (e.g. eigenvectors, Singular Value Decomposition, tensor analysis) as well as a strong interest in some aspect of music or audio. They will also need a background in Deep Learning and a sound knowledge of appropriate programming tools. Knowledge of Mathematica and the Wolfram Language would be a bonus. You will need a strong undergraduate degree, and preferably a Masters degree completed to a high level.
Please note that a studentship is only available to those qualifying for China Scholarship Council awards or for our faculty’s S&E Doctoral Research Studentships for Underrepresented Groups. Self-funded candidates are also welcome.
Full application guidelines can be found here:
https://www.c4dm.eecs.qmul.ac.uk/news/2024-11-12.PhD-call-2025/
For further details of this research topic, contact Mark Sandler by email (mark.sandler@xxxxxxxxxx).
[1] B. Bermeitinger, T. Hrycej, and S. Handschuh, ‘Singular Value Decomposition and Neural Networks’, Jun. 2019, doi: 10.1007/978-3-030-30484-3_13.
[2] N. Cammarata et al., ‘Thread: Circuits’, Distill, vol. 5, no. 3, p. e24, Mar. 2020, doi: 10.23915/distill.00024.
[3] V. S. Paul and P. A. Nelson, ‘Matrix analysis for fast learning of neural networks with application to the classification of acoustic spectra’, The Journal of the Acoustical Society of America, vol. 149, no. 6, pp. 4119–4133, Jun. 2021, doi: 10.1121/10.0005126.
--
Please note I work part time Monday - Thursday so there may be a delay to my email response.
professor mark sandler, FREng, CEng, FIEEE, FAES, FIET
director of the centre for digital music (c4dm)
school of electronic engineering and computer science, queen mary university of london
mark.sandler@xxxxxxxxxx | +44
(0)20 7882 7680