Subject: [AUDITORY] CfP: Special session on representation learning for audio, speech, and music processing
From: Konstantinos Drossos
Date: Tue, 8 Dec 2020 09:50:13 +0200

Call for papers

==== Apologies for cross posting ====

Dear colleagues,

We are happy to share the call for papers for the special session

“Representation Learning for Audio, Speech, and Music Processing”

at the International Joint Conference on Neural Networks (IJCNN) 2021.

All papers submitted to the special session go through the conference submission portal (as regular papers) and undergo the same full-paper peer review process as any other paper in IJCNN 2021.

Special session website: https://dr-costas.github.io/rlasmp2021-website/
Conference website: https://www.ijcnn.org
Accepted special sessions at IJCNN: https://www.ijcnn.org/accepted-special-sessions

=====================================

Important dates:

Paper submission: 15th of January, 2021
Notification of acceptance: 15th of March, 2021
Camera-ready submission: 30th of March, 2021

=====================================

Scope and topics:

In the last decade, deep learning has revolutionized the research fields of audio and speech signal processing, acoustic scene analysis, and music information retrieval. In these fields, methods relying on deep learning have achieved remarkable performance in various applications and tasks, surpassing legacy methods that rely on the independent use of signal processing operations and machine learning algorithms. The success of deep learning methods rests on their ability to learn representations of sound signals that are useful for various downstream tasks. These representations encapsulate the underlying structure or features of the sound signals, or the latent variables that describe the underlying statistics of the respective signals.

Despite this success, learning representations of audio with deep models remains challenging. For example, the diversity of acoustic noise, the multiplicity of recording devices (e.g., high-end microphones vs. smartphones), and the variability of sources challenge machine learning methods when they are used in realistic environments. In audio event detection, which has recently become a vigorous research field, systems for the automatic detection of multiple overlapping events are still far from reaching human performance. Another major challenge is the design of robust speech processing systems. Speech enhancement technologies have improved significantly in recent years, notably thanks to deep learning methods. However, there is still a large performance gap between controlled environments and real-world situations. As a final example, in the music information retrieval field, modeling high-level semantics based on local and long-term relations in music signals is still a core challenge.
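To make the notion of a learned audio representation concrete, here is a minimal, purely illustrative sketch of an encoder that maps a waveform to a fixed-size embedding. It assumes PyTorch and torchaudio; the architecture, names, and hyperparameters are hypothetical choices for illustration, not part of the call or a reference system.

# Purely illustrative sketch (not part of the call): a small encoder that
# learns a representation of an audio signal. Assumes PyTorch and torchaudio;
# all layer sizes and names are hypothetical choices.
import torch
import torch.nn as nn
import torchaudio

class AudioEncoder(nn.Module):
    """Maps a raw waveform to a fixed-size embedding via a log-mel front end."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.frontend = nn.Sequential(
            torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64),
            torchaudio.transforms.AmplitudeToDB(),
        )
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over frequency and time
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, waveform):
        # waveform: (batch, samples) -> log-mel features: (batch, 1, n_mels, frames)
        feats = self.frontend(waveform).unsqueeze(1)
        h = self.conv(feats).flatten(1)  # (batch, 64)
        return self.proj(h)              # (batch, embed_dim): the representation

# One second of dummy 16 kHz audio; z is the representation that downstream
# tasks (classification, retrieval, detection, etc.) would build on.
z = AudioEncoder()(torch.randn(4, 16000))
print(z.shape)  # torch.Size([4, 128])

In the self-supervised settings mentioned below, such an encoder would typically be trained without labels, for example by contrasting embeddings of differently augmented views of the same signal.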
More generally, self-supervised approaches that can leverage large amounts of unlabeled data are very promising for learning models that can serve as a powerful base for many applications and tasks. Thus, it is of great interest for the scientific community to find new methods for representing audio signals using hierarchical models, such as deep neural networks. This will enable novel learning methods to leverage the large amount of information that audio, speech, and music signals convey.

The aim of this session is to establish a venue where engineers, scientists, and practitioners from both academia and industry can present and discuss cutting-edge results in representation learning for audio, speech, and music signal processing. Driven by the constantly increasing popularity of audio, speech, and music representation learning, the organizing committee of this session aims to build, in the long term, a solid reference within the computational intelligence community for the digital audio field.

The scope of this special session is representation learning, focused on audio, speech, and music. Representation learning is one of the main aspects of neural networks; the scope of this special session is therefore well aligned with the scope of IJCNN, as it focuses on a core aspect of neural networks.

The topics of the special session include, but are not limited to:

• Audio, speech, and music signal generative models and methods
• Single- and multi-channel methods for separation, enhancement, and denoising
• Spatial analysis, modification, and synthesis for augmented and virtual reality
• Detection, localization, and tracking of audio sources/events
• Style transfer, voice conversion, digital effects, and personalization
• Adversarial attacks and real/synthetic discrimination methods
• Information retrieval and classification methods
• Multi- and inter-modal models and methods
• Self-supervised/metric learning methods
• Domain adaptation, transfer learning, knowledge distillation, and K-shot approaches
• Differentiable signal processing based methods
• Privacy-preserving methods
• Interpretability and explainability in deep models for audio
• Context- and structure-aware approaches

On behalf of the organizing committee,

Konstantinos Drossos, PhD
Senior researcher
Audio Research Group
Tampere University, Finland

Office: TF309
Address: Korkeakoulunkatu 10, FI-33720
mail: konstantinos.drossos@xxxxxxxx