[AUDITORY] Synthesis Challenge at DCASE2023 (Laurie Heller)


Subject: [AUDITORY] Synthesis Challenge at DCASE2023
From:    Laurie Heller  <hellerl@xxxxxxxx>
Date:    Fri, 3 Mar 2023 09:10:21 -0500

Dear Auditory list:

Announcing a Foley synthesis challenge!

It's a new part of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (Task 7 of DCASE). The call for entries is open, with a deadline of 15 May 2023.

https://dcase.community/challenge2023/task-foley-sound-synthesis

This task aims to build a Foley sound synthesis system that can generate plausible audio signals fitting into given categories of sound. Foley sound, in general, refers to sound effects that are created to convey (and sometimes enhance) the sounds produced by events occurring in a narrative (e.g. radio or film). Foley sounds are commonly added to multimedia to enhance the perceptual audio experience. This sound synthesis challenge requires the generation of original audio clips that represent a category of sound, such as footsteps. The new sounds should fit into the category that is typified by the set of sounds in the development set, yet they should not duplicate any of the provided sounds. Any synthesis approach is permitted (not just machine learning).

Why is this an important goal? First, obtaining a perfectly matched sound effect currently requires time-consuming post-production. By generating sound that belongs to a target sound category, Foley sound synthesis can make the workflow much more time- and cost-effective. With the rise of virtual environments such as the metaverse, we expect a growing need for the automated generation of increasingly complex and creative sound environments. Second, it can be used for dataset synthesis or augmentation for a wide variety of DCASE tasks, including sound event detection (SED). SED has drawn great attention, and synthesized datasets have already been used, e.g., the URBAN-SED dataset. A high-quality Foley sound synthesis model could lead to the development of better SED models.

There are 7 categories of sound events to be synthesized. The challenge has two subproblems: the development of models with and without external resources. Participants are expected to submit a system for one of the two problems, and each problem is evaluated independently. Submissions will be evaluated by Fréchet Audio Distance (FAD), followed by a subjective test.

#foleysynthesischallenge

Foley Challenge Organizers:
Keunwoo Choi, Gaudio Lab, Inc.; Korea
Jaekwon Im, Gaudio Lab, Inc., KAIST; Korea
Laurie M. Heller, Carnegie Mellon University; USA
Keisuke Imoto, Doshisha University; Japan
Mathieu Lagrange, CNRS, Ecole Centrale Nantes, Nantes University; France
Brian McFee, New York University; USA
Yuki Okamoto, Ritsumeikan University; Japan
Shinnosuke Takamichi, The University of Tokyo; Japan


This message came from the mail archive
src/postings/2023/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University