[AUDITORY] AudioSet: An ontology and human-labeled dataset for audio events (Jort Gemmeke)


Subject: [AUDITORY] AudioSet: An ontology and human-labeled dataset for audio events
From:    Jort Gemmeke  <jgemmeke@xxxxxxxx>
Date:    Tue, 7 Mar 2017 16:28:50 -0600
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear colleagues,

We are excited to announce AudioSet <http://g.co/audioset>, a comprehensive ontology of over 600 sound classes and a dataset of over 2 million 10-second YouTube clips annotated with sound labels.

The ontology is a manually assembled hierarchy of sound event classes, ranging from “Child speech” to “Ukulele” to “Boing.” It is informed by comparison with other sound research and sound event sets, and by what we’ve learned while annotating the videos. It remains a work in progress, and we hope to see community contributions and refinements.

The dataset was created by mining YouTube for videos likely to contain a target sound, followed by crowdsourced human verification. For mining we used approaches ranging from title search to content-based techniques. The ontology and dataset construction are described in more detail in our ICASSP 2017 paper <https://research.google.com/pubs/pub45857.html>.

The data release includes the URLs of all the excerpts, along with the sound classes judged present, as well as precalculated audio features from a VGG-inspired acoustic model <https://research.google.com/pubs/pub45611.html>.

You can browse the ontology, and explore and download the data, at g.co/audioset.

Jort Gemmeke

On behalf of the sound and video understanding teams in the Machine Perception Research <https://research.google.com/pubs/MachinePerception.html> organization at Google.
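
The announcement describes the ontology as a hierarchy of sound event classes. A minimal sketch of walking such a hierarchy in Python, assuming the ontology is distributed as a JSON list of nodes with "id", "name", and "child_ids" fields; the field names and file name here are assumptions for illustration, not details confirmed in the announcement:

    import json

    def print_tree(by_id, root_id, depth=0):
        # Depth-first print of the sound-class hierarchy under root_id.
        # Assumes each node is a dict with "id", "name", and "child_ids"
        # (field names are an assumption about the released JSON).
        node = by_id[root_id]
        print("  " * depth + node["name"])
        for child_id in node.get("child_ids", []):
            print_tree(by_id, child_id, depth + 1)

    # Hypothetical usage (file and field names are assumptions):
    # with open("ontology.json") as f:
    #     nodes = json.load(f)
    # by_id = {n["id"]: n for n in nodes}
    # print_tree(by_id, nodes[0]["id"])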
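The segment lists in the data release pair each clip with the sound classes judged present. A minimal sketch of parsing one, assuming a CSV layout of YouTube ID, start seconds, end seconds, and a quoted comma-separated list of label IDs; the exact layout and file name are assumptions for illustration:

    import csv

    def load_segments(path):
        # Parse a segments CSV into (ytid, start, end, labels) tuples.
        # Assumed layout: comment lines start with '#'; data rows are
        # "YTID, start_seconds, end_seconds, positive_labels", where
        # positive_labels is a quoted, comma-separated list of label IDs.
        segments = []
        with open(path, newline="") as f:
            for row in csv.reader(f, skipinitialspace=True):
                if not row or row[0].startswith("#"):
                    continue  # skip header/comment lines
                ytid, start, end = row[0], float(row[1]), float(row[2])
                segments.append((ytid, start, end, row[3].split(",")))
        return segments

    # Hypothetical usage:
    # segs = load_segments("segments.csv")
    # print(segs[0])  # ('abc123', 30.0, 40.0, ['/m/09x0r', ...])

The label IDs in each row can then be joined against the ontology nodes in the sketch above to recover human-readable class names.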


This message came from the mail archive
../postings/2017/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University