[AUDITORY] AVA Speech dataset now available (Sourish Chaudhuri)


Subject: [AUDITORY] AVA Speech dataset now available
From:    Sourish Chaudhuri  <0000007fde242bbe-dmarc-request@xxxxxxxx>
Date:    Fri, 24 Aug 2018 10:25:00 -0700
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi Everyone,

I'm happy to announce the release of a new dataset, AVA Speech <http://research.google.com/ava/>, which provides speech activity labels for v1.0 of the AVA dataset:

– It contains dense annotations indicating when speech is present, along with the background condition: clean speech, speech with background music, or speech with background noise. Multiple raters annotated every instant of each 15-minute clip, and the ratings were merged by majority vote to obtain the final set of released labels.

– The dataset is available on the AVA Download page <https://research.google.com/ava/download.html#ava_speech_download>.

– This work is described in more detail in our paper (available on arXiv <https://arxiv.org/pdf/1808.00606.pdf>), which will be presented at Interspeech 2018 on September 4. In addition to the data itself, the paper provides baseline numbers for speech detection in each condition, using audio-only and visual-only systems.

– Please use the ava-dataset-users Google group <https://groups.google.com/forum/#!forum/ava-dataset-users> for discussion and questions about the dataset, and feel free to forward this note to relevant lists.

Regards,
Sourish Chaudhuri
Google AI Perception <https://ai.google/research/teams/perception/>
style=3D"font-family:Arial,Helve= tica,sans-serif;font-size:12.8px;background-color:rgb(255,255,255);text-dec= oration-style:initial;text-decoration-color:initial">This work is described= in more detail in our paper (<a href=3D"https://arxiv.org/pdf/1808.00606.p= df" class=3D"m_-7840899325038136639gmail-m_-4165371782552342430gmail-m_-783= 3292970471741409gmail-m_-3857697655147101160m_6266177039406254887m_76401486= 88939450201m_-2223349646687702826m_-932777962083186013cremed" style=3D"colo= r:rgb(17,85,204)" target=3D"_blank">available on arxiv here</a>) which will= be presented at Interspeech</span><span style=3D"font-family:Arial,Helveti= ca,sans-serif;font-size:12.8px;background-color:rgb(255,255,255);text-decor= ation-style:initial;text-decoration-color:initial"><span>=C2=A0</span>2018 = on<span>=C2=A0</span><span class=3D"m_-7840899325038136639gmail-m_-41653717= 82552342430gmail-m_-7833292970471741409gmail-aBn" style=3D"border-bottom:1p= x dashed rgb(204,204,204)"><span class=3D"m_-7840899325038136639gmail-m_-41= 65371782552342430gmail-m_-7833292970471741409gmail-aQJ">September 4</span><= /span>. In addition to the data itself, the paper provides baseline perform= ance numbers for speech detection performance in the various conditions, us= ing audio-only and visual-only systems.</span></div><div><span style=3D"fon= t-family:Arial,Helvetica,sans-serif;font-size:12.8px;background-color:rgb(2= 55,255,255);text-decoration-style:initial;text-decoration-color:initial"><b= r></span></div><div><span style=3D"font-family:Arial,Helvetica,sans-serif;f= ont-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:ini= tial;text-decoration-color:initial">=E2=80=93 Please use the=C2=A0<a href= =3D"https://groups.google.com/forum/#!forum/ava-dataset-users" target=3D"_b= lank" style=3D"color:rgb(17,85,204)">ava-dataset-users Google group</a> for= discussions and questions around the dataset, and please feel free to forw= ard this note to relevant lists.</span></div><div><span style=3D"font-famil= y:Arial,Helvetica,sans-serif;font-size:12.8px;background-color:rgb(255,255,= 255);text-decoration-style:initial;text-decoration-color:initial"><br></spa= n></div><div><span style=3D"font-family:Arial,Helvetica,sans-serif;font-siz= e:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;te= xt-decoration-color:initial">Regards,</span></div><div><span style=3D"font-= family:Arial,Helvetica,sans-serif;font-size:12.8px;background-color:rgb(255= ,255,255);text-decoration-style:initial;text-decoration-color:initial">=C2= =A0Sourish Chaudhuri</span></div><div><span style=3D"font-family:Arial,Helv= etica,sans-serif;font-size:12.8px;background-color:rgb(255,255,255);text-de= coration-style:initial;text-decoration-color:initial"><a href=3D"https://ai= .google/research/teams/perception/">Google AI Perception</a></span></div><d= iv><span style=3D"font-family:Arial,Helvetica,sans-serif;font-size:12.8px;b= ackground-color:rgb(255,255,255);text-decoration-style:initial;text-decorat= ion-color:initial">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0</span></div><div><div class=3D"m_-7840899325038136639g= mail-m_-4165371782552342430gmail-m_-7833292970471741409gmail-adL" style=3D"= font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:in= itial;text-decoration-color:initial"><span class=3D"m_-7840899325038136639g= mail-m_-4165371782552342430gmail-m_-7833292970471741409gmail-im" style=3D"c= olor:rgb(80,0,80)"></span></div><br></div></div> 
--000000000000fc2b41057431a815--
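[Archive note: for readers who want to replicate the majority-vote merging step described above, below is a minimal Python sketch. The label names, the frame-level representation, and the tie-breaking rule are illustrative assumptions, not the released format; consult the paper and the downloaded label files for the actual details.]

    from collections import Counter

    # Hypothetical background-condition labels; the released label
    # names may differ from these.
    LABELS = ["NO_SPEECH", "CLEAN_SPEECH",
              "SPEECH_WITH_MUSIC", "SPEECH_WITH_NOISE"]

    def majority_vote(ratings_per_frame):
        """Merge several raters' per-frame labels into one label per frame.

        ratings_per_frame: list of lists, where ratings_per_frame[t]
        holds every rater's label for frame t. Ties are broken by the
        first label encountered (an assumption, not the paper's rule).
        """
        merged = []
        for frame_ratings in ratings_per_frame:
            counts = Counter(frame_ratings)
            label, _ = counts.most_common(1)[0]
            merged.append(label)
        return merged

    # Example: three raters labelling four frames.
    ratings = [
        ["CLEAN_SPEECH", "CLEAN_SPEECH", "SPEECH_WITH_NOISE"],
        ["NO_SPEECH", "NO_SPEECH", "NO_SPEECH"],
        ["SPEECH_WITH_MUSIC", "SPEECH_WITH_MUSIC", "CLEAN_SPEECH"],
        ["NO_SPEECH", "CLEAN_SPEECH", "NO_SPEECH"],
    ]
    print(majority_vote(ratings))
    # ['CLEAN_SPEECH', 'NO_SPEECH', 'SPEECH_WITH_MUSIC', 'NO_SPEECH']

A per-frame vote like this keeps the merged labels as dense as the raw annotations, which matches the "every instant" annotation scheme the message describes.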

