Hi Everyone,
I'm happy to announce the release of a new dataset, AVA Speech, which provides speech activity labels for v1.0 of the AVA dataset:
– It contains dense labels indicating when speech is present, and also annotates the background condition: clean speech, speech with background music, or speech with background noise. Multiple raters annotated every instant of each 15-minute clip, and the ratings were merged by majority vote to obtain the final released labels.
– This work is described in more detail in our paper (available on arXiv here), which will be presented at Interspeech 2018 on September 4. In addition to the data itself, the paper provides baseline speech detection numbers in the various conditions, using audio-only and visual-only systems.
– Please use the ava-dataset-users Google group for discussions and questions around the dataset, and please feel free to forward this note to relevant lists.
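
As an illustration of the majority-vote merge mentioned above, here is a minimal Python sketch. It assumes each rater's annotation has been sampled into a per-frame list of label strings. The label names, the frame representation, and the tie-breaking rule are all illustrative assumptions, not the released file format or the actual merging pipeline.

```python
from collections import Counter

# Hypothetical label names, chosen for illustration only.
LABELS = ("NO_SPEECH", "CLEAN_SPEECH", "SPEECH_WITH_MUSIC", "SPEECH_WITH_NOISE")


def majority_vote(ratings):
    """Merge per-frame labels from several raters by majority vote.

    ratings: list of equal-length lists of label strings, one list per rater.
    Ties are broken by the order of LABELS (an assumption of this sketch).
    """
    if not ratings:
        return []
    n_frames = len(ratings[0])
    if any(len(r) != n_frames for r in ratings):
        raise ValueError("all raters must label the same number of frames")

    merged = []
    for frame_labels in zip(*ratings):  # one tuple of rater labels per frame
        counts = Counter(frame_labels)
        top = max(counts.values())
        # Among labels tied for the highest count, pick the earliest in LABELS.
        tied = [label for label, c in counts.items() if c == top]
        merged.append(min(tied, key=LABELS.index))
    return merged
```

With three raters, any label chosen by at least two of them wins the frame outright; the tie-break only matters when every rater disagrees or when the number of raters is even.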
Regards,
Sourish Chaudhuri