Hi Everyone,
I'm happy to announce the release of a new dataset, AVA Active Speaker, addressing the problem of identifying which, if any, of the visible faces in a video are speaking at any point in time. Labels are provided over continuous 15-minute segments of movies from v1.0 of the AVA dataset.
The dataset creation process and our initial audiovisual models for this task are described in this arXiv paper. The dataset is available on the AVA Download page, along with details on the dataset format.
Please use the ava-dataset-users Google group for discussions and questions around the dataset, and please feel free to forward this note to relevant lists.
Regards,
Sourish Chaudhuri & the AVA team