Subject: MSD: announcing the * dataset*
From: Thierry Bertin-Mahieux <tb2332@xxxxxxxx>
Date: Thu, 20 Oct 2011 12:12:41 -0400

The Million Song Dataset (MSD) team is proud to partner with to announce a new complementary dataset: the dataset. It contains song-level tags and song-to-song similarity data. And it's big (i.e., BIG)!

A few numbers:
* 943,347 matched tracks MSD <->
* 505,216 tracks with at least one tag
* 584,897 tracks with at least one similar track
* 522,366 unique tags
* 8,598,630 (track, tag) pairs
* 56,506,688 (track, similar track) pairs

We thank ( for making this data available; it is the largest addition to the MSD so far. We are convinced that its impact on music information retrieval will be considerable.

As always, we appreciate any feedback! For instance, my favorite tag so far is "Acid Smurfs".

A few additional notes on the MSD:
- we are working on additional data for collaborative filtering; more on this at ISMIR
- we converted the CAL500 and CAL10K datasets into MSD format (
- please consider attending our tutorial at ISMIR (

Happy swimming in data!

Thierry Bertin-Mahieux
Million Song Dataset team