
[AUDITORY] Feedback on features for music similarity

Dear all,

We are working on software that makes "smart" playlists from a user's music library, using only the files' audio content.

The goal is to have transitions that feel natural, where the user does not notice an abrupt change when the playlist goes from one track to the next. What constitutes a "good" playlist is of course very subjective, but right now we have something "good enough" that a few people use daily in their audio players, and we're looking to make it better. Our approach is deliberately pragmatic: we want something that works reasonably well for everyone, not something perfect. We just want to make an open-source tool that is easy to use, so people who don't use e.g. Spotify can still have "smart" playlists. The project is available at https://github.com/Polochon-street/bliss-rs/, with a small introduction to it at https://lelele.io/bliss.html. So far, the results are encouraging enough that I can go to sleep listening to a playlist without being awakened by heavy metal during the night!

However, we are NOT audio specialists, so we are questioning some of our design decisions, and wondering whether, as hobbyists, we have overlooked any easy wins. That's why we thought asking here would probably be the best course of action! After looking at how things are done, there are a few points we're not at all sure about:

1. We are using 20 numerical features: one for tempo, 7 for timbre, 2 for loudness, and 10 for chroma. Except for tempo and the chroma features, most of them are summarized over the track using their mean and standard deviation. Maybe there is a better way to summarize them?
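For concreteness, this is roughly what we mean by that summarization (an illustrative Rust sketch, not the actual bliss-rs code):

    /// Summarize a per-frame feature series into (mean, standard deviation).
    /// Illustrative sketch only; assumes `frames` is non-empty.
    fn summarize(frames: &[f32]) -> (f32, f32) {
        let n = frames.len() as f32;
        let mean = frames.iter().sum::<f32>() / n;
        let variance = frames.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
        (mean, variance.sqrt())
    }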

2. The current way we normalize features is min-max normalization, so that all features fall between -1 and 1. We do this because tracks get added to / removed from a user's library over time, so normalizing tracks against one another would require recomputing all the features constantly. Again, maybe there is a better way to do this?
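Concretely, with per-feature bounds fixed once (so adding or removing tracks never forces a re-analysis), the normalization is just (sketch; the bounds here are hypothetical parameters):

    /// Min-max normalize a raw feature value into [-1, 1] given fixed
    /// per-feature bounds supplied by the caller.
    fn min_max_normalize(value: f32, min: f32, max: f32) -> f32 {
        2.0 * (value - min) / (max - min) - 1.0
    }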

3. The distance we're using (by default) is the euclidean distance. However, since the chroma features make up the majority of the feature vector, we're afraid the distance gives the chroma disproportionate importance. Maybe using a weighted distance that gives each of the "classes" (tempo, timbre, loudness, chroma) equal importance would be more logical? In the long run, we do want to implement a "personal survey" that users would complete on their own music library, like the survey implemented in https://lelele.io/thesis.pdf, so the system "learns" the personal weights of the distance matrix for each user. But maybe there's an easy win here with a simple fixed weighting while we wait until that is implemented?
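The kind of fixed weighting we have in mind would look something like this (a sketch; the index layout is an assumption, the point being to divide each squared difference by its class size so each class carries the same total weight):

    /// Weighted euclidean distance where each feature is down-weighted by
    /// the size of its class, so tempo, timbre, loudness, and chroma each
    /// contribute the same total weight. The index layout is an assumption.
    fn weighted_distance(a: &[f32; 20], b: &[f32; 20]) -> f32 {
        // Class sizes: 1 tempo, 7 timbre, 2 loudness, 10 chroma features.
        let class_sizes: [f32; 20] = [
            1.0,                                // tempo
            7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, // timbre
            2.0, 2.0,                           // loudness
            10.0, 10.0, 10.0, 10.0, 10.0,       // chroma
            10.0, 10.0, 10.0, 10.0, 10.0,
        ];
        a.iter()
            .zip(b.iter())
            .zip(class_sizes.iter())
            .map(|((x, y), s)| (x - y).powi(2) / s)
            .sum::<f32>()
            .sqrt()
    }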

4. A more technical question about chroma - the chroma features use the pitch class features presented in https://speech.di.uoa.gr/ICMC-SMC-2014/images/VOL_2/1461.pdf. However, it seems that (and I might be very, very wrong) what matters for the last four features (major, minor, diminished, augmented) is their distribution relative to one another, not their absolute values. Combine this with the fact that the normalization gives numbers very close to one another (on my track library, the minimum is -0.999999761581421 and the maximum -0.893687009811401 for the major class), and these features feel underused. Maybe a better normalization would be to compute, across the four features (major, minor, diminished, augmented), something like value = feature_value / max(major, minor, diminished, augmented)? Since absolute values don't seem to matter much for these four.
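In code, the relative normalization we have in mind would be something like this (a sketch, assuming the raw chord-class values are non-negative before any min-max step):

    /// Rescale the four chord-class features relative to their maximum, so
    /// that only their distribution matters, not their absolute magnitude.
    /// Sketch only; assumes non-negative raw values.
    fn normalize_chord_classes(classes: [f32; 4]) -> [f32; 4] {
        // Order: [major, minor, diminished, augmented].
        let max = classes.iter().cloned().fold(f32::MIN, f32::max);
        if max <= 0.0 {
            return classes; // degenerate case: nothing to scale against
        }
        classes.map(|v| v / max)
    }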

5. What would be a good evaluation mechanism we could use to make sure we don't "regress" while implementing new features or changing our current ones? The best thing is always human input, but in our case it is not really feasible to ask individual people to fill out surveys every time we tweak a setting. We've looked into genre clustering as a replacement, but maybe there are better ways? Maybe ask people to point out the odd one out among 3 songs, and use that as a source of truth for "mini-playlists"?
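The odd-one-out idea would give us a very small evaluation harness along these lines (hypothetical code, assuming we have collected triplets where the third track was judged the odd one out):

    /// Fraction of human odd-one-out judgments the distance agrees with:
    /// for a triplet (a, b, odd), the model agrees when the odd track is
    /// farther from both a and b than a and b are from each other.
    /// Hypothetical harness; assumes `triplets` is non-empty.
    fn triplet_accuracy(
        triplets: &[([f32; 20], [f32; 20], [f32; 20])],
        distance: impl Fn(&[f32; 20], &[f32; 20]) -> f32,
    ) -> f32 {
        let agreed = triplets
            .iter()
            .filter(|(a, b, odd)| {
                let d_ab = distance(a, b);
                d_ab < distance(a, odd) && d_ab < distance(b, odd)
            })
            .count();
        agreed as f32 / triplets.len() as f32
    }

Tracking that fraction over time would at least tell us whether a tweak helps or hurts on the judgments we have already collected.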

We know that we won't be able to make a universal algorithm, but so far we have something that seems good enough™️, and it would be nice to make it even better :)

Sorry for the fairly long message, but we're hoping someone will catch something we're doing *completely wrong* (again, it's hobbyists making this), so we can hopefully make it better!

Best Regards,
Paul