
[AUDITORY] Feedback on features for music similarity



Dear all,

We are working on software that makes "smart" playlists from a user's music library, using only the files' audio content.
The goal is to have transitions that feel natural, where the user does
not notice an abrupt change when the playlist goes from one track to the
next. What constitutes a "good" playlist is of course very subjective,
but right now we have something "good enough" that a few people use 
daily in their audio players, and we're looking to make it better.
The approach we use is very "matter-of-fact", as we want something that 
works for everyone, without it being perfect. We just want to make an 
open-source tool that is easy to use, so people who don't use e.g. 
Spotify can still have "smart" playlists. The project is available here 
https://github.com/Polochon-street/bliss-rs/, with a small introduction 
to it here https://lelele.io/bliss.html. So far, results are encouraging 
enough that I can go to sleep listening to a playlist without being 
awakened by heavy metal during the night!
However, we are NOT audio specialists, so we are questioning some of
our design decisions, and wondering whether there are easy wins we
overlooked, since we are hobbyists.
That's why we thought that asking here would probably be the best course
of action! After looking at how things are done, there are a few points
we're not at all sure about:
1. We are using 20 numerical features: one for tempo, 7 for timbre, 2
for loudness, and 10 for chroma. Except for tempo and the chroma
features, most of them are summarized over the track using their mean
and standard deviation (see the first sketch after this list). Maybe
there is a better way to summarize them?
2. The current way we normalize features is min-max normalization, so
that all the features lie between -1 and 1 (second sketch below). We do
this because a user's library can have tracks added / removed over time,
so normalizing tracks against one another would mean recomputing the
features all the time. Again, maybe there is a better way to do this?
3. The distance we're using (by default) is the Euclidean distance.
However, since the chroma features make up the majority of the features,
we're afraid this gives chroma a disproportionate weight. Maybe using a
weighted distance that gives each of the "classes" (tempo, timbre,
loudness, chroma) equal importance would be more logical (third sketch
below)? In the long run we do want to implement a "personal survey" that
users would complete on their own music library, like the survey
implemented here https://lelele.io/thesis.pdf, so the system "learns"
the personal weights of the distance matrix for each user. But maybe
there's an "easy win" to get here with a simple weighted matrix while we
wait until we've implemented that?
4. A more technical question about chroma - the chroma features use the
pitch class features presented here
https://speech.di.uoa.gr/ICMC-SMC-2014/images/VOL_2/1461.pdf. However,
it seems that (and I might be very, very wrong) what matters for the
last four features (major, minor, diminished, augmented) is their
distribution relative to one another, not their absolute numbers.
Combine this with the fact that the normalization gives numbers very
close to one another (on my track library, the minimum is
-0.999999761581421 and the maximum -0.893687009811401 for the major
class), and these features feel underused.
Maybe a better normalization would be to rescale the four features as
value = feature_value / max(major, minor, diminished, augmented)
(fourth sketch below)? Absolute values don't seem to matter much for
these four.
5. What would be a good evaluation mechanism we could use to make sure
we don't "regress" while implementing new features / changing our
current features? The best thing is always human input, but in our case
it is not really feasible to ask individual people to fill out surveys
every time we tweak a setting. We've looked into genre clustering as a
replacement, but maybe there are better ways? Maybe ask people to pick
the odd one out among 3 songs, and use that as a source of truth for
"mini-playlists" (last sketch below)?
We know that we won't be able to make a universal algorithm, but so far
we have something that seems good enough™️, and it would be nice to make
it even better :)
Sorry for the fairly long message, but we're hoping someone will catch
something that we are doing *completely wrong* (again, it's hobbyists
making this), so we can hopefully make it better!
Best Regards,
Paul