[AUDITORY] Feedback on features for music similarity
- To: AUDITORY@xxxxxxxxxxxxxxx
- Subject: [AUDITORY] Feedback on features for music similarity
- From: Paul Arzelier <paul.arzelier@xxxxxxx>
- Date: Mon, 29 Jul 2024 18:01:50 +0200
- Reply-to: Paul Arzelier <paul.arzelier@xxxxxxx>
- Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
Dear all,
We are working on software that builds "smart" playlists from a user's
music library, using solely the files' audio content.
The goal is to have transitions that seem natural, where the user does
not notice an abrupt change when the playlist goes from one track to the
other. What constitutes a "good" playlist is of course very subjective,
but right now we have something "good enough" that a few people use
daily in their audio players, and we're looking to make it better.
The approach we use is very "matter-of-fact", as we want something that
works for everyone, without it being perfect. We just want to make an
open-source tool that is easy to use, so people who don't use e.g.
Spotify can still have "smart" playlists. The project is available here
https://github.com/Polochon-street/bliss-rs/, with a small introduction
to it here https://lelele.io/bliss.html. So far, results are encouraging
enough that I can go to sleep listening to a playlist without being
awakened by heavy metal during the night!
However, we are NOT audio specialists, so we are questioning some of
our design decisions, and wondering whether we could find easy wins that
we overlooked since we are hobbyists.
That's why we thought that asking here would probably be the best course
of action! After looking at how things are done, there are a few points
we're not sure at all about:
1. We are using 20 numerical features: 1 for tempo, 7 for timbre, 2
for loudness, and 10 for chroma. Except for the tempo and chroma
features, most of them are summarized over the track using mean and
standard deviation. Maybe there is a better way to summarize them?
2. We currently normalize features with min-max normalization, so that
all the features lie between -1 and 1. We do this because tracks can be
added to / removed from a user's library over time, so normalizing
tracks against one another would require recomputing the features all
the time. Again, maybe there is a better way to do this?
3. The distance we're using (by default) is the Euclidean distance.
However, since the chroma features make up half of the features (10 of
20), we're afraid it gives more importance to the chroma. Maybe using a
weighted distance giving each of the "classes" (tempo, timbre, loudness,
chroma) equal importance would be more logical? In the long run we do
want to implement a "personal survey" that users would complete on their
own music library, like the survey implemented here
https://lelele.io/thesis.pdf, so the system will "learn" the personal
weights of the distance matrix for each user. But maybe there's an "easy
win" to get here by using a simple weighted matrix until we have
implemented this?
4. A more technical question about chroma - the chroma features use the
pitch class features presented here
https://speech.di.uoa.gr/ICMC-SMC-2014/images/VOL_2/1461.pdf. However,
it seems that (and I might be very very wrong) what matters for the last
four features (for major, minor, diminished, augmented) is the
distribution of one versus the other, not their absolute values. Combine
this with the fact that the normalization gives numbers very close to
one another (on my track library, the minimum is -0.999999761581421 and
the maximum -0.893687009811401 for the major class), and it feels underused.
Maybe a better normalization would be to apply across the four features
(major, minor, diminished, augmented) something like value =
feature_value / max(major, minor, diminished, augmented)? It doesn't
seem that absolute values matter so much for these four.
5. What would be a good evaluation mechanism we could use to make sure
we don't "regress" while implementing features / changing our current
features? The best thing is always human input, but in our case, it is
not really feasible to ask individual people to fill out surveys every
time we tweak a setting. We've looked into genre clustering as a way to
replace that, but maybe there are better ways? Maybe ask people to pick
the odd one out among 3 songs, and use that as a source of truth for
"mini-playlists"?
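To make point 2 concrete, here is a minimal sketch of min-max
normalization against fixed per-feature bounds, which is what lets a
track be normalized without looking at the rest of the library. The
function name and the tempo bounds are made up for illustration and are
not taken from the project.

```python
def min_max_normalize(value, lower, upper):
    """Map value from [lower, upper] onto [-1, 1], clamping out-of-range input."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    clamped = min(max(value, lower), upper)
    return 2.0 * (clamped - lower) / (upper - lower) - 1.0


# Hypothetical fixed bounds for a tempo feature, in beats per minute.
TEMPO_BOUNDS = (40.0, 240.0)

if __name__ == "__main__":
    print(min_max_normalize(140.0, *TEMPO_BOUNDS))
```

Because the bounds are constants, adding or removing tracks never
changes the normalized value of an existing track.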
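The class-balanced distance raised in point 3 could look roughly like
the sketch below. It assumes the 20 features are laid out as 1 tempo, 7
timbre, 2 loudness and 10 chroma values in that order, which is a
hypothetical layout chosen for the example; the function names are also
invented here.

```python
import math

# Hypothetical layout of the 20-dimensional vector: (class name, feature count).
CLASS_SIZES = [("tempo", 1), ("timbre", 7), ("loudness", 2), ("chroma", 10)]


def class_balanced_weights(class_sizes):
    """Return one weight per feature so that each class has the same total weight."""
    n_classes = len(class_sizes)
    weights = []
    for _name, size in class_sizes:
        # Each class gets 1 / n_classes in total, shared evenly by its features.
        weights.extend([1.0 / (n_classes * size)] * size)
    return weights


def weighted_euclidean(a, b, weights):
    """Euclidean distance where each squared difference is scaled by a weight."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
```

With these weights the single tempo feature counts as much as the ten
chroma features together; this is equivalent to a diagonal weight
matrix, so learned per-user weights could later replace the defaults.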
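The normalization proposed in point 4 can be sketched as follows. This
assumes it is applied to the raw, non-negative values of the four chord
classes before any min-max step; that is an assumption made for the
example, and the function name is hypothetical.

```python
def relative_chord_features(major, minor, diminished, augmented):
    """Divide each raw value by the largest of the four (assumes non-negative input)."""
    values = (major, minor, diminished, augmented)
    if any(v < 0 for v in values):
        raise ValueError("expects non-negative raw values")
    largest = max(values)
    if largest == 0:
        # No chord content detected at all: avoid dividing by zero.
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(v / largest for v in values)
```

The dominant class always maps to 1 and the others to their share of
it, so two tracks with the same balance of chord classes get the same
values regardless of absolute magnitude.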
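For the odd-one-out idea in point 5, a regression check could be as
simple as the sketch below: for each triplet, the distance predicts the
odd track as the one left over once the closest pair is removed, and
the score is the fraction of listener judgments it agrees with. The
data structures and function names are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predicted_odd_one_out(features, triplet, distance=euclidean):
    """Return the track left over once the closest pair in the triplet is removed."""
    a, b, c = triplet
    # Pair each candidate odd track with the distance between the other two.
    candidates = [
        (distance(features[b], features[c]), a),
        (distance(features[a], features[c]), b),
        (distance(features[a], features[b]), c),
    ]
    return min(candidates, key=lambda pair: pair[0])[1]


def odd_one_out_accuracy(features, judgments, distance=euclidean):
    """Fraction of (triplet, listener's odd track) judgments the distance reproduces."""
    if not judgments:
        raise ValueError("need at least one judgment")
    hits = sum(
        1
        for triplet, odd in judgments
        if predicted_odd_one_out(features, triplet, distance) == odd
    )
    return hits / len(judgments)
```

Collected once, such judgments can be replayed after every change to
the features or the distance, so nobody has to redo a survey each time.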
We know that we won't be able to make a universal algorithm, but so far
we have something that seems good enough™️, and it would be nice to make
it even better :)
Sorry for the fairly long message, but we're hoping maybe someone will
catch something that we do *completely wrong* (again, it's hobbyists
making this), so we can hopefully make it better!
Best Regards,
Paul