
[AUDITORY] Feedback on features for music similarity

Dear all,

We are working on software that makes "smart" playlists from a user's music library, using only the files' audio content.

The goal is to have transitions that feel natural, where the user does not notice an abrupt change when the playlist goes from one track to the next. What constitutes a "good" playlist is of course very subjective, but right now we have something "good enough" that a few people use daily in their audio players, and we're looking to make it better. Our approach is deliberately pragmatic: we want something that works reasonably well for everyone, not something perfect. We just want to make an open-source tool that is easy to use, so people who don't use e.g. Spotify can still have "smart" playlists. The project is available at https://github.com/Polochon-street/bliss-rs/, with a small introduction to it at https://lelele.io/bliss.html. So far, the results are encouraging enough that I can go to sleep listening to a playlist without being awakened by heavy metal during the night!

However, we are NOT audio specialists, so we are questioning some of our design decisions, and wondering whether, as hobbyists, we have overlooked any easy wins. That's why we thought asking here would probably be the best course of action! After looking at how things are done, there are a few points we're not at all sure about:

1. We are using 20 numerical features: one for tempo, 7 for timbre, 2 for loudness, and 10 for chroma. Except for tempo and the chroma features, most of them are summarized over the track using their mean and standard deviation. Maybe there is a better way to summarize them?
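For concreteness, this is roughly what we mean by that summarization (an illustrative Rust sketch, not the actual bliss-rs code):

    /// Summarize a per-frame feature series into (mean, standard deviation).
    /// Illustrative sketch only; assumes `frames` is non-empty.
    fn summarize(frames: &[f32]) -> (f32, f32) {
        let n = frames.len() as f32;
        let mean = frames.iter().sum::<f32>() / n;
        let variance = frames.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
        (mean, variance.sqrt())
    }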

2. The current way we normalize features is min-max normalization, so that all features fall between -1 and 1. We do this because tracks get added to / removed from a user's library over time, so normalizing tracks against one another would require recomputing all the features constantly. Again, maybe there is a better way to do this?
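Concretely, with per-feature bounds fixed once (so adding or removing tracks never forces a re-analysis), the normalization is just (sketch; the bounds here are hypothetical parameters):

    /// Min-max normalize a raw feature value into [-1, 1] given fixed
    /// per-feature bounds supplied by the caller.
    fn min_max_normalize(value: f32, min: f32, max: f32) -> f32 {
        2.0 * (value - min) / (max - min) - 1.0
    }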

3. The distance we're using (by default) is the euclidean distance. However, since the chroma features make up the majority of the feature vector, we're afraid the distance gives the chroma disproportionate importance. Maybe using a weighted distance that gives each of the "classes" (tempo, timbre, loudness, chroma) equal importance would be more logical? In the long run, we do want to implement a "personal survey" that users would complete on their own music library, like the survey implemented in https://lelele.io/thesis.pdf, so the system "learns" the personal weights of the distance matrix for each user. But maybe there's an easy win here with a simple fixed weighting while we wait until that is implemented?
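The kind of fixed weighting we have in mind would look something like this (a sketch; the index layout is an assumption, the point being to divide each squared difference by its class size so each class carries the same total weight):

    /// Weighted euclidean distance where each feature is down-weighted by
    /// the size of its class, so tempo, timbre, loudness, and chroma each
    /// contribute the same total weight. The index layout is an assumption.
    fn weighted_distance(a: &[f32; 20], b: &[f32; 20]) -> f32 {
        // Class sizes: 1 tempo, 7 timbre, 2 loudness, 10 chroma features.
        let class_sizes: [f32; 20] = [
            1.0,                                // tempo
            7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, // timbre
            2.0, 2.0,                           // loudness
            10.0, 10.0, 10.0, 10.0, 10.0,       // chroma
            10.0, 10.0, 10.0, 10.0, 10.0,
        ];
        a.iter()
            .zip(b.iter())
            .zip(class_sizes.iter())
            .map(|((x, y), s)| (x - y).powi(2) / s)
            .sum::<f32>()
            .sqrt()
    }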

4. A more technical question about chroma - the chroma features use the pitch class features presented in https://speech.di.uoa.gr/ICMC-SMC-2014/images/VOL_2/1461.pdf. However, it seems that (and I might be very, very wrong) what matters for the last four features (major, minor, diminished, augmented) is their distribution relative to one another, not their absolute values. Combine this with the fact that the normalization gives numbers very close to one another (on my track library, the minimum is -0.999999761581421 and the maximum -0.893687009811401 for the major class), and these features feel underused. Maybe a better normalization would be to compute, across the four features (major, minor, diminished, augmented), something like value = feature_value / max(major, minor, diminished, augmented)? Since absolute values don't seem to matter much for these four.
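In code, the relative normalization we have in mind would be something like this (a sketch, assuming the raw chord-class values are non-negative before any min-max step):

    /// Rescale the four chord-class features relative to their maximum, so
    /// that only their distribution matters, not their absolute magnitude.
    /// Sketch only; assumes non-negative raw values.
    fn normalize_chord_classes(classes: [f32; 4]) -> [f32; 4] {
        // Order: [major, minor, diminished, augmented].
        let max = classes.iter().cloned().fold(f32::MIN, f32::max);
        if max <= 0.0 {
            return classes; // degenerate case: nothing to scale against
        }
        classes.map(|v| v / max)
    }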

5. What would be a good evaluation mechanism we could use to make sure we don't "regress" while implementing new features or changing our current ones? The best thing is always human input, but in our case it is not really feasible to ask individual people to fill out surveys every time we tweak a setting. We've looked into genre clustering as a replacement, but maybe there are better ways? Maybe ask people to point out the odd one out among 3 songs, and use that as a source of truth for "mini-playlists"?
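The odd-one-out idea would give us a very small evaluation harness along these lines (hypothetical code, assuming we have collected triplets where the third track was judged the odd one out):

    /// Fraction of human odd-one-out judgments the distance agrees with:
    /// for a triplet (a, b, odd), the model agrees when the odd track is
    /// farther from both a and b than a and b are from each other.
    /// Hypothetical harness; assumes `triplets` is non-empty.
    fn triplet_accuracy(
        triplets: &[([f32; 20], [f32; 20], [f32; 20])],
        distance: impl Fn(&[f32; 20], &[f32; 20]) -> f32,
    ) -> f32 {
        let agreed = triplets
            .iter()
            .filter(|(a, b, odd)| {
                let d_ab = distance(a, b);
                d_ab < distance(a, odd) && d_ab < distance(b, odd)
            })
            .count();
        agreed as f32 / triplets.len() as f32
    }

Tracking that fraction over time would at least tell us whether a tweak helps or hurts on the judgments we have already collected.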

We know that we won't be able to make a universal algorithm, but so far we have something that seems good enough™️, and it would be nice to make it even better :)

Sorry for the fairly long message, but we're hoping someone will catch something we're doing *completely wrong* (again, it's hobbyists making this), so we can hopefully make it better!

Best Regards,
Paul