
Vibeset Choon: Protect Your Musical Assets

2026-03-20
Bishal Upadhyaya
4 min read

The music listening experience is evolving faster than we can keep up with. If you spend any time on TikTok, Instagram Reels, or YouTube Shorts, you know what I mean – songs are constantly being chopped, sped up, and layered with effects. It's creative and fun – but it has left the industry with a massive royalty leakage problem. Artists aren't getting paid because the audio fingerprinting systems meant to identify their music weren't designed for the remix era.

There's been some incredible AI research lately to fix this. I've recently been obsessing over the Robust Neural Audio Fingerprinting using Music Foundation Models paper. Its neural models are trained to recognize songs that have been chopped, altered, and compressed, with voiceovers and noise layered on top – just like on social media.
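To make "just like on social media" concrete, here's a rough sketch of the kinds of transformed views such a model is trained and evaluated against. It uses torchaudio's sox bindings; the effect choices and parameters are my own illustrative picks, not the paper's exact augmentation pipeline:

```python
import torch
import torchaudio

def remix_views(waveform: torch.Tensor, sample_rate: int) -> dict:
    """Return transformed copies of a clip, mimicking short-form video edits."""
    effect_chains = {
        "time":   [["tempo", "1.25"]],    # sped up 25%, pitch preserved
        "pitch":  [["pitch", "300"]],     # shifted up 3 semitones (300 cents)
        "t+p":    [["speed", "1.1"]],     # naive speed-up: time AND pitch change
        "reverb": [["reverb", "50"]],
        "h.p.":   [["highpass", "500"]],  # the filter every neural model hates
        "l.p.":   [["lowpass", "2000"]],
    }
    views = {}
    for name, chain in effect_chains.items():
        out, _ = torchaudio.sox_effects.apply_effects_tensor(waveform, sample_rate, chain)
        views[name] = out
    # additive white noise at roughly 10 dB SNR
    noise = torch.randn_like(waveform)
    scale = waveform.norm() / (noise.norm() * 10 ** (10 / 20))
    views["noise"] = waveform + noise * scale
    return views
```

A robust fingerprinter has to map every one of these views back to the same track.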

However, these models are big. Too big – the top performer weighs in at 360 MILLION parameters. As a former researcher now in the startup world, one of the hardest lessons I had to learn was to think in margins as well as metrics. To work in the real world, an audio fingerprinter must be cheap and lightweight enough to run continuously at high volume.

Vibeset Choon's Neural Approach

That's how Choon came to be. I built a neural fingerprinting model using the same training recipe as the paper above, kept lightweight at 48M parameters.
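Fingerprinters in this family are typically trained contrastively: embeddings of a clean clip and its remixed view should match each other and nothing else in the batch. Here's a minimal sketch of that kind of objective (a simplified NT-Xent; Choon's exact loss, architecture, and temperature aren't shown here, so treat these as assumptions):

```python
import torch
import torch.nn.functional as F

def ntxent_loss(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    # anchor, positive: (batch, dim) embeddings of clean clips and their remixed views
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / tau                      # pairwise cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    # each clean clip should be most similar to its own remixed view
    return F.cross_entropy(logits, labels)
```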

| Model | Size | Time | Pitch | T+P | Noise | Reverb | R+N | B.P. | H.P. | L.P. | Echo | Enc. | Recall@1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MuQ-Unfrozen | ~300M | 96 | 94 | 87 | 97 | 100 | 90 | 63 | 73 | 74 | 100 | 96 | 88.18 |
| MuQ-Frozen | ~300M | 90 | 91 | 86 | 90 | 98 | 84 | 60 | 72 | 69 | 93 | 90 | 83.91 |
| Vibeset Choon (Neural) | 48M | 87 | 92 | 75 | 86.25 | 74.25 | 64.25 | 67 | 31.25 | 73.75 | 94.25 | 95.5 | 77.23 |
| MERT-Unfrozen | 95M | 100 | 92 | 81 | 87 | 98 | 78 | 32 | 35 | 70 | 100 | 44 | 74.27 |
| MERT-Frozen | 95M | 97 | 89 | 81 | 86 | 95 | 71 | 30 | 29 | 68 | 96 | 38 | 70.91 |
| BEATs-Unfrozen | ~90M | 84 | 89 | 73 | 84 | 91 | 77 | 27 | 39 | 76 | 100 | 33 | 70.27 |

Columns are Recall@1 (%) per transformation – time stretch (Time), pitch shift (Pitch), both (T+P), added noise, reverb, reverb + noise (R+N), band-pass (B.P.), high-pass (H.P.), and low-pass (L.P.) filtering, echo, and re-encoding (Enc.) – with overall Recall@1 last.

For its size, Vibeset Choon's neural model is pretty good. You wouldn't need a GPU server or a datacenter to train our model on a catalog. Ours and every other neural model struggle with high-pass filters, but that's a non-issue: Shazam has had that covered since the glory days of the 2000s. Hit me baby one more time.
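For context, the Recall@1 numbers above boil down to a nearest-neighbor check: embed the catalog once, embed a transformed query, and see whether the true track comes back first. A toy version, where `model` stands in for any fingerprinting encoder that returns one embedding per clip:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recall_at_1(model, catalog_clips, query_clips) -> float:
    # query_clips[i] is a transformed copy of catalog_clips[i]
    db = F.normalize(torch.stack([model(c) for c in catalog_clips]), dim=1)
    hits = 0
    for i, query in enumerate(query_clips):
        emb = F.normalize(model(query), dim=0)
        hits += int((db @ emb).argmax().item() == i)  # did the true track rank first?
    return hits / len(query_clips)
```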

Classical + Neural = The Two-Tier Approach

This is why Choon uses a two-tier approach. We don't need to reinvent the wheel; Shazam's classical fingerprinting remains the gold standard for speed and efficiency. It doesn't share the neural models' weakness for frequency-band filters – it just fails the moment it's hit by a pitched or stretched track on a Short. By layering our lightweight neural model on top of the classical foundation, we reduce the revenue leakage caused by modified audio.
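Conceptually, the dispatch is just a cheap classical lookup with a neural fallback. A sketch of the idea – the function names here are illustrative, not Vibeset's actual API:

```python
from typing import Optional

def identify(audio, classical_index, neural_index) -> Optional[str]:
    # Tier 1: Shazam-style spectral-peak lookup – fast, and unbothered by
    # the band/high/low-pass filters that trip up neural models.
    track_id = classical_index.match(audio)
    if track_id is not None:
        return track_id
    # Tier 2: pitched or stretched clips break the peak hashes, so fall
    # back to a nearest-neighbor search over the neural embeddings.
    return neural_index.nearest(audio)
```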

| Model | Size | Time | Pitch | T+P | Noise | Reverb | R+N | B.P. | H.P. | L.P. | Echo | Enc. | Recall@1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vibeset Choon (Two-Tiered, FMA) | 48M | 84.25 | 90.75 | 73.00 | 96.75 | 97.75 | 82.25 | 96.75 | 93.00 | 97.50 | 99.25 | 99.25 | 91.86 |
| Vibeset Choon (Neural, FMA) | 48M | 87.00 | 92.00 | 75.00 | 86.25 | 74.25 | 64.25 | 67.00 | 31.25 | 73.75 | 94.25 | 95.50 | 77.23 |
| MuQ-Unfrozen | ~300M | 96.00 | 94.00 | 87.00 | 97.00 | 100.00 | 90.00 | 63.00 | 73.00 | 74.00 | 100.00 | 96.00 | 88.18 |

(FMA: the Free Music Archive test set.)

Why This Matters

At the end of the day, song tracking systems must work within the economics of music. By balancing the speed of classical methods with a lightweight neural model, we can recover additional revenue for artists without sacrificing margins.

If you're a distributor, a label, or just someone interested in how we protect musical assets in a creator and remix economy, talk to us. Vibeset Choon can help.