
Vibeset Choon: Protect Your Musical Assets

2026-03-20
Bishal Upadhyaya
4 min read

The music listening experience is evolving faster than we can keep up with. If you spend any time on TikTok, Instagram Reels, or YouTube Shorts, you know what I mean – songs are constantly being chopped, sped up, and layered with effects. It's creative and fun – but it has left the industry with a massive royalty leakage problem. Artists aren't getting paid because the audio fingerprinting systems meant to identify their music weren't designed for the remix era.

There's been some incredible AI research lately to fix this. I've recently been obsessing over the Robust Neural Audio Fingerprinting using Music Foundation Models paper. Its neural models are trained to recognize songs that have been chopped, altered, and compressed, with voiceovers and noise layered on top – just like on social media.
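To make "just like on social media" concrete, here's a rough sketch of the kinds of transformed views such a model is trained and evaluated against. It uses torchaudio's sox bindings; the effect choices and parameters are my own illustrative picks, not the paper's exact augmentation pipeline:

```python
import torch
import torchaudio

def remix_views(waveform: torch.Tensor, sample_rate: int) -> dict:
    """Return transformed copies of a clip, mimicking short-form video edits."""
    effect_chains = {
        "time":   [["tempo", "1.25"]],    # sped up 25%, pitch preserved
        "pitch":  [["pitch", "300"]],     # shifted up 3 semitones (300 cents)
        "t+p":    [["speed", "1.1"]],     # naive speed-up: time AND pitch change
        "reverb": [["reverb", "50"]],
        "h.p.":   [["highpass", "500"]],  # the filter every neural model hates
        "l.p.":   [["lowpass", "2000"]],
    }
    views = {}
    for name, chain in effect_chains.items():
        out, _ = torchaudio.sox_effects.apply_effects_tensor(waveform, sample_rate, chain)
        views[name] = out
    # additive white noise at roughly 10 dB SNR
    noise = torch.randn_like(waveform)
    scale = waveform.norm() / (noise.norm() * 10 ** (10 / 20))
    views["noise"] = waveform + noise * scale
    return views
```

A robust fingerprinter has to map every one of these views back to the same track.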

However, these models are big. Too big – the top performer weighs in at 360 MILLION parameters. As a former researcher now in the startup world, one of the hardest lessons I had to learn was to think in margins as well as metrics. To work in the real world, an audio fingerprinter must be cheap and lightweight enough to run continuously at high volume.

Vibeset Choon's Neural Approach

That's how Choon came to be. I built a neural fingerprinting model using the same training recipe as the paper above, kept lightweight at 48M parameters.
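Fingerprinters in this family are typically trained contrastively: embeddings of a clean clip and its remixed view should match each other and nothing else in the batch. Here's a minimal sketch of that kind of objective (a simplified NT-Xent; Choon's exact loss, architecture, and temperature aren't shown here, so treat these as assumptions):

```python
import torch
import torch.nn.functional as F

def ntxent_loss(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    # anchor, positive: (batch, dim) embeddings of clean clips and their remixed views
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / tau                      # pairwise cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    # each clean clip should be most similar to its own remixed view
    return F.cross_entropy(logits, labels)
```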

| Model | Size | Time | Pitch | T+P | Noise | Reverb | R+N | B.P. | H.P. | L.P. | Echo | Enc. | Recall@1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MuQ-Unfrozen | ~300M | 96 | 94 | 87 | 97 | 100 | 90 | 63 | 73 | 74 | 100 | 96 | 88.18 |
| MuQ-Frozen | ~300M | 90 | 91 | 86 | 90 | 98 | 84 | 60 | 72 | 69 | 93 | 90 | 83.91 |
| Vibeset Choon (Neural) | 48M | 87 | 92 | 75 | 86.25 | 74.25 | 64.25 | 67 | 31.25 | 73.75 | 94.25 | 95.5 | 77.23 |
| MERT-Unfrozen | 95M | 100 | 92 | 81 | 87 | 98 | 78 | 32 | 35 | 70 | 100 | 44 | 74.27 |
| MERT-Frozen | 95M | 97 | 89 | 81 | 86 | 95 | 71 | 30 | 29 | 68 | 96 | 38 | 70.91 |
| BEATs-Unfrozen | ~90M | 84 | 89 | 73 | 84 | 91 | 77 | 27 | 39 | 76 | 100 | 33 | 70.27 |

Columns are Recall@1 (%) per transformation – time stretch (Time), pitch shift (Pitch), both (T+P), added noise, reverb, reverb + noise (R+N), band-pass (B.P.), high-pass (H.P.), and low-pass (L.P.) filtering, echo, and re-encoding (Enc.) – with overall Recall@1 last.

For its size, Vibeset Choon's neural model is pretty good. You wouldn't need a GPU server or a datacenter to train our model on a catalog. Ours and every other neural model struggle with high-pass filters, but that's a non-issue: Shazam has had that covered since the glory days of the 2000s. Hit me baby one more time.
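For context, the Recall@1 numbers above boil down to a nearest-neighbor check: embed the catalog once, embed a transformed query, and see whether the true track comes back first. A toy version, where `model` stands in for any fingerprinting encoder that returns one embedding per clip:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recall_at_1(model, catalog_clips, query_clips) -> float:
    # query_clips[i] is a transformed copy of catalog_clips[i]
    db = F.normalize(torch.stack([model(c) for c in catalog_clips]), dim=1)
    hits = 0
    for i, query in enumerate(query_clips):
        emb = F.normalize(model(query), dim=0)
        hits += int((db @ emb).argmax().item() == i)  # did the true track rank first?
    return hits / len(query_clips)
```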

Classical + Neural = The Two-Tier Approach

This is why Choon uses a two-tier approach. We don't need to reinvent the wheel; Shazam's classical fingerprinting remains the gold standard for speed and efficiency. It doesn't share the neural models' weakness for frequency-band filters – it just fails the moment it's hit by a pitched or stretched track on a Short. By layering our lightweight neural model on top of the classical foundation, we reduce the revenue leakage caused by modified audio.
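Conceptually, the dispatch is just a cheap classical lookup with a neural fallback. A sketch of the idea – the function names here are illustrative, not Vibeset's actual API:

```python
from typing import Optional

def identify(audio, classical_index, neural_index) -> Optional[str]:
    # Tier 1: Shazam-style spectral-peak lookup – fast, and unbothered by
    # the band/high/low-pass filters that trip up neural models.
    track_id = classical_index.match(audio)
    if track_id is not None:
        return track_id
    # Tier 2: pitched or stretched clips break the peak hashes, so fall
    # back to a nearest-neighbor search over the neural embeddings.
    return neural_index.nearest(audio)
```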

| Model | Size | Time | Pitch | T+P | Noise | Reverb | R+N | B.P. | H.P. | L.P. | Echo | Enc. | Recall@1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vibeset Choon (Two-Tiered, FMA) | 48M | 84.25 | 90.75 | 73.00 | 96.75 | 97.75 | 82.25 | 96.75 | 93.00 | 97.50 | 99.25 | 99.25 | 91.86 |
| Vibeset Choon (Neural, FMA) | 48M | 87.00 | 92.00 | 75.00 | 86.25 | 74.25 | 64.25 | 67.00 | 31.25 | 73.75 | 94.25 | 95.50 | 77.23 |
| MuQ-Unfrozen | ~300M | 96.00 | 94.00 | 87.00 | 97.00 | 100.00 | 90.00 | 63.00 | 73.00 | 74.00 | 100.00 | 96.00 | 88.18 |

(FMA: the Free Music Archive test set.)

Why This Matters

At the end of the day, song tracking systems must work within the economics of music. By balancing the speed of classical methods with a lightweight neural model, we can recover additional revenue for artists without sacrificing margins.

If you're a distributor, a label, or just someone interested in how we protect musical assets in a creator and remix economy, talk to us. Vibeset Choon can help.