Few-shot musical source separation

Abstract

Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using few audio examples of the target instrument. We train a few-shot conditioning encoder jointly with the U-Net to encode the audio examples into a conditioning vector to configure the U-Net via feature-wise linear modulation (FiLM). We evaluate the trained models on real musical recordings in the MUSDB18 and MedleyDB datasets. We show that our proposed few-shot conditioning paradigm outperforms the baseline one-hot instrument-class conditioned model for both seen and unseen instruments. We further experiment with different conditioning example characteristics, including examples from different recordings, multi-sourced examples, and negative conditioning examples, to show the potential of applying the proposed few-shot approach to a wider variety of real-world scenarios.

Related

September 2023 | RecSys

Accelerating Creator Audience Building through Centralized Exploration

Buket Baran, Guilherme Dinis Junior, Antonina Danylenko, Olayinka S. Folorunso, Gösta Forsum, Maksym Lefarov, Lucas Maystre, Yu Zhao

August 2023 | Interspeech

Lightweight and Efficient Spoken Language Identification of Long-form Audio

Winstead Zhu, Md Iftekhar Tanveer, Yang Janet Liu, Seye Ojumu, Rosie Jones

July 2023 | KDD

Impatient Bandits: Optimizing for the Long-Term Without Delay

Thomas McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek