OpenMIC-2018: an Open Dataset for Multiple Instrument Recognition

Abstract

Identification of instruments in polyphonic recordings is a challenging but fundamental problem in music information retrieval. While there has been significant progress in developing predictive models for this and related classification tasks, we as a community lack a common dataset which is large, freely available, diverse, and representative of naturally occurring recordings. This limits our ability to measure the efficacy of computational models. This article describes the construction of a new, open dataset for multi-instrument recognition. The dataset contains 20,000 examples of Creative Commons-licensed music available on the Free Music Archive. Each example is a 10-second excerpt which has been partially labeled for the presence or absence of 20 instrument classes by annotators on a crowd-sourcing platform. We describe in detail how the instrument taxonomy was constructed, how the dataset was sampled and annotated, and how its characteristics compare to those of similar, previous datasets. Finally, we present experimental results and baseline model performance to motivate future work.
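Because each excerpt is only partially labeled, a given clip may carry a presence/absence label for some of the 20 instrument classes and no label at all for the rest. The following is a minimal sketch of how such labels might be represented and scored, not the dataset's actual file format or the paper's evaluation code. The `None`-for-unlabeled encoding, the class indices, and the helper names `make_clip` and `masked_class_accuracy` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

NUM_CLASSES = 20  # number of instrument classes stated in the abstract


def make_clip(known: Dict[int, bool]) -> List[Optional[bool]]:
    """Build one clip's label vector (hypothetical encoding).

    True = instrument present, False = absent, None = not annotated.
    """
    labels: List[Optional[bool]] = [None] * NUM_CLASSES
    for class_index, present in known.items():
        labels[class_index] = present
    return labels


def masked_class_accuracy(
    labels: Sequence[Sequence[Optional[bool]]],
    predictions: Sequence[Sequence[bool]],
    class_index: int,
) -> Optional[float]:
    """Accuracy for one class, counting only clips annotated for that class.

    Returns None when no clip has a label for the class.
    """
    correct = 0
    total = 0
    for clip_labels, clip_predictions in zip(labels, predictions):
        truth = clip_labels[class_index]
        if truth is None:
            continue  # unlabeled entries are skipped, not treated as absent
        total += 1
        correct += int(truth == clip_predictions[class_index])
    return correct / total if total else None


# Toy data: three clips, each annotated for only one or two classes.
labels = [
    make_clip({0: True, 3: False}),
    make_clip({0: False}),
    make_clip({3: True}),
]
predictions = [[False] * NUM_CLASSES for _ in labels]
predictions[0][0] = True  # correct: class 0 is present in clip 0
predictions[1][0] = True  # wrong: class 0 is absent in clip 1
predictions[2][3] = True  # correct: class 3 is present in clip 2

acc_class_0 = masked_class_accuracy(labels, predictions, 0)
acc_class_3 = masked_class_accuracy(labels, predictions, 3)
acc_class_5 = masked_class_accuracy(labels, predictions, 5)
```

The point of the mask is that a missing annotation is excluded from the score instead of being counted as "absent", so each class is evaluated only on the clips where annotators actually gave an answer.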
