OpenMIC-2018: an Open Dataset for Multiple Instrument Recognition

Abstract

Identification of instruments in polyphonic recordings is a challenging, but fundamental problem in music information retrieval. While there has been significant progress in developing predictive models for this and related classification tasks, we as a community lack a common data-set which is large, freely available, diverse, and representative of naturally occurring recordings. This limits our ability to measure the efficacy of computational models. This article describes the construction of a new, open data-set for multi-instrument recognition. The dataset contains 20,000 examples of Creative Commons-licensed music available on the Free Music Archive. Each example is a 10-second excerpt which has been partially labeled for the presence or absence of 20 instrument classes by annotators on a crowd-sourcing platform. We describe in detail how the instrument taxonomy was constructed, how the dataset was sampled and annotated, and compare its characteristics to similar, previous data-sets. Finally, we present experimental results and baseline model performance to motivate future work

Related

May 2021 | CHI

Towards Fairness in Practice: A Practitioner-Oriented Rubric for Evaluating Fair ML Toolkits

Brianna Richardson, Jean Garcia-Gathright, Samuel F. Way, Jennifer Thom, Henriette Cramer

April 2021 | The Web Conference

Where To Next? A Dynamic Model of User Preferences

Francesco Sanna Passino, Lucas Maystre, Dmitrii Moor, Ashton Anderson, Mounia Lalmas

April 2021 | AISTATS

Collaborative Classification from Noisy Labels

Lucas Maystre, Nagarjuna Kumarappan, Judith Bütepage, Mounia Lalmas