Semi-Automated Music Catalog Curation Using Audio and Metadata

Abstract

We present a system to assist Subject Matter Experts (SMEs) to curate large online music catalogs. The system detects releases that are incorrectly attributed to an artist discography (misattribution), when the discography of a single artist is incorrectly separated (duplication), and predicts suitable relocations of misattributed releases. We use historical discography corrections to train and evaluate our system’s component models. These models combine vector representations of audio with metadata-based features, which outperform models based on audio or metadata alone. We conduct three experiments with SMEs in which our system detects misattribution in artist discographies with precision greater than 77%, duplication with precision greater than 71%, and by combining the approaches, predicts a correct relocation for misattributed releases with precision up to 45%. These results demonstrate the potential of such proactive curation systems in saving valuable human time and effort by directing attention where it is most needed.

Related

November 2024 | EPJ Data Science

Consumption-based approaches in proactive detection for content moderation

Shahar Elisha, John N. Pougué-Biyong, Mariano Beguerisse-Díaz

November 2024 | SIAM Journal on Mathematics of Data Science

Topological Fingerprints for Audio Identification

Wojciech Reise, Ximena Fernández, Maria Dominguez, Heather A. Harrington, Mariano Beguerisse-Díaz

October 2024 | CIKM

PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

A. Ghazimatin, E. Garmash, G. Penha, K. Sheets, M. Achenbach, O. Semerci, R. Galvez, M. Tannenberg, S. Mantravadi, D. Narayanan, O. Kalaydzhyan, D. Cole, B. Carterette, A. Clifton, P. N. Bennett, C. Hauff, M. Lalmas-Roelleke