The Music Streaming Sessions Dataset

Abstract

At the core of many important machine learning problems faced by online streaming services is a need to model how users interact with the content. These problems can often be reduced to a combination of 1) sequentially recommending items to the user, and 2) exploiting the user’s interactions with those items as feedback for the machine learning model. Unfortunately, no public datasets are currently available that enable researchers to explore this topic. To spur such research, we release the Music Streaming Sessions Dataset (MSSD), which consists of approximately 150 million listening sessions and associated user actions. Furthermore, we provide audio features and metadata for the approximately 3.7 million unique tracks referred to in the logs. This is the largest collection of such track metadata currently available to the public. The dataset enables research on important problems, including modelling user listening and interaction behaviour in streaming, Music Information Retrieval (MIR), and session-based sequential recommendation.
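The two sub-problems named above, sequential recommendation and the use of interactions as feedback, can be illustrated with a minimal sketch over session logs. The record layout used here (`session_id`, `position`, `track_id`, `skipped`) is a hypothetical assumption for illustration and is not the dataset's actual schema; the skip flag simply stands in for "a user interaction treated as feedback".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRow:
    # Hypothetical record layout: one row per track played within a session.
    session_id: str
    position: int   # 1-based position of the track within its session
    track_id: str
    skipped: bool   # assumed user interaction, used as implicit feedback


def build_sessions(rows):
    """Group log rows by session and order each session by play position."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row.session_id].append(row)
    for plays in sessions.values():
        plays.sort(key=lambda r: r.position)
    return dict(sessions)


def next_track_examples(session):
    """(history, next_track, next_was_skipped) triples for a sequential model."""
    ids = [r.track_id for r in session]
    return [(ids[:i], session[i].track_id, session[i].skipped)
            for i in range(1, len(session))]


def skip_rate_per_track(rows):
    """Fraction of plays of each track that were skipped (a feedback signal)."""
    plays = defaultdict(int)
    skips = defaultdict(int)
    for row in rows:
        plays[row.track_id] += 1
        skips[row.track_id] += int(row.skipped)
    return {t: skips[t] / plays[t] for t in plays}


rows = [
    LogRow("s1", 2, "t_b", True),
    LogRow("s1", 1, "t_a", False),
    LogRow("s2", 1, "t_b", False),
]
sessions = build_sessions(rows)
print([r.track_id for r in sessions["s1"]])  # ['t_a', 't_b']
print(next_track_examples(sessions["s1"]))   # [(['t_a'], 't_b', True)]
print(skip_rate_per_track(rows))             # {'t_b': 0.5, 't_a': 0.0}
```

Ordering by position before building examples matters: log rows need not arrive in play order, and a sequential model should only condition on tracks that were actually heard earlier in the session.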
