Contextual and Sequential User Embeddings for Large-Scale Music Recommendation

Abstract

Sequential modelling entails making sense of sequential data, which occurs naturally in a wide array of domains. One example is systems that interact with users, log their actions and behaviour, and recommend items of potential interest on the basis of their previous interactions. In such cases, the sequential order of user interactions is often indicative of what the user will be interested in next. Similarly, for systems that automatically infer the semantics of text, capturing the sequential order of words in a sentence is essential, as even a slight re-ordering can significantly alter the original meaning. This thesis makes methodological contributions and presents new investigations of sequential modelling for two specific application areas: systems that recommend music tracks to listeners, and systems that process text semantics in order to automatically fact-check claims, or "speed read" text for efficient further classification.

For music recommendation, we make three contributions. Firstly, we study how the complexity of sequential music recommender methods relates to the diversity and relevance of the recommendations, and how diversification of recommendations can be used to control this trade-off. Secondly, we investigate how listening context impacts music consumption, which motivates a new way of representing user profiles that captures sequential and contextual deviations from the user's typical music preferences. Thirdly, we improve the prediction of music skip behaviour in a listening session based on past skips.

For fact-checking, we make three contributions. Firstly, we construct the currently largest benchmark dataset of naturally occurring claims for training automatic fact-checking models. Secondly, we link eye-tracking data of humans reading news headlines to automatic fact-checking predictions. Thirdly, we present two models for detecting check-worthy sentences for fact-checking, which, through the use of weak supervision and contrastive ranking, make steps towards better model generalization in a domain with very limited training data.

Lastly, for speed reading, we contribute a new model that exploits the inherent punctuation structure of text to learn how to ignore a large number of words, while being equally or more effective than processing every word in the text.

Related

April 2025 | 2024 IEEE Spoken Language Technology Workshop (SLT)

Classification Of Spontaneous And Scripted Speech For Multilingual Audio

Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz, Emmanouil Benetos

October 2024 | CIKM

PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

A. Ghazimatin, E. Garmash, G. Penha, K. Sheets, M. Achenbach, O. Semerci, R. Galvez, M. Tannenberg, S. Mantravadi, D. Narayanan, O. Kalaydzhyan, D. Cole, B. Carterette, A. Clifton, P. N. Bennett, C. Hauff, M. Lalmas-Roelleke

October 2024 | Journal of Online Trust & Safety

Algorithmic Impact Assessments at Scale: Practitioners’ Challenges and Needs

Amar Ashar, Karim Ginena, Maria Cipollone, Renata Barreto, Henriette Cramer