Contextual and Sequential User Embeddings for Large-Scale Music Recommendation

Abstract

Sequential modelling entails making sense of sequential data, which occurs naturally in a wide array of domains. One example is systems that interact with users, log their actions and behaviour, and recommend items of potential interest on the basis of their previous interactions. In such cases, the sequential order of user interactions is often indicative of what the user will be interested in next. Similarly, for systems that automatically infer the semantics of text, capturing the sequential order of words in a sentence is essential, as even a slight re-ordering can significantly alter the original meaning. This thesis makes methodological contributions and presents new investigations of sequential modelling for two specific application areas: systems that recommend music tracks to listeners, and systems that process text semantics in order to automatically fact-check claims, or "speed read" text for efficient further classification.

For music recommendation, we make three contributions. Firstly, we study how the complexity of sequential music recommender methods relates to the diversity and relevance of the recommendations, and how diversification of recommendations can be used to control this trade-off. Secondly, we investigate how listening context impacts music consumption, which motivates a new way of representing user profiles that captures sequential and contextual deviations from the user's typical music preferences. Thirdly, we improve the prediction of music skip behaviour in a listening session based on past skips.

For fact-checking, we make three contributions. Firstly, we construct the currently largest benchmark dataset of naturally occurring claims for training automatic fact-checking models. Secondly, we link eye-tracking data of humans reading news headlines to automatic fact-checking predictions. Thirdly, we present two models for detecting check-worthy sentences for fact-checking, which, through the use of weak supervision and contrastive ranking, make steps towards better model generalization in a domain with very limited training data.

Lastly, for speed reading, we contribute a new model that exploits the inherent punctuation structure of text to learn how to ignore a large number of words, while being equally or more effective than processing every word in the text.

Related

April 2025 | 2024 IEEE Spoken Language Technology Workshop (SLT)

Classification Of Spontaneous And Scripted Speech For Multilingual Audio

Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz, Emmanouil Benetos

October 2024 | CIKM

PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

A. Ghazimatin, E. Garmash, G. Penha, K. Sheets, M. Achenbach, O. Semerci, R. Galvez, M. Tannenberg, S. Mantravadi, D. Narayanan, O. Kalaydzhyan, D. Cole, B. Carterette, A. Clifton, P. N. Bennett, C. Hauff, M. Lalmas-Roelleke

October 2024 | Journal of Online Trust & Safety

Algorithmic Impact Assessments at Scale: Practitioners’ Challenges and Needs

Amar Ashar, Karim Ginena, Maria Cipollone, Renata Barreto, Henriette Cramer