Mostra: Balancing multiple objectives for music recommendation

April 27, 2022 Published by Emanuele Bugliarello and Mounia Lalmas

Recommendation engines underpin most modern digital platforms, helping users navigate vast catalogues of products on Amazon, homes on AirBnB, videos on YouTube and music on Spotify. However, the content on all these platforms is provided by creators, such as hosts on AirBnB and artists on Spotify, who play a crucial role in shaping the user experience. Supporting their exposure needs is therefore essential to ensure long-term engagement.

In an upcoming paper at TheWebConf 2022, we study the multi-objective problem of how to best balance user, creator and platform needs for the long-term health and sustainability of Spotify. In particular:

  1. We show that there is a vast heterogeneity of multi-objective music streaming sessions.
  2. We find evidence from historical data that user satisfaction varies based on the type and amount of tracks that benefit multiple stakeholders.
  3. We propose Mostra: a new framework based on state-of-the-art Transformer technologies that we equip with counterfactual reasoning to allow system designers to dynamically and flexibly control the trade-offs across the different objectives based on the ever-evolving strategic needs.

Objectives in music platforms

The ultimate goal behind our work is to understand and leverage the trade-off across different goals for music recommendation, where the task consists of ordering a set of songs to meet multiple objectives. For this, we consider data from millions of radio sessions. To characterise the different stakeholders, we consider the following metrics for each streamed song:

  • SAT: Whether a user listened to the song in full. If they did, they probably liked it!
  • Discovery: Whether a user has never listened to that song or to any other song by the same artist. This is what allows users to experience new music!
  • Exposure: Whether the song belongs to an emerging artist. This makes Spotify a sustainable platform that supports a wide variety of creators!
  • Boosting: Whether the song belongs to a group that the platform is interested in boosting. This can be based on recent events, to align with certain trends, or to celebrate and honour a group of artists.
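
As a rough illustration, each of these four signals can be thought of as a binary label attached to a streamed song. The sketch below is hypothetical: the `Stream` record, its field names, the listening-history sets and the "played at least the full duration" rule for SAT are assumptions made for this example, not the definitions used in production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """One streamed song in a session (hypothetical record layout)."""
    track_id: str
    artist_id: str
    ms_played: int
    track_duration_ms: int


def label_stream(stream, heard_tracks, heard_artists,
                 emerging_artists, boosted_tracks):
    """Return the four binary per-track signals as a dict of booleans."""
    return {
        # SAT: the song was listened to in full (assumed completion rule).
        "sat": stream.ms_played >= stream.track_duration_ms,
        # Discovery: neither the song nor its artist was heard before.
        "discovery": (stream.track_id not in heard_tracks
                      and stream.artist_id not in heard_artists),
        # Exposure: the song belongs to an emerging artist.
        "exposure": stream.artist_id in emerging_artists,
        # Boosting: the song is in a group the platform wants to boost.
        "boosting": stream.track_id in boosted_tracks,
    }
```

Note that a single song can carry several labels at once, which is what makes sessions multi-objective.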

To better understand the interplay between the creator-centric objectives and users’ short-term satisfaction, and how frequent some of these objectives are within sessions, we consider a random sample of 100M listening sessions. As shown in the figure above, users enjoy songs that are boosted or that come from emerging artists! However, if a given session contains too many discoveries, users tend to skip most of them, so particular care must be taken when recommending new content to ensure maximal user satisfaction.

Interestingly, the plot above shows that there is a high diversity in the kind of tracks streamed by our listeners. What this means is that (i) there is more severe competition across objectives in certain sessions than others, and (ii) a dynamic, multi-objective recommendation engine is needed to adapt recommendations to each unique scenario.

Welcome to the Mostra

How can we build a recommender system that leverages all these goals in an effective way? This is an open question! In this paper, we propose a first step in this direction. Specifically, our focus is on building a dynamic system that can easily be tuned by system engineers on-the-fly to target specific objectives.

Mostra (Multi-objective Set Transformer), shown below, is our end-to-end neural network that combines state-of-the-art Transformer encoders with a novel beam search algorithm that selects the next song by reasoning about user satisfaction and the creators’ objectives.

Mostra works as follows:

  1. First, we encode each song in the pool using an encoder trained to maximise user satisfaction. This is what music recommendation systems are usually optimised for.
  2. Second, given an encoded representation for each song in the pool, we tag each song that has a creator objective.
  3. Then, instead of simply choosing the song that would maximise the training objective, we apply a counterfactual step: tagged songs whose predicted score is within a small difference, epsilon, of the top score are re-scored based on their creator-centric objectives.
    In particular, we use a submodular scoring function that maximises the diversity of objectives covered so far.
  4. Finally, if a song that is tagged with creator-centric objectives receives a higher score than one without any of those objectives, that song is given to the listener as the next track in their stream.
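
The decoding steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from the paper: it uses a greedy choice (effectively a beam of width one) instead of the full beam search, it takes the satisfaction scores as given rather than computing them with a Transformer encoder, and the diminishing-returns gain `weight / (1 + times_covered)` is just one possible submodular scoring function. All function and variable names are hypothetical.

```python
from collections import Counter


def coverage_gain(objectives, covered, weights):
    """Diminishing-returns gain (an assumed submodular form): an objective
    is worth less each time it has already been covered in the session."""
    return sum(weights.get(o, 0.0) / (1 + covered[o]) for o in objectives)


def pick_next(candidates, covered, epsilon, weights):
    """candidates: list of (track_id, predicted_sat_score, objectives_set)."""
    top = max(candidates, key=lambda c: c[1])
    best, best_gain = top, 0.0
    for cand in candidates:
        _, score, objectives = cand
        if top[1] - score > epsilon:
            continue  # too far below the top score to be considered
        gain = coverage_gain(objectives, covered, weights)
        # Prefer higher objective gain; break positive ties by score.
        if gain > best_gain or (gain == best_gain and gain > 0
                                and score > best[1]):
            best, best_gain = cand, gain
    return best


def sequence(candidates, epsilon, weights):
    """Greedily order the whole pool, tracking objectives covered so far."""
    pool, covered, ordered = list(candidates), Counter(), []
    while pool:
        chosen = pick_next(pool, covered, epsilon, weights)
        pool.remove(chosen)
        covered.update(o for o in chosen[2] if o in weights)
        ordered.append(chosen[0])
    return ordered


# Toy pool with made-up scores and tags.
pool = [
    ("a", 0.90, set()),
    ("b", 0.88, {"exposure"}),
    ("c", 0.87, {"exposure", "boosting"}),
    ("d", 0.50, {"discovery"}),
]
weights = {"discovery": 1.0, "exposure": 1.0, "boosting": 1.0}

print(sequence(pool, epsilon=0.05, weights=weights))  # ['c', 'b', 'a', 'd']
print(sequence(pool, epsilon=0.0, weights=weights))   # ['a', 'b', 'c', 'd']
```

With `epsilon=0.0` the sketch falls back to ordering purely by predicted satisfaction, while a larger `epsilon` lets tagged songs overtake slightly higher-scoring untagged ones. Changing `epsilon`, the keys of `weights`, or their values alters the trade-off without touching the scores themselves.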

The power of Mostra lies in allowing just-in-time, on-the-fly changes to the balancing of multiple objectives within the set. In fact, one need not re-encode a given track; it is enough to change (i) the maximum difference of predicted scores allowed for re-ranking, (ii) the objectives that need to be considered at a given time, and/or (iii) their importance.


We compare Mostra, with different values of the threshold epsilon, against a number of recommender approaches, including both relevance-based methods and state-of-the-art neural models. As shown in the table below, Mostra can improve upon them across all objectives, especially on the creator-centric ones.

We further study the behaviour of Mostra across various axes. In particular, we show that (i) Mostra can leverage multi-objective music sets with only minor losses for Discovery tracks (which we saw earlier are negatively correlated with SAT); and (ii) Mostra not only recommends more creator-centric songs, but also ones that are actually liked (i.e. fully listened to) by the listeners.

In addition, the decoding algorithm of Mostra can be adapted to any available recommender system to make it a dynamic, just-in-time multi-objective engine! We show that our DNN baseline can benefit from it just as much as our Transformer-based Mostra model. 


We studied the problem of multi-objective recommendation, a key challenge in typical multi-stakeholder platforms such as Spotify. We first performed a comprehensive analysis of the different objectives and their interplay. Then, we proposed Mostra, a neural network that recommends songs based on various objectives that can easily be controlled to satisfy dynamic strategic needs through counterfactual decoding. Mostra achieves competitive performance on short-term user satisfaction whilst largely improving on other creator-centric objectives.

Check out our paper for many more details:
Mostra: A Flexible Balancing Framework to Trade-off User, Artist and Platform Objectives for Music Sequencing. 
Emanuele Bugliarello, Rishabh Mehrotra, James Kirk, and Mounia Lalmas
The Web Conference (WWW) 2022.