User Intents and Satisfaction with Slate Recommendations

An increasingly large proportion of users rely on recommender systems to proactively serve them recommendations that match diverse needs and expectations. Developing a better understanding of how users interact with such systems is important not only for improving the user experience but also for developing satisfaction metrics that allow the recommendation algorithm to be optimized effectively and efficiently. This is especially true for online streaming services like Spotify, where the recommender system could gauge user satisfaction and adapt its recommendations to better serve users.

User interaction with such systems is often motivated by a specific need or intent. This intent is rarely stated explicitly by the user, yet it shapes both how the user interacts with the recommendations served and how satisfied the user is with them. We hypothesize that user interactions are conditional on the specific intent users have when interacting with a recommendation system, and we highlight the need to consider user intent explicitly when interpreting interaction signals.

Search systems have access to explicit queries from users, which help them understand user intent, interpret interaction signals, and subsequently differentiate between success and failure. Recommender systems, on the other hand, usually lack such explicit indicators of user intent and, as a consequence, clear indicators of success. Indeed, the interpretation of signals varies with goals: scrolling can indicate a negative experience when the goal is to quickly listen to music now, but a positive experience when the goal is to browse the diverse collection of music the system has to offer. Interpreting interaction signals becomes especially hard in complex recommendation settings such as slate recommendations, a scenario typical of many music streaming services. Here users are recommended a set of collections, called slates, with different purposes (to explore new music, to quickly jump to recently played music, etc.) and heterogeneous content (playlists, artist profiles, other audio content, etc.). Thus, there is a need for a detailed, holistic view of user interactions with such recommender systems in order to establish their utility in predicting satisfaction.

In this work, we consider the use case of music streaming via slates of recommendations, and aim to understand the relationship between interaction signals, user intents and user satisfaction. We present approaches to identify user intents, demonstrate the importance of shared learning across intents, and propose a multi-level hierarchical model for user satisfaction prediction that leverages user intent information alongside interaction signals.

Intent identification

We consider the case of users interacting with Spotify, and investigate which interaction signals can be extracted in slate recommendation settings and how these signals vary across different intents. Since the list of possible intents users might have was not known in advance, we adopt a mixed-methods approach to understand user intents, combining insights from face-to-face interviews, a large-scale in-app survey, and non-parametric clustering techniques to identify the list of possible intents. We identified eight user intents and verified their validity using large-scale log data from Spotify.
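To give a feel for what "non-parametric" means here, the sketch below clusters points without fixing the number of clusters in advance. It uses DP-means, a simple algorithm chosen purely for illustration; the post does not say which clustering technique was actually used. The feature vectors, the function name `dp_means` and the penalty `lam` are all hypothetical.

```python
def dp_means(points, lam, max_iter=100):
    """Cluster points with DP-means.

    lam is the squared-distance threshold above which a point opens a new
    cluster, so the number of clusters is driven by the data rather than
    fixed up front. Illustrative only; not the method used in the paper.
    """
    dim = len(points[0])
    # Start with a single cluster at the global mean.
    centroids = [[sum(p[d] for p in points) / len(points) for d in range(dim)]]
    assignments = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # Too far from every centroid: open a new cluster here.
                centroids.append(list(p))
                best = len(centroids) - 1
            if assignments[i] != best:
                assignments[i] = best
                changed = True
        # Recompute centroids and drop clusters that ended up empty.
        members = [[] for _ in centroids]
        for i, a in enumerate(assignments):
            members[a].append(points[i])
        keep = [j for j, m in enumerate(members) if m]
        remap = {old: new for new, old in enumerate(keep)}
        centroids = [
            [sum(p[d] for p in members[j]) / len(members[j]) for d in range(dim)]
            for j in keep
        ]
        assignments = [remap[a] for a in assignments]
        if not changed:
            break
    return assignments, centroids


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = dp_means(points, lam=9.0)
```

On this toy input the algorithm settles on two clusters without being told how many to look for, which is the property that matters when the set of intents is unknown.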

We hypothesize that the way users interact with the recommended slates of playlists differs across these intents. The figure below presents a heatmap of interaction signals across the different intents. We observe that the prominence of interaction signals differs significantly across intents, with certain signals such as interaction time being significantly lower for intent 2 (to quickly access music). These differences in interaction signals highlight that users do behave differently when they have different intents.
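A minimal sketch of the kind of aggregation that could sit behind such a heatmap: average each interaction signal within each intent group. The session layout, intent labels and signal names are hypothetical, and the sketch assumes every session reports every signal.

```python
from collections import defaultdict


def mean_signals_by_intent(sessions):
    """Average each interaction signal within each intent group."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for session in sessions:
        intent = session["intent"]
        counts[intent] += 1
        for name, value in session["signals"].items():
            sums[intent][name] += value
    return {
        intent: {name: total / counts[intent] for name, total in signal_sums.items()}
        for intent, signal_sums in sums.items()
    }


# Hypothetical toy sessions; intent and signal names are made up.
sessions = [
    {"intent": "quick_access", "signals": {"interaction_time": 10.0, "scrolls": 1.0}},
    {"intent": "quick_access", "signals": {"interaction_time": 20.0, "scrolls": 3.0}},
    {"intent": "explore", "signals": {"interaction_time": 90.0, "scrolls": 12.0}},
]
heatmap = mean_signals_by_intent(sessions)
```

Each row of the resulting dictionary corresponds to one intent and each entry to one cell of the heatmap; in practice the signals would usually be normalized first so that they are comparable across columns.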

Predicting satisfaction

So far, we have extracted detailed user interaction signals and identified the different intents users might have. We now use these signals and intents to predict user satisfaction. We experiment with three approaches that cover the spectrum of intent-level granularity: a single global model for all intents, a separate model for each intent, and a shared model across intents.

The Global Model treats all user sessions as a homogeneous collection of data, with the user intent featuring simply as a categorical variable alongside the user interaction signals; see (a) in the figure below.
The Per-Intent Model trains a separate model for each intent; see (b) in the figure below.
The Multi-Level Model represents a compromise between the single global prediction model and a separate prediction model for each intent; see (c) in the figure below.

attachment_67a447325c46a544bef30571f0a23a02

Both the Global and Per-Intent prediction models have limitations. The Per-Intent Model uses only the local, intent-specific information and assumes that the data for each intent is sampled from a separate model, thereby ignoring information and insights from other intents. Furthermore, it can be rendered ineffective for intents with little labeled data. By contrast, the Global Model ignores intent-level variations in the user interaction data and inadvertently suppresses variations that can be important. These limitations motivate the Multi-Level prediction model.
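One way to write down the three options side by side is sketched below. The notation is ours and the logistic form is an assumption made for illustration, not necessarily the exact specification used in the paper: $y_i$ is the satisfaction label of session $i$, $x_i$ its vector of interaction signals, $j[i]$ its intent, and $\sigma$ the logistic function.

```latex
\begin{align*}
\text{Global:}\quad
  & \Pr(y_i = 1) = \sigma\!\left(\alpha + \beta^{\top} x_i + \gamma_{j[i]}\right) \\
\text{Per-Intent:}\quad
  & \Pr(y_i = 1) = \sigma\!\left(\alpha_{j[i]} + \beta_{j[i]}^{\top} x_i\right),
    \quad (\alpha_j, \beta_j) \text{ fit independently for each intent } j \\
\text{Multi-Level:}\quad
  & \Pr(y_i = 1) = \sigma\!\left(\alpha_{j[i]} + \beta_{j[i]}^{\top} x_i\right),
    \quad \alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2),
    \quad \beta_j \sim \mathcal{N}(\mu_\beta, \Sigma_\beta)
\end{align*}
```

In the Global form the intent only shifts the prediction through $\gamma_{j[i]}$, while the signal weights $\beta$ are the same for every intent. The Per-Intent and Multi-Level forms share the same likelihood; the difference is that the Multi-Level form ties the intent-specific parameters together through a common distribution.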

Formulating satisfaction prediction as a multi-level model offers several benefits. First, a multi-level model allows us to account for the intent-level grouping of user sessions. Second, it incorporates both individual session-level and intent group-level effects on user satisfaction, thereby allowing for variability in user interaction behavior across different intents. Third, by assuming that the intent group-level effects come from a common distribution shared across all intents, it facilitates information sharing across intents. This can improve predictive accuracy for intents with relatively little data.
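The effect of sharing information across intents can be illustrated with a much simpler stand-in than the model in the paper: estimating a per-intent satisfaction rate and shrinking it toward the global rate. This is an intercept-only sketch of partial pooling with a fixed, hand-picked `prior_strength`; the function name, the intent labels and the data are hypothetical, and each intent is assumed to have at least one label.

```python
def satisfaction_rates(labels_by_intent, prior_strength=10.0):
    """Return (global_rate, per_intent_rates, pooled_rates).

    labels_by_intent maps an intent to a list of 0/1 satisfaction labels.
    - global_rate ignores intent entirely (complete pooling).
    - per_intent_rates uses only each intent's own labels (no pooling).
    - pooled_rates shrinks each intent toward the global rate, more strongly
      when the intent has few labels (partial pooling).
    """
    all_labels = [y for labels in labels_by_intent.values() for y in labels]
    global_rate = sum(all_labels) / len(all_labels)
    per_intent, pooled = {}, {}
    for intent, labels in labels_by_intent.items():
        n, k = len(labels), sum(labels)
        per_intent[intent] = k / n
        # prior_strength acts like that many pseudo-sessions at the global rate.
        pooled[intent] = (k + prior_strength * global_rate) / (n + prior_strength)
    return global_rate, per_intent, pooled


# Hypothetical labels: one data-rich intent and one with only two sessions.
labels_by_intent = {
    "background": [1] * 60 + [0] * 40,
    "explore_artist": [0, 0],
}
global_rate, per_intent, pooled = satisfaction_rates(labels_by_intent)
```

For the data-rich intent the pooled estimate stays close to its own rate, while for the intent with two sessions it moves most of the way toward the global rate instead of committing to an extreme value. A full multi-level model learns the amount of shrinkage from the data and applies it to the signal weights as well.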

We used real-world user data from Spotify, consisting of 200K judgments about intents and satisfaction from 116K users, to evaluate the effectiveness of the three models (more detail can be found in our paper). Overall, we found that the Global Model was not able to predict user satisfaction much better than random. The Global Model without intent as a feature performs even worse, which highlights that incorporating intent information is useful in predicting satisfaction. The Per-Intent Model, on the other hand, gave much better prediction results than the Global Model, further confirming the hypothesis that considering intent information is crucial for accurately understanding and predicting user satisfaction. Finally, the Multi-Level Model performed significantly better than the Global Model, improving prediction accuracy by over 20%. The Multi-Level Model also outperforms all but one of the Per-Intent Models, with improvements in prediction accuracy ranging from 4% to 14% across intents, and gives comparable performance on the remaining intent. These results show that the Multi-Level Model is able to leverage information shared across intents, which in turn boosts prediction performance for individual intents.

Importance of Interaction Signals per Intent

Finally, to gain insight into which interaction signals are most useful for each intent, we present the top three signals and report their relative feature weights across all eight intents in the figure below. We observe that the signals are important to varying degrees across different intents. When the user wants to quickly access saved music, signals like time to success and dwell time are most informative, whereas they are not informative when the user wants to play music in the background. For intents where users wish to explore artists in more detail, signals that reflect users building a relationship with the content (i.e., saving or downloading tracks) are shown to be more important.
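As a rough sketch of how such a ranking could be produced from a fitted model's weights for one intent: take absolute weights, normalize them to sum to one, and keep the largest few. The normalization scheme, the function name `top_signals`, the signal names and the weights are assumptions made for illustration; the post does not specify how relative weights were computed.

```python
def top_signals(weights, k=3):
    """Return the k signals with the largest relative absolute weight.

    Relative weight is defined here as |w| divided by the sum of |w| over
    all signals, which is one plausible choice among several.
    """
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        return []
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, abs(w) / total) for name, w in ranked[:k]]


# Hypothetical weights for a single intent.
weights = {"time_to_success": -4.0, "dwell_time": 2.0, "saves": 1.5, "scrolls": 0.5}
ranking = top_signals(weights)
```

Running the same function on each intent's weights gives one ranked list per intent, which can then be compared across intents.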

attachment_d1a42158d27ba69c35a92f8c9e1a7de2

attachment_77dcc8f721a9f56a45758014d2f8685d

The variation in the most informative signals across intents highlights that different signals are important to different extents for different use cases. These results emphasize the need to model intent when interpreting user interactions to predict user satisfaction.

Conclusions

Given the query-less paradigm of slate recommendations, understanding user intents is non-trivial. Based on a mixed-methods approach composed of interviews, an in-app survey and non-parametric clustering, we identified eight key user intents and experimentally demonstrated the importance of explicitly considering these intents when predicting user satisfaction. Our results also indicate that different interaction signals are important to varying extents across intents. Furthermore, the significant improvement in prediction results argues not only for grouping user sessions by intent when predicting satisfaction, but also for shared learning across all intents.

More information about the methodology, experiments and related work can be found in our paper:

Jointly Leveraging Intent and Interaction Signals to Predict User Satisfaction with Slate Recommendations. Rishabh Mehrotra, Mounia Lalmas, Doug Kenney, Thomas Lim-Meng, Golli Hashemian. The Web Conference 2019