Challenges in Translating Research to Practice for Evaluating Fairness & Bias in Recommendation Systems
Lex Beattie, Dan Taber, Henriette Cramer
Online controlled experiments, in which different variants of a product are compared based on an Overall Evaluation Criterion (OEC), have emerged as a gold standard for decision making in online services. It is vital that the OEC is aligned with the overall goal of stakeholders for effective decision making. However, this is a challenge when the overall goal is not immediately observable. For instance, we might want to understand the effect of deploying a feature on long-term retention, where the outcome (retention) is not observable at the end of an A/B test. In this work, we examine long-term user engagement outcomes as a time-to-event problem and demonstrate the use of survival models for estimating long-term effects. We then discuss the practical challenges in using time-to-event metrics for decision making in online experiments. We propose a simple churn-based time-to-inactivity metric and describe a framework for developing & validating modeled metrics using survival models for predicting long-term retention. Then, we present a case study and provide practical guidelines on developing and evaluating a time-to-churn metric on a large scale real-world dataset of online experiments. Finally, we compare the proposed approach to existing alternatives in terms of sensitivity and directionality.
Lex Beattie, Dan Taber, Henriette Cramer
Maryam Aziz, Jesse Anderton, Kevin Jamieson, Alice Wang, Hugues Bouchard, Javed Aslam
Rezvaneh Rezapour, Sravana Reddy, Ann Clifton, Rosie Jones