ForTune: Running Offline Scenarios to Estimate Impact on Business Metrics

For product leaders at Spotify and other web-facing companies, making informed decisions about product changes is a constant challenge: we are always balancing customer satisfaction, business objectives, and profitability. A/B testing is our go-to method for rigorous evaluation, but it has limits, especially when we need to understand long-term impacts, untangle complex interactions between metrics, or quickly explore a wide array of "what-if" scenarios. That is where ForTune comes in. Our paper, "ForTune: Running Offline Scenarios to Estimate Impact on Business Metrics," published at the 2025 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’25), introduces a novel, lightweight approach designed to bridge this gap. It gives us insight into potential experiment outcomes before we deploy anything, which means faster iteration and more confident decision-making.

At its heart, ForTune simulates the future by intelligently re-weighting the past. Instead of building complex predictive models, we take our existing historical behavioral data and adjust it to reflect a hypothetical future state. Suppose we want to know what happens to business metrics when a certain user behavior metric changes by a specific amount. ForTune re-weights past observations so that this metric shifts by that amount, and then reads off the implied values of the business metrics. To keep our simulations robust, so that they do not rely on just a handful of data points, ForTune formulates this re-weighting as a constrained convex optimization problem. Put simply, we maximize the "spread" of the sample weights while making sure they satisfy the conditions of our hypothetical scenario.
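Why does maximizing the "spread" keep a simulation from leaning on a few data points? Entropy is a natural measure of that spread. It is largest when weight is distributed evenly and shrinks as mass concentrates on a few samples. Below is a minimal illustration of that point only. The helper name `weight_entropy` is ours and is not part of ForTune.

```python
import math

def weight_entropy(weights):
    """Shannon entropy (natural log) of sample weights after normalizing to sum 1.

    Entropy is maximized by uniform weights, where it equals log(n), and
    drops toward 0 as the mass concentrates on a single sample.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    # By convention 0 * log(0) = 0, so zero weights contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = [1.0, 1.0, 1.0, 1.0]             # every historical sample counts equally
concentrated = [0.97, 0.01, 0.01, 0.01]    # almost all weight on one sample

print(round(weight_entropy(uniform), 3))       # 1.386, i.e. log(4)
print(round(weight_entropy(concentrated), 3))  # 0.168
```

Uniform weights attain the maximum, log(n). A maximum-entropy solution therefore stays as close to the original, equally weighted data as the scenario constraints allow.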

The Algorithm

More formally, the ForTune algorithm maximizes the entropy of the sample weights subject to three sets of constraints: M equality constraints on scenario metrics m_j (j in 1..M); K inequality constraints on scenario metrics m_k (k in 1..K), where P_k is a multiplier on the average metric value; and constraints on the sample weights themselves.
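Written out, the problem might look as follows. This is a sketch in our own notation, not the exact formulation from the paper. It makes these assumptions:

- There are N historical samples with weights w_i.
- m_{ij} is the value of metric j on sample i, and \bar{m}_j is its unweighted historical average.
- The equality targets are multipliers P_j, by analogy with P_k.
- The inequalities are upper bounds.
- The weights are normalized to sum to one.

```latex
\begin{aligned}
\max_{w_1,\dots,w_N}\quad & -\sum_{i=1}^{N} w_i \log w_i \\
\text{subject to}\quad & \sum_{i=1}^{N} w_i\, m_{ij} = P_j\, \bar{m}_j, && j = 1,\dots,M, \\
& \sum_{i=1}^{N} w_i\, m_{ik} \le P_k\, \bar{m}_k, && k = 1,\dots,K, \\
& \sum_{i=1}^{N} w_i = 1, \qquad w_i \ge 0, && i = 1,\dots,N, \\
\text{where}\quad & \bar{m}_j = \frac{1}{N}\sum_{i=1}^{N} m_{ij}.
\end{aligned}
```

The objective is concave and every constraint is linear in the weights, so this is a convex problem and any local optimum is also the global one. A lower-bound scenario would simply flip the inequality to ≥.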

Once we find the optimal weights, we can estimate the impact of the scenario on any metric of interest by taking the weighted sum of that metric's original values over the historical samples.
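The simplest scenario has one equality constraint and no inequality constraints. In that case the maximum-entropy weights have a closed exponential-tilting form, w_i ∝ exp(λ m_i), and λ can be found by a one-dimensional search. The plain-Python sketch below uses that special case. It re-weights hypothetical toy data so that average consumption rises by 3%, then estimates a business metric as the weighted sum ŷ = Σ_i w_i y_i. The function name, the toy data and the +3% scenario are illustrative assumptions. The full problem, with several equality and inequality constraints, would need a general-purpose convex solver instead.

```python
import math

def max_entropy_weights(m, target, iters=100):
    """Maximum-entropy weights with sum(w) == 1 and sum(w_i * m_i) == target.

    With one equality constraint the optimum has the exponential-tilting form
    w_i proportional to exp(lam * m_i). The tilted mean increases with lam,
    so lam can be found by bisection.
    """
    if not min(m) < target < max(m):
        raise ValueError("target must lie strictly between min(m) and max(m)")

    def tilted(lam):
        shift = max(lam * x for x in m)  # subtract the largest exponent to avoid overflow
        raw = [math.exp(lam * x - shift) for x in m]
        total = sum(raw)
        return [r / total for r in raw]

    def tilted_mean(lam):
        return sum(w * x for w, x in zip(tilted(lam), m))

    # Widen the bracket until it contains the target, then bisect on lam.
    lo, hi = -1.0, 1.0
    while tilted_mean(lo) > target:
        lo *= 2.0
    while tilted_mean(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))


# Hypothetical toy data: one entry per user in the historical sample.
consumption = [10.0, 12.0, 8.0, 15.0, 9.0, 11.0]  # scenario metric m
business = [1.0, 1.3, 0.8, 1.9, 0.7, 1.2]         # metric we want to forecast

baseline = sum(consumption) / len(consumption)
w = max_entropy_weights(consumption, target=1.03 * baseline)  # scenario: +3% consumption

# Scenario estimate = weighted sum of the original business-metric values.
estimate = sum(wi * yi for wi, yi in zip(w, business))
base_business = sum(business) / len(business)
print(f"estimated relative change in business metric: {estimate / base_business - 1:+.2%}")
```

With λ = 0 the weights are uniform and the estimate reduces to the historical average. Tilting therefore moves the data only as far as the scenario requires.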

To give us a sense of confidence, we use bootstrapping to quantify the uncertainty around these estimates.
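Here is a minimal sketch of the bootstrap step, assuming a percentile bootstrap over users. Each resample:

1. draws historical rows with replacement;
2. recomputes the scenario target from that resample's average;
3. re-solves for the weights;
4. recomputes the estimate.

It reuses the single-constraint exponential-tilting special case, redefined here so the block runs on its own. Several details are our assumptions rather than details from the paper: the synthetic data, the function names, the 200 resamples, and the choice to skip resamples where the scenario is infeasible.

```python
import math
import random

def tilt_weights(m, target, iters=60):
    """Max-entropy weights (summing to 1) whose weighted mean of m equals target.

    Single-equality-constraint special case: w_i is proportional to
    exp(lam * m_i), with lam found by bisection.
    """
    def weights(lam):
        shift = max(lam * x for x in m)  # numerical stability
        raw = [math.exp(lam * x - shift) for x in m]
        total = sum(raw)
        return [r / total for r in raw]

    def mean_at(lam):
        return sum(w * x for w, x in zip(weights(lam), m))

    lo, hi = -1.0, 1.0
    while mean_at(lo) > target:
        lo *= 2.0
    while mean_at(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_at(mid) < target:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))


def scenario_effect(rows, multiplier):
    """Relative change in y when the average of m is scaled by `multiplier`.

    Returns None when the scenario is infeasible on this sample, i.e. the
    target falls outside the observed range of m.
    """
    m = [r[0] for r in rows]
    y = [r[1] for r in rows]
    target = multiplier * sum(m) / len(m)
    if not min(m) < target < max(m):
        return None
    w = tilt_weights(m, target)
    return sum(wi * yi for wi, yi in zip(w, y)) / (sum(y) / len(y)) - 1.0


def bootstrap_interval(rows, multiplier, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample rows, re-solve the weights, re-estimate."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))  # resample with replacement
        effect = scenario_effect(sample, multiplier)
        if effect is not None:
            effects.append(effect)
    if not effects:
        raise ValueError("scenario infeasible on every bootstrap resample")
    effects.sort()
    lo_idx = int(alpha / 2 * len(effects))
    hi_idx = int((1 - alpha / 2) * len(effects)) - 1
    return effects[lo_idx], effects[hi_idx]


# Hypothetical synthetic data: (scenario metric m, business metric y) per user.
gen = random.Random(42)
rows = []
for _ in range(200):
    x = gen.gauss(10.0, 2.0)
    rows.append((x, 0.1 * x + gen.gauss(0.0, 0.3)))

low, high = bootstrap_interval(rows, multiplier=1.03)
print(f"95% bootstrap interval for the relative change in y: [{low:+.2%}, {high:+.2%}]")
```

Each resample re-solves the weights rather than reusing one fixed set. This lets the interval reflect uncertainty in the re-weighting itself as well as sampling noise in the outcome metric.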

ForTune at Spotify

For us at Spotify, ForTune has proven invaluable across a variety of recommendation surfaces and scenarios. We have seen its directional predictions align with observed results in the vast majority of cases. This has allowed our teams to evaluate how various consumption shifts might impact different business metrics (as illustrated in Figure 1) and even quantify complex trade-offs, like how changes in music consumption might interact with content discovery to affect user satisfaction (Figure 2).

Figure 1: Results of a scenario on change in overall consumption on the Spotify app. Testing changes from -3% to +3%, we see an inverse relationship with an anonymized business metric, and the non-overlapping error distributions give us high confidence in that relationship.

Figure 2: A scenario analyzing trade-offs between overall music consumption on the Spotify app, the rate of music discovery, and an anonymized user satisfaction metric. Each cell shows the change in that metric under the scenario in which music consumption and discovery change by the amounts on the x and y axes, respectively.

While ForTune is a powerful addition to our toolkit for rapid offline scenario analysis, we are also transparent about its limitations. The accuracy of the predictions is highly dependent on how carefully we define our constraints and whether we include all relevant variables. It is also important to remember that ForTune predicts averages, not absolute values, and like any observational study, it is subject to potential confounding factors. However, its lightweight nature, ease of interpretability, and ability to explore a wide range of "what-if" scenarios at a low cost make it an exceptional complement to our online experimentation efforts. It empowers our product leads to develop more informed hypotheses and proactively anticipate the potential impact of their decisions on key business metrics, driving better outcomes for our users and for Spotify.

For more information, please refer to our paper:

ForTune: Running Offline Scenarios to Estimate Impact on Business Metrics
Authors: Georges Dupret, Konstantin Sozinov, Carmen Barcena Gonzalez, Ziggy Zacks, Amber Yuan, Ben Carterette, Manuel Mai, Andrey Gatash, Leo Lien, Shubham Bansal, Roberto Sanchis-Ojeda, Mounia Lalmas-Roelleke
Corresponding author: Ben Carterette (benjaminc@spotify.com)
KDD’25: ACM SIGKDD Conference on Knowledge Discovery and Data Mining