Spillover Detection for Donor Selection in Synthetic Control Models


Introduction

There are many situations at Spotify where A/B tests can’t be performed: they could be too technically complex to implement, too damaging to the user experience, or they could infringe on a contractual agreement we have with a customer, to name a few. When A/B tests can’t be run, teams can’t measure the success or impact of their product or new release—potentially leading to a “ship and pray” situation. 

In the absence of A/B tests, many methods have been developed to infer the causal impact of an intervention from observational data given certain assumptions. One of the most widely used causal inference approaches in economics, marketing, and medicine is the synthetic control model. Synthetic controls have been described as “arguably the most important innovation in the policy evaluation literature in the last 15 years” [1] by Nobel Memorial Prize-winning economist Guido Imbens, and even received a write-up in the Washington Post under the great headline “Seriously, here’s one amazing math trick to learn what can’t be known” [2].

Despite their power and wide use, current synthetic control methodologies have issues and limitations that can make them challenging to use, and to gain confidence in, at the scale required by large tech companies.

The problem

To concretely illustrate synthetic controls, let’s look at the recent launch of a new product on Spotify. Following the launch, we wanted to evaluate whether the product had a measurable impact on user engagement. Specifically, did the launch lead to an increase in daily active users (DAUs) for the markets where it was rolled out? The challenge, of course, is that we can’t observe what DAU trends would have looked like in those same markets had the product not been launched. We don’t have a traditional control group to compare against. That’s where synthetic controls come in. So how do they work?

Synthetic controls allow us to construct our own, synthetic control group using a weighted combination of other markets where the product wasn’t launched. Individually, no single market is a perfect comparison—but we can carefully choose weights for each of them so that the synthetic control matches the treated market's DAU trajectory before the product was introduced. This gives us a credible counterfactual: a data-driven estimate of what DAUs would have looked like had the product not been launched. From there, we can simply compare the actual DAUs in the treated market after launch to those of the synthetic control over the same period to estimate the impact.
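To make the weight-fitting step concrete, here is a minimal sketch (using made-up synthetic data and hypothetical variable names, not our production pipeline): the donor weights are constrained to be non-negative and to sum to one, and are chosen so the weighted combination of donor markets reproduces the treated market’s pre-launch DAU as closely as possible.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative synthetic data (shapes only; not real Spotify numbers).
    rng = np.random.default_rng(0)
    T_pre, T_post, n_donors = 52, 12, 5
    X_pre = rng.normal(100, 10, size=(T_pre, n_donors))   # donor markets, pre-launch
    true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
    y_pre = X_pre @ true_w + rng.normal(0, 1, size=T_pre)  # treated market, pre-launch
    X_post = rng.normal(100, 10, size=(T_post, n_donors))  # donor markets, post-launch
    y_post = X_post @ true_w + 5.0                         # pretend the launch added +5 DAU

    def fit_weights(X, y):
        """Donor weights: non-negative, summing to one, chosen to match
        the treated market's pre-launch trajectory as closely as possible."""
        n = X.shape[1]
        res = minimize(
            lambda w: np.mean((y - X @ w) ** 2),
            x0=np.full(n, 1.0 / n),
            bounds=[(0.0, 1.0)] * n,
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        )
        return res.x

    w = fit_weights(X_pre, y_pre)
    synthetic_post = X_post @ w                # counterfactual DAU after launch
    effect = np.mean(y_post - synthetic_post)  # estimated impact of the launch
    print(f"weights: {np.round(w, 2)}, estimated lift: {effect:.1f} DAU")

In practice the fit often includes additional covariates and regularisation, but the core idea is exactly this pre-period matching.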

Now that we’ve covered the basics of how synthetic control models work, we can discuss the challenges with the current methodology that our new approach to synthetic controls solves.

Automated donor selection via forecasting and spillover detection

Firstly, in the above example, how do we choose the markets whose weighted average forms the synthetic control? More generally, the synthetic control method requires information from both the time evolution of the target unit (DAU in our example) and other correlated units, called donors, which crucially must not be impacted by the intervention (the product launch in our example). If a time series is correlated with the target time series, how can we determine that it is not impacted by the intervention, and hence constitutes a valid control donor we can use to build our synthetic control?

Domain knowledge is usually employed to make such a decision. But in some real-world situations there can be a large number of potential donor controls—such as when trying to estimate the impact of a new product on a large online platform like Spotify—and domain knowledge is not adequate for making the selection due to the scale of the problem. 

We tackled this challenge and presented a data-driven approach to select valid donor controls that relies on weaker domain knowledge. 

At the heart of our paper is a new theorem rooted in proximal causal inference. It shows that if:

  • The data-generating process is stable (more precisely, the causal mechanisms underlying the evolution of the units in question are invariant),

  • And donor time series serve as proxies for the latent variables driving the time-series evolution of the units,

then we can forecast a donor’s post-intervention behavior using only pre-intervention data. If the forecast is wildly off, something’s wrong—likely a spillover.

This insight inverts the usual synthetic control logic. Instead of using post-intervention data to estimate our synthetic control counterfactual, we use the pre-intervention period to test the reliability of each donor.
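As a rough illustration of how such a check could look, here is a simplified sketch (not the exact test from our paper: the linear-trend forecaster, the z-score, and the threshold below are all placeholder choices, whereas the paper builds the forecast from proxy donors within the proximal framework):

    import numpy as np

    def flag_spillover(donor_pre, donor_post, z_thresh=3.0):
        """Screen one candidate donor for spillover (simplified illustration).

        Fit a linear trend to the donor's pre-intervention series, extrapolate
        it over the post-intervention window, and compare with what actually
        happened. A large, systematic forecast error suggests the donor was
        itself affected by the intervention and should be excluded.
        """
        t_pre = np.arange(len(donor_pre))
        A = np.column_stack([np.ones(len(donor_pre)), t_pre])
        coef, *_ = np.linalg.lstsq(A, donor_pre, rcond=None)
        resid_scale = np.std(donor_pre - A @ coef) + 1e-12

        # Extrapolate the pre-intervention fit into the post-intervention window.
        t_post = np.arange(len(donor_pre), len(donor_pre) + len(donor_post))
        forecast = coef[0] + coef[1] * t_post
        z_score = abs(np.mean(donor_post - forecast)) / resid_scale
        return z_score > z_thresh, z_score

Donors that fail the check are dropped from the donor pool before the synthetic control weights are fitted.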

The plot below shows how our automated donor selection models (S1 and S2 in the plot) compare to optimal selection (given knowledge of the true data-generating mechanism) and to random selection, in terms of the bias they introduce in a synthetic example.

[Figure: bias introduced by automated donor selection (S1, S2), optimal selection, and random selection in a synthetic example]

To address concerns about false exclusions (excluding valid donors) or false inclusions (keeping contaminated ones), we provide a comprehensive sensitivity analysis framework that:

  • Bounds bias from omitted latent variables 

  • Handles selection error due to excluded proxy donors 

  • Models spillover as a sensitivity parameter to bound potential bias 

These tools give practitioners confidence that even imperfect donor selections won't sabotage the causal inference.
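As a back-of-the-envelope version of the last point in the list above (our own simplification for illustration, not the bound derived in the paper): if undetected spillover can shift each selected donor’s post-launch series by at most some amount delta, and the donor weights are non-negative and sum to one, then the synthetic control itself shifts by at most delta, so the true effect lies within delta of the point estimate.

    # Back-of-the-envelope bound (a simplification, not the paper's result):
    # with non-negative weights summing to one, a per-donor spillover of at
    # most `delta` shifts the synthetic control, and hence the estimate,
    # by at most `delta`.
    delta = 2.0        # assumed worst-case per-donor spillover, in DAU
    effect_hat = 5.1   # hypothetical point estimate from the SC fit
    effect_bounds = (effect_hat - delta, effect_hat + delta)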

Debiasing causal estimates affected by noisy donors

Secondly, the valid donors selected to build the synthetic control model can sometimes be noisy, and this can lead to biased estimates of the causal impact. In terms of our product launch example, some of the markets that make up our synthetic control may only be weakly correlated with the roll-out markets because they are very noisy representations of the underlying dynamics that govern the evolution of DAU over time. In such circumstances, our causal estimate can become biased. Can we improve and de-bias estimates in this situation?

We show how to use data from the non-selected, invalid donors from the pre-intervention time period to debias the causal impact estimates. Even when you successfully identify a set of “valid” donors (i.e., not affected by spillovers), these donors might still be noisy proxies for the underlying latent factors driving the outcome. As a result, your synthetic control model may be biased, especially if important latent information is missing.

This is where proximal causal inference comes in: even if some donors were excluded from the SC model (because they might have been affected), they still hold useful information in their pre-intervention data.

Why does this help?

  • Even if a donor is invalid post-intervention, its pre-intervention dynamics can still be correlated with the latent factors 

  • Using these donors as proxies allows the model to adjust for latent confounding that the selected donors might not fully capture
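To give a flavour of the mechanics, here is a toy two-stage sketch in the spirit of proximal adjustment (with hypothetical function and variable names; it is not the estimator derived in the paper): the excluded donors’ pre-intervention series are used to de-noise the selected donors before fitting.

    import numpy as np

    def debiased_effect(y_pre, y_post, valid_pre, valid_post, excluded_pre):
        """Toy two-stage estimate of the post-intervention effect.

        Stage 1: project the (noisy) valid donors onto the excluded donors'
        pre-intervention series, which serve as extra proxies for the latent
        factors. Stage 2: fit the treated unit on the de-noised donors, then
        apply those coefficients to the raw valid donors post-intervention,
        where the excluded donors can no longer be trusted.
        """
        # Stage 1: de-noise each valid donor using the excluded proxies.
        Z = np.column_stack([np.ones(len(y_pre)), excluded_pre])
        proj, *_ = np.linalg.lstsq(Z, valid_pre, rcond=None)
        valid_hat = Z @ proj

        # Stage 2: regress the treated unit on the de-noised donors.
        X_hat = np.column_stack([np.ones(len(y_pre)), valid_hat])
        coef, *_ = np.linalg.lstsq(X_hat, y_pre, rcond=None)

        # Counterfactual from the raw valid donors after the launch.
        X_post = np.column_stack([np.ones(len(y_post)), valid_post])
        return np.mean(y_post - X_post @ coef)

Only the excluded donors’ pre-intervention data enters the calculation, which is what makes it safe to use them even though their post-intervention values may be contaminated.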

The plot below shows the reduction in bias (from the left panel to the right panel) in a synthetic data example using our method.

[Figure: bias of the synthetic control estimate before (left) and after (right) applying our debiasing method on synthetic data]

Conclusion

Our paper pushes the frontier of synthetic control models from “trust your domain knowledge” to “let the data guide you”. It’s a scalable, theoretically sound, and practically implementable improvement that’s especially useful for large-scale applications like tech platforms, where donor pools are vast.

In summary, the key innovations are:

  • A novel theorem to detect spillovers via forecasting.

  • A practical algorithm for selecting valid donors.

  • Sensitivity bounds to protect against false exclusions/inclusions.

  • A method to use excluded donors for debiasing.

  • Empirical validation on both simulated and real-world data.

Check out the paper if you’re interested in learning more: “Spillover detection for donor selection in synthetic control models”, Michael O'Riordan and Ciarán Gilligan-Lee, Journal of Causal Inference, 2025.

References

[1] Susan Athey and Guido Imbens, “The State of Applied Econometrics: Causality and Policy Evaluation”, 2016, https://arxiv.org/pdf/1607.00699

[2] Washington Post, “Seriously, here’s one amazing math trick to learn what can’t be known”, 2015, https://www.washingtonpost.com/news/wonk/wp/2015/10/30/how-to-measure-things-in-a-world-of-competing-claims/ (accessed 2025)