Beyond the Next Track: Spotify Research at RecSys 2025

At Spotify, we believe recommendation is about more than simply choosing the next track. It is about guiding listeners through a vast world of content, giving them exactly what they need in the moment, while also delighting them with discoveries they did not even know they were looking for. In doing so, we not only enrich the listening experience, but also help emerging and established creators and entire formats find and grow their audiences.

For RecSys 2025 in Prague, we are proud to share eight accepted contributions that push the boundaries of recommender systems. They span three key themes:

Agentic Search, Discovery & Representation – exploring how listeners can both find and serendipitously discover content across Spotify’s growing ecosystem, powered by innovations in agentic AI, synthetic data generation, and contextual calibration.
Generative & Multimodal Models – reimagining the building blocks of recommendation through diffusion models, semantic IDs, and multimodal LLMs.
Evaluation & Alignment – pioneering new ways to measure quality, making LLM-as-a-judge more personalized and better aligned with human preferences.

These themes highlight three key ways Spotify Research drives impact. The first showcases our successful tech transfer: innovations already deployed at global scale, delivering measurable improvements in online metrics. The second explores the frontier of what could shape tomorrow’s deployments. The third demonstrates how research advances the field itself through new methodologies; in this case, by rethinking how we evaluate recommendations.

We’re also proud to be a Bronze sponsor of the conference. Our band members remain deeply engaged in the community, serving on the executive committee and the Industry Track program committee, and as reviewers.

Keep reading to learn more about our contributions.

Agentic Search, Discovery & Representation

We help listeners not only find what they want, but also discover what they didn’t yet know they were looking for, all while building foundational representations that make personalization more scalable and adaptive. Each of these papers reflects work already deployed in production, driving measurable improvements on key metrics.

You Say Search, I Say Recs: A Scalable Agentic Approach to Exploratory Search – Introduces an LLM router that interprets broad or ambiguous queries and dynamically routes them to search or recommendation systems, yielding large gains in discovery. Following positive online results on targeted use cases, and thanks to innovations that make the system scalable and low-latency, it has been successfully deployed.
- Tuesday | Paper Session 3: Representation Meets Recommendation & Search
AudioBoost: Increasing Audiobook Retrievability in Search – Tackles the cold-start challenge for audiobooks using LLM-generated synthetic queries, boosting retrievability and inspiring more exploratory audiobook searches. Having improved coverage and engagement for audiobook queries, it is now fully deployed in production.
- Friday | EARL Workshop
Calibrated Recommendations with Contextual Bandits – Learns the right balance of music, podcasts, and audiobooks for each user in each moment on the Spotify home page, surfacing content that both matches intent and introduces serendipity. This work has been deployed, making each impression more effective and delivering uplifts in consumption and activation.
- Monday | CONSEQUENCES Workshop
Generalized User Representations for Large-Scale Recommendation and Downstream Tasks – Provides a two-stage framework for shared user embeddings, enabling better cold-start handling, improved discovery, and efficiency across multiple recommendation tasks, while also allowing for responsive and highly scalable deployment.
- Wednesday | Industry Poster Session

Generative & Multimodal Models

Generative AI offers new opportunities to reimagine recommender systems, while also challenging us to innovate in their foundational building blocks. How can we harness new classes of models to deliver qualitatively different recommendations? And how can we represent content in ways that unlock the full potential of generative models, capturing its richness and nuance more effectively?

Prompt-to-Slate: Diffusion Models for Prompt-Conditioned Slate Generation – Uses diffusion models to generate entire playlists or bundles directly from natural-language prompts, with A/B tests showing higher engagement and diversity.
- Thursday | Paper Session 9: Signals We Trust: Offline, Online, and Real World Evaluation of Recommender Systems
Semantic IDs for Joint Generative Search and Recommendation – Demonstrates how multi-task optimization of Semantic IDs effectively unifies representations across search and recommendation tasks, overcoming performance trade-offs and advancing fully generative, task-agnostic models.
- Thursday | Poster Session: LBR
Describe What You See with Multimodal LLMs for Video Recommendations – Shows that current video recommendation models can be enriched with off-the-shelf multimodal LLMs, used without fine-tuning, thanks to their ability to capture intent, humor, and cultural cues.
- Thursday | Poster Session: LBR

Evaluation & Alignment

LLM-as-a-judge is rapidly becoming a standard evaluation tool across both academia and industry. In fact, two of the papers above ("You Say Search, I Say Recs" and "AudioBoost") rely on it for assessment. In this work, we push the boundaries further by making LLM judges personalized and profile-aware, bringing them closer to real user preferences. More broadly, this reflects how research drives impact through methodological innovation.

Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge – Uses natural-language profiles distilled from listening histories to guide LLMs in assessing podcast recommendations. This approach goes beyond offline metrics and scales evaluation in a way that is interpretable, personalized, and closely aligned with human judgments.
- Thursday | Poster Session: LBR

Closing

We are excited to share these contributions with the RecSys community and to continue the conversation about where recommender systems are headed next. If you are attending RecSys 2025 in Prague, we would love to connect; we will have a booth, so just come by and say hi!

Stay tuned for deeper dives into many of these papers on the Spotify Research blog.