Transforming AI Research into Personalized Listening: Spotify at NeurIPS 2025


Spotify is proud to sponsor NeurIPS 2025, a conference that has long been at the forefront of exploring and redefining what is possible in AI. Our band members are deeply involved in this vibrant community, serving as senior program committee members, workshop organizers, and reviewers. In the fast-evolving world of streaming, Spotify continues to push boundaries, reimagining how listeners experience sound and connection.

At the heart of this transformation is a strong culture of research and product co-innovation, from cutting-edge generative AI research to live features that reshape the listening experience. In this blog post, we share how Spotify brings this vision to life and highlight four real examples of generative and agentic AI experiences already available to users in some markets today.

This work builds on Spotify’s broader exploration of generative recommender systems, models that not only understand what users like but can also generate new pathways of discovery. This includes ongoing work on teaching LLMs to understand the Spotify catalog, where we explore how catalog-native representations and generative modeling enable more natural, explainable, and adaptive personalization (Teaching Large Language Models to Speak Spotify: How Semantic IDs Enable Personalization).
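
To make the idea of catalog-native representations concrete, here is a minimal sketch of one common way semantic IDs are built: residual quantization, which encodes a continuous track embedding as a short sequence of discrete tokens a language model can read. This illustrates the general technique only, not Spotify’s implementation; the codebooks and sizes are invented.

```python
import numpy as np

def semantic_id(embedding, codebooks):
    """Residual quantization: encode a continuous track embedding as a
    short sequence of discrete codes (a "semantic ID") that an LLM can
    treat as tokens."""
    residual = embedding.astype(np.float64)
    codes = []
    for codebook in codebooks:               # one codebook per quantization level
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))          # nearest centroid at this level
        codes.append(idx)
        residual = residual - codebook[idx]  # quantize the remainder next level
    return codes

# Toy usage: three levels of 256 codes over a 64-d embedding space.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(3)]
track_embedding = rng.normal(size=64)
print(semantic_id(track_embedding, codebooks))  # a 3-token semantic ID
```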

For those attending NeurIPS 2025, you'll be able to experience the applications we're highlighting in even more detail. Make sure to stop by our booth to learn more about Spotify's work in generative AI and tell us what excites you about Spotify's vision.

Research + Engineering = Product: The Spotify Playbook

At Spotify we continuously innovate by combining rapid experimentation with scalable deployment. It starts with identifying opportunities to improve how users engage with content, whether discovering music, making requests, or resuming stories, and building the systems that enable those experiences. In each case, we ask: are we building toward the technology of the future? For AI-focused systems, some of our key principles include:

  • Agentic systems: Moving beyond static recommenders to interactive AI agents that understand user intent, use tools, and deliver personalized outcomes.

  • Generative preference-aligned models: Combining LLM-based orchestration with preference optimization lets Spotify adapt dynamically to user taste and catalog changes.

  • Closed-loop feedback: Every skip, save, and play request refines the system through preference optimization (see the sketch after this list).

  • Scalability and latency: Our research focuses on sub-second responsiveness and reliability for millions of users.
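
As a minimal sketch of the closed-loop principle, feedback events can be distilled into preference pairs that drive the optimization step. The event labels and pairing rule here are illustrative assumptions, not Spotify's production schema.

```python
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    user_id: str
    track_id: str
    action: str  # "skip", "save", "play_request" -- illustrative labels

def to_preference_pairs(events):
    """Distill raw feedback into (preferred, rejected) track pairs:
    here, saved tracks are treated as preferred over skipped ones."""
    saved = [e.track_id for e in events if e.action == "save"]
    skipped = [e.track_id for e in events if e.action == "skip"]
    return [(w, l) for w in saved for l in skipped]

events = [
    FeedbackEvent("u1", "track_a", "save"),
    FeedbackEvent("u1", "track_b", "skip"),
    FeedbackEvent("u1", "track_c", "play_request"),
]
print(to_preference_pairs(events))  # [('track_a', 'track_b')]
# Each pair becomes one training example for a preference-optimization
# update (see the DPO sketch under Application #2 below).
```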

These four principles guide how we translate AI research into product innovation, shaping the foundation of our personalized experiences. They are reflected in our recent research, including Personalizing Agentic AI to Users' Musical Tastes with Scalable Preference Optimization and You Say Search, I Say Recs: A Scalable Agentic Approach to Query Understanding and Exploratory Search at Spotify, and they formed the foundation of the four applications highlighted below.

Application #1: Voice and Text-Driven DJ Requests

One of the most visible outcomes of our research + engineering approach is DJ's ability to take voice and written requests, which lets listeners ask for music just by saying what they want to hear.

“DJ is now taking music requests, giving Premium users in more than 60 markets an entirely new way to curate the vibe of their listening sessions in real time.” (Spotify’s DJ Now Takes Requests, Enhancing Real-Time Music Discovery)

The system interprets natural-language prompts like “Play me some electronic beats for a midday run” and maps them to moods, genres, and discovery filters, then builds a personalized, responsive playlist that updates as users interact. This technology applies agentic AI: models that understand intent, use tools, and continuously refine results.
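
A minimal sketch of this kind of intent mapping, assuming a hosted LLM behind a generic llm_complete callable; the helper and the JSON schema are hypothetical, not a Spotify API.

```python
import json

def parse_request(prompt, llm_complete):
    """Map a free-form listening request to structured retrieval filters.
    `llm_complete` stands in for whatever hosted model does the parsing."""
    instruction = (
        "Extract JSON with keys 'genres', 'mood', and 'activity' "
        f"from this music request: {prompt!r}"
    )
    return json.loads(llm_complete(instruction))

# A stubbed model shows the shape of the output without a live endpoint.
def fake_llm(_):
    return '{"genres": ["electronic"], "mood": "energetic", "activity": "running"}'

filters = parse_request("Play me some electronic beats for a midday run", fake_llm)
print(filters)  # {'genres': ['electronic'], 'mood': 'energetic', 'activity': 'running'}
# Downstream, filters like these would seed candidate retrieval and a
# personalized ranking pass that keeps updating as the user interacts.
```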

Application #2: AI Playlist - Personalized Agentic Recommendations

Spotify recently added preference alignment to its AI Playlist experience (Spotify Expands AI Playlist in Beta to Premium Listeners in 40+ New Markets), where users can describe a vibe, moment, or feeling, and an AI system builds a custom playlist in seconds. To do this, the system uses agentic AI orchestration with an LLM-based agent that interprets user intent and calls multiple tools to generate tailored playlists.
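
Here is a minimal sketch of what LLM-based orchestration with tool calls can look like; the planner, tool names, and step cap are all illustrative assumptions rather than Spotify's actual stack.

```python
def search_catalog(query):
    """Stand-in retrieval tool; a real system would hit a search index."""
    return [f"{query}-track-{i}" for i in range(3)]

def get_user_taste(user_id):
    """Stand-in personalization tool returning a taste profile."""
    return {"favorite_genres": ["indie", "electronic"]}

TOOLS = {"search_catalog": search_catalog, "get_user_taste": get_user_taste}

def run_agent(user_id, request, plan_step):
    """Ask the planner (an LLM in production, scripted here) for the next
    tool call, execute it, and stop when it emits a final playlist."""
    context = {"request": request}
    for _ in range(5):  # hard cap on steps keeps latency bounded
        step = plan_step(context)
        if step["action"] == "finish":
            return step["playlist"]
        context[step["action"]] = TOOLS[step["action"]](*step["args"])
    return context.get("search_catalog", [])

def scripted_planner(context):
    if "get_user_taste" not in context:
        return {"action": "get_user_taste", "args": ("u1",)}
    if "search_catalog" not in context:
        return {"action": "search_catalog", "args": ("rainy evening indie",)}
    return {"action": "finish", "playlist": context["search_catalog"]}

print(run_agent("u1", "a playlist for a rainy evening", scripted_planner))
```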

The underlying models use a hybrid RLHF/DPO approach, with a reward model that estimates satisfaction and a Direct Preference Optimization (DPO) framework that continuously fine-tunes based on user behaviors such as skips, replays, and saves. This creates a “preference tuning flywheel”: generate → score → sample → fine-tune.
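
The DPO objective itself is published (Rafailov et al., 2023); below is a generic PyTorch rendering of the loss used in such a flywheel, not Spotify's training code. Preference pairs like those in the closed-loop sketch above would supply the preferred (w) and rejected (l) outputs.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss.

    Inputs are summed log-probabilities of the preferred (w) and rejected
    (l) outputs under the current policy and a frozen reference model;
    beta controls how far the policy may drift from the reference."""
    logits = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return -F.logsigmoid(logits).mean()

# Toy check with a batch of two preference pairs.
lw = torch.tensor([-4.0, -3.5]); ll = torch.tensor([-5.0, -5.5])
rw = torch.tensor([-4.2, -3.9]); rl = torch.tensor([-4.8, -5.0])
print(dpo_loss(lw, ll, rw, rl))  # falls as preferred outputs gain probability
```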

In live experiments, this framework increased listening time by 4%, improved playlist saves, and reduced tool errors by 70%. This represents a major step forward in scaling personalized listening experiences built around natural language and dynamic user preferences.

Application #3: Spotify × Ray-Ban Meta Integration - Ambient AI in Everyday Listening

Spotify’s integration with Ray-Ban Meta glasses (Ray-Ban | Meta Glasses Are Getting New AI Features and More Partner Integrations) brings its generative AI research beyond the app and into the real world. By combining agentic AI systems with hands-free voice capabilities, this collaboration lets listeners interact with Spotify naturally through spoken prompts, contextual awareness, and seamless playback control.

Once a user has linked their Spotify account, they can say a phrase like “Play something chill for my walk,” and the glasses’ on-device assistant will connect with Spotify’s agentic recommendation stack. This is the same AI foundation behind features like DJ and AI Playlist, enabling real-time intent detection, music retrieval, and preference tuning.

Behind the scenes, this integration uses:

  • Speech-to-intent modeling from Spotify’s voice research, for low-latency conversational responses.

  • Context-adaptive models that consider cues like location or time of day to personalize playback (sketched after this list).

  • Edge-optimized inference, a scaled-down version of Spotify’s generative models running efficiently on wearables.
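
As a minimal sketch of the context-adaptive idea, a wearable assistant might fold coarse context signals into the request sent to the recommendation stack. The cue names and augmentation step are invented for illustration, not the actual integration.

```python
from datetime import datetime

def context_cues(now, on_the_move):
    """Derive coarse context signals of the kind a wearable assistant
    might attach to a voice request (cue names are illustrative)."""
    hour = now.hour
    daypart = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
    return {"daypart": daypart,
            "activity": "walking" if on_the_move else "stationary"}

def augment_request(spoken, cues):
    """Fold context into the prompt, so 'something chill' resolves
    differently on a morning walk than on a late-night couch session."""
    return f"{spoken} ({cues['daypart']}, {cues['activity']})"

cues = context_cues(datetime(2025, 12, 2, 9, 30), on_the_move=True)
print(augment_request("Play something chill for my walk", cues))
# -> Play something chill for my walk (morning, walking)
```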

Together, these innovations bring a new kind of ambient listening, where the line between everyday life and audio interaction disappears.

Application #4: Recaps for Audiobook Listening

Spotify is also applying AI to long-form content with audiobook Recaps, a feature that summarizes what listeners have heard so far (How Recaps Are Helping Audiobook Fans Stay Connected).

The goal is to make audiobook listening smoother, letting users pick up where they left off, even after long breaks. The summaries use AI to help listeners recall key moments without re-listening, while maintaining the original narrative’s tone and pacing. This shows how Spotify is extending its AI research beyond music into all forms of audio storytelling.
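
One plausible shape for such a feature, sketched with a stand-in summarize callable in place of a real LLM call; the function and chapter data are hypothetical, not how Recaps is actually built.

```python
def build_recap(chapters, resume_chapter, summarize):
    """Summarize everything heard before `resume_chapter` (0-indexed):
    first each finished chapter, then the notes into one recap."""
    heard = chapters[:resume_chapter]
    notes = [summarize(f"Summarize this chapter briefly: {c}") for c in heard]
    return summarize("Combine into one spoiler-free recap: " + " ".join(notes))

# A stubbed summarizer shows the control flow without a model in the loop.
def fake_summarize(text):
    return text.split(": ", 1)[1]

chapters = [
    "Ada leaves the coastal town in search of her brother.",
    "She meets a cartographer who claims to know his route.",
    "The map is stolen at the border crossing.",
]
print(build_recap(chapters, resume_chapter=2, summarize=fake_summarize))
```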

Some final words

From the lab to the listener’s ear, Spotify shows how research-driven AI transforms the audio experience. Whether making a voice request to your DJ, creating a custom AI Playlist from a mood prompt, discovering music through wearable AI, or returning to an audiobook seamlessly, each feature builds on a shared foundation of generative and agentic AI research. Spotify is creating new ways to listen through continuous innovation that bridges science and sound.