AudioBoost: Increasing Audiobook Retrievability in Spotify Search with Synthetic Query Generation

Spotify has recently introduced audiobooks as part of its catalog, complementing its music and podcast offerings. An important goal is to help users discover this new content type and to support broad, exploratory searches by topic, genre, or target audience (e.g. “dark romance audiobooks”, “children’s literature audiobooks”, “motivational audiobooks”), so that users can discover new audiobooks, authors, and publishers.
To find the most relevant items for a user, search engines rely on a variety of signals, including the user’s history and the item’s popularity. However, estimating the “true” popularity of an item is challenging in a cold-start scenario: most listeners are not yet used to engaging with a new content type, and interactions are mostly concentrated on previously available items. This can negatively impact the retrievability [1], i.e. the chance of being retrieved, of new items.
To address this, we introduce AudioBoost, a system to boost audiobook retrievability in Spotify’s Search via synthetic query generation. AudioBoost tackles cold-start by simulating user queries for audiobooks using Large Language Models (LLMs).
Following up on previous research [2], AudioBoost uses synthetic query generation in two ways: 1) to inspire users to type more exploratory queries for audiobooks via Query AutoComplete (QAC) and 2) to “tag” audiobooks with relevant descriptors and queries and help the Search Retrieval engine retrieve the correct results.
Our offline and online results show that AudioBoost can increase the retrievability of audiobooks, leading to more impressions and clicks, effectively tackling the cold-start problem.

Figure 1: AudioBoost. Synthetic queries generated by an LLM are used as query completions to support query formulation (left) and for retrieval (right) to improve the retrievability of cold-start entities, i.e. audiobooks, in Spotify search (illustrative example).
AudioBoost Pipeline
To generate synthetic queries for audiobooks, we first created a taxonomy containing different types of audiobook descriptors. To define this taxonomy, we looked into how users search for audiobooks, analyzing both internal search queries and web forums where users exchange book recommendations (for example, on Reddit). For instance, users might start a thread asking for “whimsical cheerful books to brighten your day”, revealing that descriptions of style and mood might serve as good queries.
This manual approach led to a taxonomy of 10 categories of descriptors, such as Genres (e.g. “juvenile fiction”), Themes or topics (e.g. “greek philosophy”), Character descriptions (e.g. “heroic protagonist”), Moods (e.g. “adventurous”), and Target audiences (e.g. “children’s literature”).
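As a rough sketch, the taxonomy can be thought of as a mapping from descriptor categories to example descriptors. The key names below are our own, and only the five categories named in the post are shown:

```python
# Illustrative encoding of the descriptor taxonomy; key names are our own
# invention, and only the five categories named in the post are listed.
TAXONOMY = {
    "genres": ["juvenile fiction"],
    "themes_or_topics": ["greek philosophy"],
    "character_descriptions": ["heroic protagonist"],
    "moods": ["adventurous"],
    "target_audiences": ["children's literature"],
    # the remaining 5 of the 10 categories are not named in the post
}
```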
Based on this taxonomy, we use a Chain-of-Thought (CoT) prompting approach to generate queries. Before asking the LLM to generate queries, we first ask it to generate descriptors under the 10 categories of this taxonomy, and then to use those descriptors to generate two types of synthetic queries: 1) queries, e.g. “realistic fiction audiobooks”, and 2) compound queries, e.g. “Stephen King supernatural fiction audiobooks”. The metadata used for each audiobook as input to the model is the audiobook title, author(s), description, and BISAC genres [3].
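As a hedged sketch of this two-stage CoT prompt (the production prompt is not shown in the post, so all wording and names below are illustrative), the audiobook metadata could be assembled into a single prompt like this:

```python
def build_cot_prompt(title, authors, description, bisac_genres, categories):
    """Assemble an illustrative two-stage CoT prompt from audiobook metadata."""
    metadata = (
        f"Title: {title}\n"
        f"Author(s): {', '.join(authors)}\n"
        f"Description: {description}\n"
        f"BISAC genres: {', '.join(bisac_genres)}"
    )
    return (
        "You write search queries for an audiobook catalog.\n\n"
        f"{metadata}\n\n"
        # Step 1: descriptors first, so the model reasons before writing queries.
        "Step 1: For each of these descriptor categories, list relevant "
        f"descriptors for this audiobook: {', '.join(categories)}.\n"
        # Step 2: turn the descriptors into the two query types from the post.
        "Step 2: Use the descriptors from Step 1 to write two kinds of "
        "synthetic queries: 1) queries, e.g. 'realistic fiction audiobooks'; "
        "2) compound queries, e.g. 'Stephen King supernatural fiction audiobooks'."
    )

# Hypothetical example input; the response would be parsed for the queries.
prompt = build_cot_prompt(
    title="The Shining",
    authors=["Stephen King"],
    description="A family becomes the winter caretakers of an isolated hotel.",
    bisac_genres=["FICTION / Horror"],
    categories=["genres", "themes or topics", "moods"],
)
```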
Synthetic queries are indexed both in the QAC system as query completions and as additional “tags” in the sparse search retrieval system (document augmentation).
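A minimal sketch of the document-augmentation side, assuming an indexed document is a plain dict and the sparse engine tokenizes an extra text field (the field name is hypothetical):

```python
def augment_document(doc, synthetic_queries):
    """Return a copy of the indexed document with synthetic queries as tags."""
    augmented = dict(doc)
    # A sparse (inverted-index) engine tokenizes this field alongside the
    # title/description fields, so the synthetic queries now match the document.
    augmented["synthetic_query_tags"] = " ".join(synthetic_queries)
    return augmented

doc = {"title": "The Shining", "description": "A family heads to an isolated hotel."}
augmented = augment_document(doc, ["supernatural fiction audiobooks"])
```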

Figure 2: AudioBoost pipeline. a) We collect metadata about a given audiobook and add it to the LLM prompt; b) we use a pre-trained LLM with chain-of-thought prompting to generate first audiobook descriptors and then synthetic queries; c) we store the generated synthetic queries in a table; d) we index synthetic queries as query completions and as document augmentations for enhanced sparse retrieval.
Results
We run both offline and online evaluations of the AudioBoost pipeline.
Retrievability Simulation
In this experiment, we simulate the change in retrievability share of a content type when using the AudioBoost pipeline.
We have four different configurations in this experiment:
Configuration 1: baseline; synthetic queries are not used.
Configuration 2: synthetic queries are used only for document augmentation.
Configuration 3: synthetic queries are used only for query completion.
Configuration 4: synthetic queries are used for both query completion and document augmentation.
We define the retrievability of an item as how often it appears in the top 100 search results, and the retrievability share of a content type as the percentage of total retrievability associated with that content type.
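These two definitions can be sketched directly in code; the item identifiers below are made up for illustration:

```python
from collections import Counter

def retrievability(results_per_query, k=100):
    """Count, per item, how often it appears in the top-k search results."""
    counts = Counter()
    for ranked in results_per_query:
        counts.update(ranked[:k])
    return counts

def retrievability_share(counts, item_type):
    """Fraction of total retrievability associated with each content type."""
    total = sum(counts.values())
    by_type = Counter()
    for item, count in counts.items():
        by_type[item_type[item]] += count
    return {t: c / total for t, c in by_type.items()}

# Toy example: two queries, each returning a ranked list of items.
results = [["a1", "p1"], ["a1", "s1"]]
types = {"a1": "audiobook", "p1": "playlist", "s1": "show"}
counts = retrievability(results)
share = retrievability_share(counts, types)  # audiobooks get 2 of 4 slots
```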
As a dataset, we used a sample of data containing search successes from user logs. Specifically, we took a sample of query and entity pairs from a single day for three entity types on the platform: audiobooks, playlists, and podcast shows. For query completions, we assume that each suggested completion would be clicked by at least one user.
Figure 3 shows the results of this simulation. In configuration 2, some queries from the logs that would retrieve playlists or shows now return more audiobooks. In configuration 3, we change the query distribution to include more audiobook-focused queries (simulating the effect of query completions), which increases the number of audiobooks retrieved, as some of the synthetic queries match the title, description, and genre metadata already available. The most effective approach, though, is the combination of both: helping users issue broader audiobook queries and modifying the retrieval system so that such queries lead to audiobooks (configuration 4).
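To make the four configurations concrete, here is a toy version of the simulation, assuming a simple bag-of-words overlap retriever (the real system is a production sparse retrieval engine, and all documents and queries below are invented):

```python
def retrieve(query, docs):
    """Toy retriever: return docs that share at least one term with the query."""
    q = set(query.split())
    return [d for d, text in docs.items() if q & set(text.split())]

def audiobook_share(queries, docs):
    """Fraction of all retrieved results that are audiobooks."""
    hits = [d for q in queries for d in retrieve(q, docs)]
    return sum(d.startswith("audiobook") for d in hits) / max(len(hits), 1)

docs = {
    "audiobook:1": "the shining stephen king horror audiobook",
    "playlist:1": "moody pop songs",
}
log_queries = ["moody pop", "supernatural stories"]
synthetic_queries = ["supernatural horror audiobooks"]

# Configuration 1: baseline, synthetic queries are not used.
base = audiobook_share(log_queries, docs)
# Configuration 2: document augmentation only (synthetic queries as tags).
docs_aug = {**docs,
            "audiobook:1": docs["audiobook:1"] + " supernatural horror audiobooks"}
doc_aug_only = audiobook_share(log_queries, docs_aug)
# Configuration 3: query completions only (synthetic queries join the query mix).
qac_only = audiobook_share(log_queries + synthetic_queries, docs)
# Configuration 4: both together.
combined = audiobook_share(log_queries + synthetic_queries, docs_aug)
```

Even in this tiny example, the audiobook share is lowest for the baseline and highest when query completion and document augmentation are combined, mirroring the ordering in Figure 3.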

Figure 3: Offline simulation showing the impact on retrievability share of audiobooks when increasing the number of clicks towards suggested synthetic query completions.
Online Results
We run a large-scale A/B test for 3 weeks comparing the default QAC and retrieval system to a treatment that uses AudioBoost, namely where we 1) add the synthetic queries as an additional source of query completions and 2) use the synthetic queries to perform document augmentation in a sparse retrieval system (Fig. 2).
We measure several metrics online to account for audiobook retrievability and exploratory searches for audiobooks at the SERP (Search Engine Results Page) level:
impressions: number of audiobook impressions per SERP
clicks: number of clicks on audiobooks per SERP
coverage: overall number of query completions shown per SERP
exploration: number of clicks on exploratory query completions leading to audiobook interactions per SERP
We observe that AudioBoost leads to +0.7% in impressions, +1.22% in clicks, +0.03% in coverage, and +1.82% in exploration. At the same time, guardrail metrics that check for overall engagement with QAC and overall search effectiveness are neutral. All reported results are statistically significant (t-test, p < 0.01).
Conclusions
In this post, we introduced AudioBoost, a system to address cold-start in Search via synthetic query generation. We observe that AudioBoost boosts audiobook retrievability and increases impressions and clicks on audiobooks, inspiring users to explore the catalog and helping authors and publishers gain visibility and engage with new listeners. This work highlights that synthetic query generation is a powerful strategy for increasing the visibility of a set of target items in a search system, in accordance with what we saw in previous work [2]. AudioBoost is also appealing from a productionization standpoint, as the query generation and indexing steps can be performed offline in a batch pipeline, at reasonable cost and without affecting the latency of the QAC and retrieval systems. Given these results, we have rolled out AudioBoost in production.
For more information, please refer to our paper: Enrico Palumbo, Gustavo Penha, Alva Liu, Marcus Eltscheminov, Jefferson Carvalho dos Santos, Alice Wang, Hugues Bouchard, Humberto Jesús Corona Pampin, and Michelle Tran Luu. AudioBoost: Increasing Audiobook Retrievability in Spotify Search with Synthetic Query Generation. RecSys EARL Workshop.
References
[1] Leif Azzopardi and Vishwa Vinay. Retrievability: An Evaluation Measure for Higher Order Information Access Tasks. ACM CIKM, 2008.
[2] Gustavo Penha, Enrico Palumbo, Maryam Aziz, Alice Wang, and Hugues Bouchard. Improving Content Retrievability in Search with Controllable Query Generation. ACM Web Conference, 2023.
[3] BISAC Subject Codes, Book Industry Study Group, https://www.bisg.org/BISAC-Subject-Codes-main