Neural Instant Search for Music and Podcasts

August 30, 2021 Published by Helia Hashemi, Aasish Pappu, Praveen Ravichandran, Mi Tian, Mounia Lalmas, Ben Carterette

Tl;dr:  Music and podcast search are different, and a carefully designed multi-task model will be essential for search in modern audio streaming platforms.

Spotify has a vast catalog of music and podcasts.  It is so vast that without advanced search and recommendation technologies, users could be overwhelmed by the selection. Search and recommendations help listeners find the audio that is most meaningful to them more quickly.

Spotify’s search is defined by a few requirements:

  • Instantaneous: each keystroke must update search results
  • Heterogeneity: users may be searching for either music or podcasts
  • Multirepresentation: text matching is not enough for high-quality results

A search solution that meets these requirements is essential for reducing friction and helping listeners get to the audio they want quickly. At the same time, these criteria impose constraints that make it harder to deliver the right audio at the right time.

We investigated how we could build a better search engine for a listener base with a variety of needs while maintaining high performance on the three requirements above.

How is music search behavior different from podcast search?

We analyzed query log data to explore the differences in how users search for and consume podcasts compared to music.

How much effort do users put into finding music versus podcasts?

It turns out that users spend significantly more time typing and rewriting queries to find podcasts than to find music. The table below shows a large increase in character deletions and a noticeable increase in query length in characters for podcast queries relative to music queries, while query length in words is essentially unchanged.

Metric | Podcast vs. music queries (relative difference)
Average relative character deletions | +53%
Average query length in characters | +13%
Average query length in words | +0%
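
As a rough illustration of how such query-effort statistics can be derived, the sketch below processes a hypothetical keystroke log in which each search session records the characters typed (including deletions) and the final submitted query, labeled with the query intent. The log format, example values, and helper functions are assumptions for illustration only, not our production analysis pipeline.

```python
from statistics import mean

# Hypothetical keystroke sessions: characters typed ("<DEL>" marks a deletion),
# the final submitted query, and the query intent (music vs. podcast).
sessions = [
    {"keystrokes": ["j", "o", "e", " ", "r", "o", "h", "<DEL>", "<DEL>", "o", "g"],
     "final_query": "joe rog", "intent": "podcast"},
    {"keystrokes": ["a", "d", "e", "l", "l", "<DEL>", "e"],
     "final_query": "adele", "intent": "music"},
]

def query_effort(session):
    """Effort statistics for a single search session."""
    return {
        "character deletions": sum(1 for k in session["keystrokes"] if k == "<DEL>"),
        "query length in characters": len(session["final_query"]),
        "query length in words": len(session["final_query"].split()),
    }

def relative_difference(sessions, field):
    """Average of one statistic for podcast vs. music sessions, as a relative difference."""
    by_intent = {"music": [], "podcast": []}
    for s in sessions:
        by_intent[s["intent"]].append(query_effort(s)[field])
    music, podcast = mean(by_intent["music"]), mean(by_intent["podcast"])
    return (podcast - music) / music * 100

for field in ("character deletions", "query length in characters", "query length in words"):
    print(f"{field}: {relative_difference(sessions, field):+.0f}%")
```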

How do users consume music versus podcasts in search results? 

Users searching for podcasts are far more likely to perform actions that reflect saving content for later consumption: in particular, they download podcast episodes far more often than they download music! We also see a modest increase in immediate streams, which may suggest that podcast listeners come to search with the intent to listen right away somewhat more often than music listeners do.

Action on search results | Podcast vs. music queries (relative difference)
Stream | +3%
Add to collection | +30%
Add to playlist | -59%
Follow artist | -92%
Download | +593%
Share | +44%

Can we develop a search model to provide quick access to both music and podcasts?

Recall the requirements for a search model:

  • Instantaneous: each keystroke must update search results
  • Heterogeneity: users may be searching for either music or podcasts
  • Multirepresentation: text matching is not enough for high-quality results

We developed a neural architecture, which we call Neural Instant Search (NIS), that addresses each of these requirements in light of the behavior analysis above (a simplified sketch of the resulting model follows the list):

  • We introduce character-level embedding vectors to the model. This enables the model to match the input query prefix to the item description.
  • We introduce query intent identification (music vs. podcast) as an auxiliary objective in the network’s optimization, using multi-task learning.
  • We introduce an item embedding vector that is independent of its title and is modeled as a look-up table for all items in the collection.
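
The sketch below is a minimal PyTorch rendering of these three ideas: a character-level encoder over the (possibly partial) query and the item title, a title-independent item embedding table, and a multi-task head that predicts query intent alongside the query-item relevance score. The layer choices, scoring function, and loss weighting are illustrative assumptions, not the exact configuration from the paper.

```python
import torch
import torch.nn as nn

class NeuralInstantSearch(nn.Module):
    """Simplified sketch of the three ideas above; dimensions and layers are illustrative."""

    def __init__(self, n_chars, n_items, dim=128, n_intents=2):
        super().__init__()
        # Character-level embeddings let a partial query (prefix) match item text.
        self.char_emb = nn.Embedding(n_chars, dim, padding_idx=0)
        self.query_encoder = nn.GRU(dim, dim, batch_first=True)
        self.title_encoder = nn.GRU(dim, dim, batch_first=True)
        # Title-independent item embedding: a look-up table over the whole catalog.
        self.item_emb = nn.Embedding(n_items, dim)
        # Multi-task head: predict query intent (music vs. podcast) from the query encoding.
        self.intent_head = nn.Linear(dim, n_intents)

    def forward(self, query_chars, title_chars, item_ids):
        _, q = self.query_encoder(self.char_emb(query_chars))  # final hidden state (1, B, dim)
        _, t = self.title_encoder(self.char_emb(title_chars))
        q, t = q.squeeze(0), t.squeeze(0)
        item = self.item_emb(item_ids)
        # Relevance combines character-level text matching with the item embedding signal.
        relevance = (q * t).sum(-1) + (q * item).sum(-1)
        intent_logits = self.intent_head(q)
        return relevance, intent_logits

# Joint optimization: a relevance loss plus an intent-classification loss (multi-task learning).
model = NeuralInstantSearch(n_chars=100, n_items=1000)
query_chars = torch.randint(1, 100, (4, 12))  # 4 query prefixes, 12 characters each
title_chars = torch.randint(1, 100, (4, 30))  # character ids of candidate item titles
item_ids = torch.randint(0, 1000, (4,))       # candidate item ids
clicks = torch.tensor([1.0, 0.0, 1.0, 0.0])   # clicked / not clicked
intents = torch.tensor([0, 1, 0, 1])          # 0 = music, 1 = podcast

relevance, intent_logits = model(query_chars, title_chars, item_ids)
loss = nn.functional.binary_cross_entropy_with_logits(relevance, clicks) \
    + 0.5 * nn.functional.cross_entropy(intent_logits, intents)  # 0.5 weight is illustrative
loss.backward()
```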

The architecture is shown in the following figure:

The model outperforms strong baselines for music and podcast search tasks, as shown in the following figure. NIS gives better relevance for both music and podcast searches than other popular neural architectures.
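
For context on how this kind of comparison can be quantified, the short sketch below computes a standard ranking metric, mean reciprocal rank (MRR), separately for music and podcast queries. The data and grouping here are hypothetical, and MRR is shown only as an example of a relevance metric, not necessarily the exact protocol used in the paper.

```python
def mean_reciprocal_rank(queries):
    """MRR over queries; each entry is (ranked item ids, the item the user engaged with)."""
    total = 0.0
    for ranking, engaged in queries:
        if engaged in ranking:
            total += 1.0 / (ranking.index(engaged) + 1)
    return total / len(queries)

# Hypothetical rankings produced by a search model, grouped by query intent.
results = {
    "music":   [(["track_a", "track_b", "track_c"], "track_a"),
                (["track_d", "track_e", "track_f"], "track_e")],
    "podcast": [(["show_x", "show_y", "show_z"], "show_y")],
}

for intent, ranked in results.items():
    print(f"{intent}: MRR = {mean_reciprocal_rank(ranked):.3f}")
```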

What’s next?

Our model helps more listeners find the right audio at the right time with less friction by leveraging its understanding of query intent together with character-level encoding, benefiting both listeners and the creators of the audio they love. With less time spent searching, our listeners can spend more time listening to, and connecting with, the music and podcasts they love.

For more information, we refer the reader to our paper published at ACM KDD 2021:

“Neural Instant Search for Music and Podcast”. Helia Hashemi, Aasish Pappu, Mi Tian, Praveen Chandar, Mounia Lalmas, and Ben Carterette. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021).