Redesigning Your Library with Mixed Methods Insights

June 22, 2023 Published by Ingrid Pettersson, Carl Fredriksson, Raha Dadgar, John Richardson, Lisa Shields, Duncan McKenzie

Launching a radical change to a habitual feature used by millions every day presents an interesting challenge, both for the end users whose habits are disrupted and for the organization making the change. But at times, major changes are needed to accommodate long-term feature growth that benefits users. This was recently the case with Spotify’s Your Library, which was tasked with supporting a growing selection of content types. In 2020, podcasts were introduced to the platform, and in 2022 audiobooks were added for the US market. Our goal was to enable this content growth while minimizing negative effects on the user experience, as measured by engagement metrics, in a space loved and used by many.

The challenge required a high degree of sensitivity to users’ needs within what is often considered their own space in the world of streaming. To meet this challenge, we worked closely across data science and user research through a mixed methods approach. In the paper Minimizing change aversion through mixed methods research: a case study of redesigning Spotify’s Your Library, published at the CHI conference in 2023, we reflect on how data scientists and user researchers collaborated during the redesign of Your Library launched in 2021.

Approach to minimizing change aversion

We want to highlight three factors that we believe enabled a successful launch of the redesigned Your Library: first, the early involvement of the data science and user research disciplines in the product development process to understand key behaviors and user mental models; second, a modified approach to evaluating disruptive changes at scale in a sensitive space; and third, a focus on mixed methods insight work, combining quantitative and qualitative research throughout the whole process to improve the experience iteratively.

Early involvement

“My Spotify Library represents me very much – better than my wardrobe.” (quote from an interview study)

To successfully launch a new experience, we needed to understand users’ current mental models and habits with regard to the Library. We needed to learn this as early as possible, as it would set the foundation for the changes to come.

From data explorations in this phase, we observed key behavioral patterns centered on the Library. Specifically, we were interested in establishing an understanding of the saving and retrieval patterns for different audio format types on the platform, forming a baseline against which to compare upcoming changes.

An ethnographic user research study was conducted together with our Product and Design counterparts early in the development cycle. We set out to understand the experiences and mental models users ascribe to the Library by visiting 18 representative users at home in the US and observing their attitudes and behaviors around library organization.

Much as wardrobes are organized in different and personal ways, we observed very different attitudes and behaviors in relation to Spotify library organization. Some users kept their collections in strict order, some gathered large amounts of content to revisit later, and others fell somewhere in between. However, the foundational mental model of the Library as “their space” in a world of recommendations was consistent. The playlists users create and store can hold a lot of pride, identity, and nostalgic value for them.

This learning emphasized the importance of including users’ own library content in the process of evaluating new solutions, which led us to use personalized prototypes where users could see and react to their own content in the new Library.

As we continued to work closely with our design and product peers, we were able to identify key hypotheses that needed to be addressed early and put them to the test through rounds of concept testing. In the iterative usability studies, we assessed whether users were still able to reach the content they love and could navigate the new Library experience.

Evaluating safely at scale

When we arrived at a new Library design that we were fairly confident in, based on iterative evaluative user research, we decided to start evaluating at scale. Evaluating at scale comes with higher risk and a lower ability to capture detailed nuance, but it enables us to learn from a wider group of users and rank the prevalence of different observed phenomena more reliably. Evaluating at scale is common at Spotify, often in the form of A/B tests. Because this redesign would so extensively affect a space that is very personal to users, we decided to spend extra time and effort at this stage of the project by running a beta test and a multi-step A/B testing process.

Beta testing

We started by running an opt-in beta test where a small percentage of users had the opportunity to try the new experience and provide feedback in the form of text and ratings. This allowed us to track user behaviors and gather a large amount of text feedback from users who experienced the new Library over an extended period of time, in a real-life setting with their own collections. The main reason for running a beta test before A/B testing the redesign was to have an easy way of gathering feedback from users without risking a negative impact on a larger number of users. Additionally, the users who opted in did so by choice and were not forced into a new experience.

From the beta test, we learned about the most important pain points at scale, and we used this information to iterate on the design before evaluating it further via a multi-step A/B testing process. The beta feedback provided direction on what changes to make before launching the experience to all users. However, the beta test only provided observational data, notably from self-selected users, which could not reliably answer the key causal question: “How will our key metrics be affected if we launch the new experience to the majority of our users?” Answering this question was of utmost importance before the new experience could be launched, so the next step was to start A/B testing.

A/B testing

After the beta test, we were confident enough to progress to A/B tests. The goal of the A/B testing process was to guide the team on additional changes to make before launch and, ultimately, to evaluate whether the experience was ready for a global launch. Key metrics would be evaluated in the planned tests. These metrics were divided into two groups: guardrail metrics that we did not aim to improve but that had to be kept within desired margins before launching, and success metrics that we hoped to improve with the new design. The guardrail metrics focused on listening time and retention for the mobile application. The success metrics were more local and focused on content retrieval from the Library.
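
To make the split concrete, here is a minimal sketch in Python of how such a metric plan could be encoded. The metric names and margins are illustrative assumptions on our part, not the actual metrics or thresholds used in the tests.

```python
# Hypothetical test-metric plan: guardrail metrics carry a
# non-inferiority margin (the largest acceptable drop), while
# success metrics are simply expected to improve.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metric:
    name: str
    # Largest acceptable relative drop for a guardrail, e.g. 0.01 = 1%.
    # None marks a success metric we hope to improve.
    non_inferiority_margin: Optional[float] = None

TEST_PLAN = [
    Metric("mobile_listening_time", non_inferiority_margin=0.01),  # guardrail
    Metric("mobile_retention", non_inferiority_margin=0.005),      # guardrail
    Metric("library_content_retrieval"),                           # success
]

guardrails = [m for m in TEST_PLAN if m.non_inferiority_margin is not None]
success_metrics = [m for m in TEST_PLAN if m.non_inferiority_margin is None]
```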

The testing plan started with a small-scale A/B test with two goals: to provide guidance on whether a new feature, aimed at addressing usability issues, should be included in the new Library design, and to estimate the impact of the new Library on the guardrail metrics. If the test analysis did not indicate signs of negative impact on the guardrail metrics, the plan was to move on to a larger A/B test with more statistical power. If we observed signs of negative impact, the plan was to pivot and run another small test after additional changes were made.

The small-scale test was successful and confirmed that we should include the new feature. We moved on to a larger A/B test to determine whether the redesign was ready for a global launch. Since the new Library enabled additional audio types, we were ready to accept some potential negative impact on the guardrail metrics in the short term. However, the potential impact had to be kept within acceptable limits. To manage this, we used non-inferiority testing, a type of statistical test where the aim is to show that a treatment (the new Library) is not unacceptably worse than a control (the old Library) [1]. Non-inferiority margins (NIMs) formalize what “unacceptably worse” means. The main reason for running a larger A/B test before launch was to increase the sample size enough to statistically power the desired NIMs. The larger test was also successful, and we started to prepare for a global launch to all users, taking additional learnings from qualitative research into consideration for continued development.
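
For illustration, here is a minimal sketch of a non-inferiority check on a guardrail metric, assuming a simple two-sample setup with a normal approximation and a one-sided confidence interval. The function names, margin, and alpha level are our own illustrative choices, not the actual analysis pipeline.

```python
# A minimal sketch of a non-inferiority test for a guardrail metric.
# All names and parameter values here are illustrative assumptions.
import numpy as np
from scipy import stats

def non_inferior(treatment, control, nim, alpha=0.05):
    """True if treatment is non-inferior to control at margin `nim`.

    `nim` is the non-inferiority margin: the largest acceptable drop
    in the metric, on the same scale as the observations.
    """
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    diff = t.mean() - c.mean()
    se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    # Lower bound of the one-sided (1 - alpha) interval for the difference.
    lower_bound = diff - stats.norm.ppf(1 - alpha) * se
    # Non-inferior if even the pessimistic bound stays above -nim.
    return lower_bound > -nim

def required_n_per_arm(sigma, nim, alpha=0.05, power=0.8):
    """Approximate per-arm sample size to power the test above,
    assuming equal variances and a true difference of zero."""
    z_alpha = stats.norm.ppf(1 - alpha)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(2 * (sigma * (z_alpha + z_beta) / nim) ** 2))
```

The second function also hints at why a larger test was needed: the required sample size grows with the square of 1/NIM, so halving the margin quadruples the number of users needed.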

Mixed methods studies

Using quantitative methods together with qualitative methods can offer an understanding of both the size of and the underlying reasons for users’ behaviors and actions. It’s a fruitful and well-established way of working at Spotify [2]. By restructuring the insights team to enable longer periods of focus, we paired user research and data science over a longer-than-typical time frame, enabling a prolonged, holistic understanding of the Library user experience and the continuous delivery of combined insights. By running qualitative research in sync with the quantitative work, we could identify where to adapt and refine the experience, providing clear recommendations and holistic status reporting to the team. This helped us improve the new Library experience in iterations.

One example of working with mixed methods was within the beta test: at the same time as behavioral data was being collected from the test, we were using interviews with users to identify key remaining issues. We also enabled rating and text feedback in the test to understand these issues at scale. From this research we learned, for example, that although a relatively small number of users were frustrated with changes to sorting, they were very passionate and vocal about the change, leading us to reconsider the sorting options.

The A/B test provided yet another opportunity to assess the experience with attention to both the metrics and an understanding of users’ motivations and needs. In combination with the final test, we ran a diary and interview study to dig deeper into the remaining pain points. We gave our participants access to the new Library for a week, with regular check-ins every other day and a longer final interview at the end of the week. This gave them the opportunity to see their own content in the new experience and ensured that we were not reacting only to a person’s subjective first responses to a new technology. During the diary study, the participants were encouraged to explore their new Library and continue their habitual listening. The interviews from this diary study gave further valuable in-depth understanding of the reasons behind the pain points we had previously detected in the beta testing phase, which allowed us to go back to the drawing board with more confidence and rethink aspects of the experience. Examples of resulting redesigns included a new podcast collection format and a reworked retrieval experience and sorting options in the Library.

Additionally, the diary study and interviews revealed interesting findings regarding the core user needs identified in the early research, specifically the limits of users’ interest in spending effort manually organizing their libraries.

Conclusion

To recap, we wanted to minimize change aversion while launching a disruptive change to a space users feel a uniquely strong ownership of, and where disruption could have large effects on consumption habits. Some disruption would be inevitable, but we strove to lower it as much as possible through iterative risk mitigation. Our approach was to:

  • Get involved in the product cycle early and work closely with our design and product peers to gain understanding and align on the fundamental user needs and success metrics. This included an ethnographic study and data explorations. 
  • Evaluate disruptive concepts at scale through a beta test and A/B tests in combination with qualitative research to make iterative improvements and launch with confidence.
  • Combine qualitative and quantitative research at all stages to gain a deeper understanding of both what and why.

Once we were certain, through qualitative and quantitative research, that the feature would not disrupt key habits, we launched the experience to all users to avoid scattered experiences across the user base. We found that we did not critically disrupt users: they were able to find and consume their collections, as well as make use of the new features added for quicker retrieval. Negative sentiment did exist for the new update, as is expected for changes, but it was substantially lower than for the previous Your Library update.

While the project was largely a success, it was not perfect, and there are several things we would do differently if we were to do it all over again. For example, we would have been more directional in our beta feedback questions, and we would have worked with personalized prototypes from the start of the concept evaluations, to give participants a more personal experience of the new design.

We also want to acknowledge that this approach, including the beta test, takes a lot of time and effort. Although we felt it was time and effort well spent for a high-risk change in a personal space, we would not recommend this approach for lower-risk or smaller changes.

With the addition of audiobooks since the launch, we are still learning about how users use the Library to meet different needs. We strive to continue our close collaboration across disciplines to bring further improvements to the Library.

More detail can be found in our paper:
Minimizing change aversion through mixed methods research: a case study of redesigning Spotify’s Your Library
Ingrid Pettersson, Carl Fredriksson, Raha Dadgar, John Richardson, Lisa Shields, Duncan McKenzie
CHI 2023

References

[1] Jennifer Schumi and Janet T Wittes. 2011. Through the looking glass: understanding non-inferiority. Trials 12, 1 (2011), 1–12. 
[2] Sara Belt. 2020. Cross-disciplinary Insights Teams: How We Integrate Data Scientists and User Researchers at Spotify. Blog post, October 2020.