Representation of music creators on Wikipedia

October 28, 2021 Published by Alice Wang, Aasish Pappu, Henriette Cramer

Wikipedia is the largest online encyclopedia, and English Wikipedia is one of the most visited websites worldwide. With “art and culture” as its largest subject area, it is one of the most comprehensive repositories of music knowledge in the world. Being on Wikipedia may influence a creator’s potential popularity, and it gives listeners a way to discover and learn about creators. Here, we dive deeper into the subject, analysing how the 50,000 most streamed Spotify artists are represented on Wikipedia.

Key insights:

  • Artist popularity correlates with probability of being on Wikipedia
  • Women are overall underrepresented in popular music, but are more likely to be on Wikipedia after they have made it to the top
  • Chances of being on Wikipedia vary dramatically across music genres

Methodology

For every Spotify artist in the top 50k, we use an in-house entity linker to link the artist to their English Wikipedia article. Based on expert annotators’ checks of a sample of artists, the linker has an estimated accuracy of 95%. After entity linking, we assign each artist a binary label: 1 if the artist has an English Wikipedia article, 0 otherwise. To estimate the fraction of artists on Wikipedia at each popularity level, we then plot a moving average of this label across artist popularity rank, using a window of 1,000 artists. Below, we present the insights this analysis yields.
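
The moving-average step can be sketched as follows. This is a minimal illustration, assuming artists are already sorted by popularity rank and labeled 1 (on English Wikipedia) or 0. The function name moving_fraction is hypothetical, and the post does not say whether its windows are trailing or centered, so the sketch simply uses consecutive, non-padded windows.

```python
def moving_fraction(labels, window=1000):
    """Fraction of artists on Wikipedia in each window of `window`
    consecutive popularity ranks (hypothetical helper, not the authors' code).
    labels[i] is 1 if the artist at rank i + 1 has an English Wikipedia
    article, else 0. Returns one average per full window."""
    if window <= 0 or window > len(labels):
        raise ValueError("window must be between 1 and len(labels)")
    # Prefix sums turn each window sum into a single subtraction.
    prefix = [0]
    for x in labels:
        prefix.append(prefix[-1] + x)
    return [
        (prefix[i + window] - prefix[i]) / window
        for i in range(len(labels) - window + 1)
    ]


# Toy example with hypothetical labels and a window of 4 ranks.
print(moving_fraction([1, 1, 1, 0, 1, 0, 0, 0], window=4))
# [0.75, 0.75, 0.5, 0.25, 0.25]
```

With 50,000 labels and window=1000, this yields 49,001 averages, one per starting rank.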

Streaming popularity correlates with Wikipedia representation

Overall, 42% of the top 50k Spotify artists have content about them on Wikipedia, and streaming popularity correlates strongly with presence on Wikipedia. The relationship, however, is not linear. For the 1,000 most streamed artists, the chance of being on Wikipedia is 90%. Beyond the 10,000th artist, that chance drops below 50%, and for the bottom third of artists studied here, it drops below 30%.
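
One simple way to summarize how presence rises with streams, assumed here rather than taken from the paper, is a linear probability model: a least-squares fit of the 0/1 Wikipedia label against log10 of each artist’s stream count. Its slope reads as the change in probability (in percentage points) per tenfold increase in streams. The function name and toy data are hypothetical.

```python
import math


def presence_slope_per_decade(streams, labels):
    """Least-squares slope of the 0/1 Wikipedia label on log10(streams),
    i.e. the estimated change in probability of being on Wikipedia per
    order-of-magnitude increase in streams (hypothetical helper)."""
    xs = [math.log10(s) for s in streams]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, labels))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("stream counts must not all be equal")
    return cov / var


# Hypothetical toy data: stream counts and Wikipedia labels.
streams = [1_000, 1_000, 10_000, 10_000, 100_000, 100_000]
labels = [0, 0, 0, 1, 1, 1]
print(round(presence_slope_per_decade(streams, labels), 3))  # 0.5
```

Because the relationship described above is not linear, a single slope is only a coarse summary; fitting separately within popularity bands would capture more of its shape.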

Next, we show that a tenfold (one order of magnitude) increase in streams corresponds to a 20-30% increase in an artist’s chance of being on Wikipedia. Note that the figure below uses a log scale.

Female solo/group artists are slightly more represented than are male solo/group artists

Having shown that streaming popularity correlates with Wikipedia representation, we next examine representation by gender. Using labels from a third-party company, we divide artists into all-female solo/group artists, all-male solo/group artists, and multi-gender groups. Due to data limitations, this analysis does not include non-binary artists. Controlling for popularity, we find that all-female solo/group artists in the top 50k are slightly but significantly more represented than all-male solo/group artists. It is important to note, however, that women are still underrepresented in music overall (see Epps et al., 2020), including in this top 50k. Women who do make it into the top 50k, though, are well represented on Wikipedia.
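
The post does not describe exactly how it controls for popularity. One common approach, sketched here as an assumption, is stratification: compare female and male coverage within popularity-rank bins and average the per-bin differences. The function name, the gender label strings, and the bin size are all hypothetical, and no significance test is included.

```python
from collections import defaultdict


def stratified_gap(records, bin_size=1000):
    """Average over popularity bins of (female rate - male rate).
    records: iterable of (rank, gender_label, on_wikipedia) where
    gender_label is 'female', 'male', or 'mixed' (hypothetical labels)
    and on_wikipedia is 0 or 1. Bins lacking either group are skipped."""
    # counts[bin][gender] = [artists on Wikipedia, total artists]
    counts = defaultdict(lambda: {"female": [0, 0], "male": [0, 0]})
    for rank, gender, on_wiki in records:
        if gender not in ("female", "male"):
            continue  # multi-gender groups are left out of this comparison
        tally = counts[(rank - 1) // bin_size][gender]
        tally[0] += on_wiki
        tally[1] += 1
    gaps = []
    for groups in counts.values():
        (f_hits, f_n), (m_hits, m_n) = groups["female"], groups["male"]
        if f_n and m_n:
            gaps.append(f_hits / f_n - m_hits / m_n)
    return sum(gaps) / len(gaps) if gaps else float("nan")


# Hypothetical toy records, with bins of 2 ranks each.
toy = [(1, "female", 1), (2, "male", 0), (3, "female", 1),
       (4, "male", 1), (5, "mixed", 1)]
print(stratified_gap(toy, bin_size=2))  # 0.5
```

Weighting each bin equally is one design choice; weighting by bin size, or a regression with popularity as a covariate, would be natural alternatives.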

Artists of different genres have varying chances of appearing on Wikipedia

We next examine how artists of different genres are represented on Wikipedia. The most popular genres, in order, are pop, hip hop, dance/electronic, rock, indie, and Latin; together they account for 80% of artists in our data. Rock artists are the best represented, while Latin and hip hop artists are the least represented: overall, rock artists have 2.5 times the representation of hip hop artists and 3 times that of Latin artists. The effect is even more pronounced among top-tier artists (those in the top third of the 50k). Within this group, 85% of rock artists are on Wikipedia, compared with only 33% of dance/electronic, 28% of hip hop, and 21% of Latin artists.
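
Per-genre coverage and the ratios between genres can be computed with a simple tally. This is a minimal sketch: the function name, the (rank, genre, label) record layout, and the toy data are hypothetical, and restricting to the top third would correspond to max_rank = 50_000 // 3.

```python
from collections import Counter


def coverage_by_genre(artists, max_rank=None):
    """Fraction of artists on Wikipedia per genre (hypothetical helper).
    artists: iterable of (rank, genre, on_wikipedia) with on_wikipedia 0/1.
    If max_rank is given, only artists with rank <= max_rank are counted."""
    on_wiki, total = Counter(), Counter()
    for rank, genre, label in artists:
        if max_rank is not None and rank > max_rank:
            continue
        total[genre] += 1
        on_wiki[genre] += label
    return {g: on_wiki[g] / total[g] for g in total}


# Hypothetical toy data: (rank, genre, on_wikipedia).
sample = [(1, "rock", 1), (2, "rock", 1), (3, "hip hop", 1),
          (4, "hip hop", 0), (5, "latin", 0), (6, "rock", 0)]
top = coverage_by_genre(sample, max_rank=4)
print(top)                            # {'rock': 1.0, 'hip hop': 0.5}
print(top["rock"] / top["hip hop"])   # 2.0
```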

One possible explanation for Latin artists’ low representation is that we restrict our analysis to English Wikipedia, while many Latin artists may be covered mainly in other language editions. However, language differences alone cannot explain why hip hop and dance/electronic artists are less well represented on English Wikipedia.

Diving deeper into artist Wikipedia content

After determining which artists have Wikipedia articles, we examine the contents of those pages. Rock artists generally have the most words and images, followed by pop and indie artists, while Latin and dance/electronic artists consistently have the least content. Despite their lower chances of being on Wikipedia, hip hop artists have a moderate to high amount of content.

We then examine the PageRank and the number of incoming links (inlinks) of each artist’s page. Rock artists have the most inlinks and the highest PageRank, followed by hip hop and pop artists. Indie, dance/electronic, and Latin artists, on the other hand, are the least well connected.
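
The post does not say which link graph or PageRank parameters were used. As a rough illustration only, here is a textbook power-iteration PageRank over a small hypothetical link graph, with the conventional damping factor of 0.85 assumed, alongside a count of incoming links. The function names and the graph are hypothetical.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over {page: [pages it links to]}.
    Rank held by pages with no outgoing links is spread evenly over all pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


def inlink_counts(links):
    """Number of distinct other pages linking to each page."""
    return Counter(t for src, targets in links.items()
                   for t in set(targets) if t != src)


# Hypothetical toy link graph among four artist pages.
graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # C
print(inlink_counts(graph)["C"])  # 3
```

For a full Wikipedia link graph, a library routine such as networkx.pagerank (whose alpha parameter plays the role of damping here) would be the more practical choice.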

Last, we quantify community engagement as the average number of weekly edits over a six-month period between 2019 and 2020. Surprisingly, hip hop artists receive the most community edits, 1.5 times as many as rock artists, the genre with the second-highest number. Thus, hip hop artists who have Wikipedia pages attract a high level of editorial engagement.
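
The weekly-edit metric reduces to counting a page’s revisions inside a time window and dividing by the window length in weeks. This is a minimal sketch, assuming revision timestamps have already been collected (for example via the MediaWiki Action API with action=query, prop=revisions, rvprop=timestamp); the function name and the dates below are hypothetical, since the post does not give its exact six-month window.

```python
from datetime import datetime, timedelta


def avg_weekly_edits(timestamps, start, end):
    """Mean edits per week for revisions with start <= timestamp < end
    (hypothetical helper)."""
    if end <= start:
        raise ValueError("end must be after start")
    n_edits = sum(1 for t in timestamps if start <= t < end)
    weeks = (end - start) / timedelta(weeks=1)  # timedelta / timedelta -> float
    return n_edits / weeks


# Hypothetical four-week example: six edits inside the window, one at `end`.
start = datetime(2019, 10, 1)
end = start + timedelta(weeks=4)
edits = [start + timedelta(days=d) for d in (0, 3, 7, 10, 20, 27)] + [end]
print(avg_weekly_edits(edits, start, end))  # 1.5
```

The half-open window avoids double-counting an edit that falls exactly on a boundary between consecutive periods.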

What becomes clear from the data is that hip hop artists have significantly lower chances of being on Wikipedia, yet their existing pages have substantial content, many community edits, and high PageRank and inlink counts. Rock artists are the most extensively represented on Wikipedia in all respects. In contrast, Latin and dance/electronic artists have persistently low levels of representation across all dimensions and popularity levels.

Summary

By matching artists to their Wikipedia pages accurately and at scale, with high precision and recall across a range of artist popularity, this study lets us analyze Wikipedia’s coverage of music creators. We observe a clear correlation between streaming popularity and Wikipedia representation, which suggests that Wikipedia content largely reflects global musical taste and may even have the potential to influence music consumption. Future work is needed to tease apart the causal relationship between Wikipedia representation and streaming patterns.

Further, we find that all-male solo/group artists are underrepresented relative to all-female solo/group artists of the same popularity level. Representation differs dramatically by genre: rock artists have the best coverage, while coverage of hip hop, Latin, and dance/electronic artists is particularly lacking. Interestingly, the editorial community shows no lack of enthusiasm for hip hop artists who already have pages; rather, the gap is that articles are not created for many of them in the first place.

The gaps in representation that we observe may have significant and widespread impact.  An artist’s visibility on platforms such as Wikipedia could both influence and be influenced by their streaming popularity. This presence can affect the way in which information is, or is not, offered to audiences, which ultimately can affect attention, streams, and longer-term audience building. These differences and the wider ecosystem are worthy of further investigation. 

More can be found in our paper:
Representation of Music Creators on Wikipedia, Differences in Gender and Genre
Alice Wang, Aasish Pappu, Henriette Cramer
ICWSM 2021.