Consumption-based approaches in proactive detection for content moderation
Shahar Elisha, John N. Pougué-Biyong, Mariano Beguerisse-Díaz
Wikipedia is not only the world’s largest online encyclopedia and one of the most visited websites; it also provides data used by many popular services and products. Because Wikipedia data is encountered so widely, it is important to evaluate its coverage and identify any gaps. Here, we evaluate Wikipedia’s coverage of the music domain, one of the most popular topics. Specifically, we compile the 50,000 most prominent music artists (by streaming popularity on a large online streaming platform) and determine whether each artist has a Wikipedia page. We first show that streaming popularity correlates with Wikipedia representation: while 90% of the one thousand most-streamed artists are on Wikipedia, the chance of being on Wikipedia drops to 50% after the ten-thousandth artist. Next, we examine Wikipedia coverage of artists by gender and genre while controlling for popularity. For artists that are on Wikipedia, we also examine the amount of content, the frequency of edits, and the PageRank of their pages. We uncover large differences in representation across genres: at the same popularity level, hip-hop, Latin, and dance/electronic artists are the most underrepresented, while rock artists have approximately twice as much representation. With respect to gender, while female artists are underrepresented at the top of the music industry itself, male artists in this study’s top sample were less likely than female artists to be represented on Wikipedia, suggesting an interaction with genre and the visibility of select superstars.
Amar Ashar, Karim Ginena, Maria Cipollone, Renata Barreto, Henriette Cramer