User flairs on r/soccer
IN PROGRESS
I am doing some data mining, exploratory analysis, and data visualization of user flairs on the subreddit r/soccer. Flairs are small badges that appear beside a username to show which team the user supports.
Flairs tell other Reddit users where you stand and give context to your comment. They remove the need to type "I support x team, btw" or "As a x fan, ..." every time.
Using PRAW, the Python Reddit API Wrapper, I extracted information about posts and comments to answer questions I was curious about.
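The extraction step might look roughly like the sketch below. It is not the project's actual script: the `CommentRow` type, the function names, and the credential placeholders are assumptions made for illustration. The PRAW calls used (`subreddit(...).top`, `comments.replace_more`, `comments.list`, `author_flair_text`, `score`) are part of PRAW's public API.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class CommentRow(NamedTuple):
    """One comment, reduced to the fields the flair analysis needs (hypothetical type)."""
    post_id: str
    author: Optional[str]  # None when the account has been deleted
    flair: Optional[str]   # None or "" when the user has no flair
    score: int


def collect_comment_flairs(submissions: Iterable) -> Iterator[CommentRow]:
    """Yield one CommentRow per comment of each PRAW submission."""
    for submission in submissions:
        # limit=0 drops the "load more comments" placeholders without
        # fetching them; limit=None would expand them all, at the cost
        # of many more API requests.
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            author = comment.author.name if comment.author else None
            yield CommentRow(
                submission.id, author, comment.author_flair_text, comment.score
            )


def top_posts_this_month(limit: int = 100):
    """Return the month's top r/soccer posts (needs Reddit API credentials)."""
    import praw  # imported lazily so the helper above can be used without PRAW

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder, not a real credential
        client_secret="YOUR_CLIENT_SECRET",  # placeholder, not a real credential
        user_agent="r-soccer flair analysis sketch",
    )
    return reddit.subreddit("soccer").top(time_filter="month", limit=limit)
```

With valid credentials, `rows = list(collect_comment_flairs(top_posts_this_month()))` would give a flat list of rows that can be written to a CSV file or a database.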
1% rule of internet culture: only about 1 percent of users on a website create new content. For example, r/soccer has 1.8 million subscribers, but an average post gets only 500 upvotes, and even the top posts of the month get 35,000 upvotes. The results of this analysis therefore apply only to users who comment on the subreddit, not to all subscribers.
General exploration
- I randomly sampled 5 percent of the top 5000 posts of the month on r/soccer. For each post, I recorded every comment and its author's flair, and found that 28 percent of users have not chosen a flair. That is, about 7 out of 10 commenters have chosen a team or country to represent them. The data is available in a CSV file here.
- Which flairs get the most upvotes?
- Which flairs are the most plentiful?
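The exploration steps above can be sketched with the standard library alone. This is a minimal illustration, not the analysis code behind the reported numbers: the function names are hypothetical, rows are assumed to be `(author, flair, score)` tuples, and the sample rows are made up.

```python
import random
from collections import Counter, defaultdict


def sample_posts(post_ids, fraction=0.05, seed=0):
    """Pick a random fraction of the posts, without replacement."""
    post_ids = list(post_ids)
    k = max(1, round(len(post_ids) * fraction))
    return random.Random(seed).sample(post_ids, k)


def latest_flair_by_user(rows):
    """Map each known author to the last flair seen for them (None if unset)."""
    flairs = {}
    for author, flair, _score in rows:
        if author is not None:  # skip deleted accounts
            flairs[author] = flair or None  # treat "" as no flair
    return flairs


def unflaired_share(rows):
    """Fraction of distinct commenters who have not chosen a flair."""
    flairs = latest_flair_by_user(rows)
    return sum(f is None for f in flairs.values()) / len(flairs)


def flair_counts(rows):
    """Number of distinct users per flair (most plentiful first via most_common)."""
    return Counter(f for f in latest_flair_by_user(rows).values() if f is not None)


def mean_score_by_flair(rows):
    """Average comment score per flair, ignoring unflaired comments."""
    totals = defaultdict(lambda: [0, 0])  # flair -> [score sum, comment count]
    for _author, flair, score in rows:
        if flair:
            totals[flair][0] += score
            totals[flair][1] += 1
    return {flair: total / n for flair, (total, n) in totals.items()}


# Made-up rows, for illustration only.
rows = [
    ("ana", "Arsenal", 10),
    ("ben", "Arsenal", 4),
    ("cat", None, 2),
    ("dan", "Brazil", 30),
    ("ana", "Arsenal", 6),
]
print(unflaired_share(rows))              # 0.25
print(flair_counts(rows).most_common(1))  # [('Arsenal', 2)]
print(mean_score_by_flair(rows)["Brazil"])  # 30.0
```

Counting distinct users rather than comments keeps one prolific commenter from inflating a flair's popularity; the upvote average is computed per comment because scores belong to comments, not users.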
Hypothesis testing questions
- Is there a cyclic pattern to flair frequency? Can we see teams grow and lose popularity?
- Is there a 'siloing' effect where only similar users comment on a post?
- Producing useful data visualizations and good writing, instead of just Jupyter notebooks
- Using SQL to manage the database
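As one possible shape for the SQL side, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and function names are assumptions, not the project's actual schema. The monthly query is one simple way to start looking at how a flair's frequency changes over time.

```python
import sqlite3

# Hypothetical schema: one row per comment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    comment_id  TEXT PRIMARY KEY,
    post_id     TEXT NOT NULL,
    author      TEXT,             -- NULL for deleted accounts
    flair       TEXT,             -- NULL or '' when no flair is set
    score       INTEGER NOT NULL,
    created_utc INTEGER NOT NULL  -- Unix timestamp in seconds
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_comments(conn, rows):
    """Insert rows of (comment_id, post_id, author, flair, score, created_utc)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO comments VALUES (?, ?, ?, ?, ?, ?)", rows
        )


def flair_summary(conn):
    """Distinct users and mean comment score per flair, most plentiful first."""
    return conn.execute(
        """
        SELECT flair, COUNT(DISTINCT author) AS users, AVG(score) AS mean_score
        FROM comments
        WHERE flair IS NOT NULL AND flair != ''
        GROUP BY flair
        ORDER BY users DESC, flair ASC
        """
    ).fetchall()


def monthly_flair_counts(conn, flair):
    """Distinct users wearing the given flair in each calendar month (UTC)."""
    return conn.execute(
        """
        SELECT strftime('%Y-%m', created_utc, 'unixepoch') AS month,
               COUNT(DISTINCT author) AS users
        FROM comments
        WHERE flair = ?
        GROUP BY month
        ORDER BY month
        """,
        (flair,),
    ).fetchall()
```

Keeping the raw comments in one table and deriving the summaries with queries means new questions can be answered without re-scraping Reddit.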