Bagrisham/academic-reddit

View my project online on my homepage: https://www.bagrisham.com/the-reddit-project/


This is a backup of my research, stored here for safekeeping. It also serves as a historical record of US-based Reddit traffic. Please do not use this information for evil; these findings carry a surprising amount of power.

The notebook "4pm-reddit.ipynb" contains the Python 3 code primarily used for data collection. It uses PRAW (the Python Reddit API Wrapper), Node.js, pandas, and Pushshift, and writes its output to CSV files for use in Excel.
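The notebook's code is not reproduced in this README. As a rough illustration only, the sketch below shows one way a daily subscriber snapshot could be collected with PRAW and appended to a CSV file that Excel can open. The subreddit list, the CSV columns, and the helper names (`make_client`, `collect_snapshot`, `append_rows`) are hypothetical, and it uses the standard-library `csv` module instead of pandas to stay dependency-light. It is not the author's actual notebook.

```python
import csv
import datetime as dt
from pathlib import Path

# Hypothetical communities to track; the notebook's real list is not shown in this README.
SUBREDDITS = ["Coronavirus", "investing", "stocks"]
# Hypothetical CSV layout: one row per subreddit per day.
FIELDS = ["date", "subreddit", "subscribers"]


def make_client(client_id: str, client_secret: str, user_agent: str):
    """Create a read-only PRAW client (needs `pip install praw` and Reddit API credentials)."""
    import praw  # imported lazily so the CSV helpers below work without PRAW installed

    return praw.Reddit(client_id=client_id, client_secret=client_secret, user_agent=user_agent)


def collect_snapshot(reddit, names: list[str], day: dt.date) -> list[dict]:
    """Return one row per subreddit containing its subscriber count on `day`."""
    rows = []
    for name in names:
        sub = reddit.subreddit(name)  # lazy object; reading `.subscribers` triggers the API call
        rows.append(
            {"date": day.isoformat(), "subreddit": sub.display_name, "subscribers": sub.subscribers}
        )
    return rows


def append_rows(path: Path, rows: list[dict]) -> None:
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

With real credentials, a scheduled daily job could call `append_rows(Path("traffic.csv"), collect_snapshot(make_client(ID, SECRET, "research-bot/0.1"), SUBREDDITS, dt.date.today()))`, where `ID`, `SECRET`, and the user-agent string are placeholders. Appending to a single long-format file keeps every day's snapshot, which is what later trend analysis needs.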


My work was featured in Mississippi State University’s Spring 2020 Undergraduate Research Symposium. Details can be found on pages 10, 16, and 58.

Link to research symposium recognition: https://www.honors.msstate.edu/sites/www.honors.msstate.edu/files/Abstract%20Booklet_Front%20Half_Spring%202020%28al%29_0.pdf

Abstract:

Analyzing and Predicting Trends with Public Data from Online Communities

Brandon Grisham

Reddit has become one of the largest social news forums in the world, with an average of 430 million active users and 21 billion views per month (RedditInc Press, 2019). A key question is how businesses and academia can effectively use the public information in Reddit’s organized communities to explore salient research questions. Given the scale of Reddit’s 130 thousand ‘subreddits’, the lack of collection tools that can analyze user opinions, posted questions, or shared perspectives makes this difficult. This research uses a personally created program that collects daily traffic data from the entire website, scraping parameters such as subscriber growth and post frequency into a keyword-driven database. The API-driven script also populates the database with details about specific communities, highlighting areas of interest for businesses and academia alike. Using this database, one can locate and chart growing or diminishing trends among Internet users. One example of such a trend is the correlated data surrounding Coronavirus Disease (COVID-19): clear patterns emerge between the subscriber growth of the Coronavirus subreddit, a rise in Coronavirus-related posts on numerous investment subreddits, and the virus’s impact on the U.S. stock market. The ultimate goal of this research is to apply statistical data analysis and machine-learning techniques to this dataset. This research will help illuminate Internet user interests, predict rising and falling trends, and chart the current popularity of cultural phenomena.
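The abstract describes turning daily snapshots into trends (subscriber growth, keyword post frequency) and correlating them. The abstract does not say which statistics the project used, so the following is only a plain-Python sketch of one plausible approach. The function names, the case-insensitive substring keyword matching, the "net change" trend label, and the choice of Pearson correlation are all assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def daily_growth(counts: list[int]) -> list[int]:
    """Day-over-day change in subscribers, given consecutive daily snapshot counts."""
    return [today - yesterday for yesterday, today in zip(counts, counts[1:])]


def trend(counts: list[int]) -> str:
    """Label a non-empty series as growing, diminishing, or flat by its net change."""
    net = counts[-1] - counts[0]
    return "growing" if net > 0 else "diminishing" if net < 0 else "flat"


def keyword_frequency(titles: list[str], keywords: list[str]) -> Counter:
    """Count how many post titles mention each keyword (case-insensitive substring match)."""
    hits = Counter()
    for title in titles:
        lowered = title.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                hits[kw] += 1
    return hits


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def growth_vs_mentions(subscribers: list[int], daily_mentions: list[int]) -> float:
    """Correlate daily subscriber growth with daily keyword mentions.

    `daily_mentions[i]` must line up with the growth between day i and day i + 1.
    """
    return pearson(daily_growth(subscribers), daily_mentions)
```

A correlation near 1 only says two series rose together over the sampled days; it does not by itself show that one caused the other, which is why the abstract frames these as patterns to explore with further statistical and machine-learning analysis.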

About

A historic collection of Reddit traffic data for my academic study