Reddit Web Scraping and Subreddit ML Classification

Description

The goal of this project was to develop a classification model on natural-language data from Reddit, a publicly available forum. The data was first collected from Reddit through the Pushshift API, then explored through Exploratory Data Analysis (EDA), and finally used to build classification models that distinguish between the chosen subreddits.
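As a rough illustration of the collection step, the sketch below builds a Pushshift query URL and flattens a decoded JSON response into rows. It is not the notebook's code. The submission-search endpoint, the `subreddit`/`size`/`before` parameters, the selected fields, and the helper names `build_pushshift_url` and `extract_posts` are assumptions for illustration, and Pushshift's public availability has changed over time. The subreddit name used is a placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint: Pushshift's submission search (not confirmed by this README).
PUSHSHIFT_SUBMISSION_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_pushshift_url(subreddit, size=100, before=None):
    """Build the query URL for one page of submissions from a subreddit."""
    params = {"subreddit": subreddit, "size": size}
    if before is not None:
        # `before` is a Unix timestamp; passing the oldest timestamp seen
        # so far lets a caller page backwards through older submissions.
        params["before"] = before
    return f"{PUSHSHIFT_SUBMISSION_URL}?{urlencode(params)}"


def extract_posts(payload):
    """Keep a few fields from each item under the response's "data" key."""
    return [
        {
            "subreddit": post.get("subreddit"),
            "title": post.get("title", ""),
            "selftext": post.get("selftext", ""),
            "created_utc": post.get("created_utc"),
        }
        for post in payload.get("data", [])
    ]


if __name__ == "__main__":
    # "python" is a placeholder subreddit, not one used by the project.
    print(build_pushshift_url("python", size=50))
```

The rows returned by `extract_posts` could then be written to a CSV file in the data folder for the later stages.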

File Structure

The repository contains two folders: data and images. The data folder holds the output and any intermediate CSV files produced between the stages of the process. The images folder holds the output images of the Exploratory Data Analysis. The three main notebooks are:

01_reddit_data_scraping.ipynb: Reddit data scraping
02_reddit_eda.ipynb: Reddit EDA (exploratory data analysis)
03_reddit_modeling.ipynb: Reddit classification modeling

Conclusions:

Machine learning classification is a valid way to distinguish text corpora using NLP. For our dataset and the binary classification models we tried, logistic regression was the better model for the use case at hand: targeted messaging for CS hiring candidates.
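To make the idea of binary text classification with logistic regression concrete, here is a minimal, dependency-free sketch. It is not the modeling notebook's code: the toy posts, labels, hyperparameters, and function names are all made up for illustration, and in practice a library such as scikit-learn would be the usual choice. It fits bag-of-words logistic regression with plain stochastic gradient descent.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would clean more."""
    return text.lower().split()


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(counts, weights, bias):
    """Linear score: bias plus weight * count for every token in the text."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in counts.items())


def train(texts, labels, epochs=200, lr=0.5):
    """Fit one weight per token with stochastic gradient descent on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            counts = Counter(tokenize(text))
            error = sigmoid(score(counts, weights, bias)) - y  # dLoss/dz
            bias -= lr * error
            for tok, n in counts.items():
                weights[tok] = weights.get(tok, 0.0) - lr * error * n
    return weights, bias


def predict_proba(text, weights, bias):
    """Probability that the text belongs to class 1."""
    return sigmoid(score(Counter(tokenize(text)), weights, bias))


# Made-up posts: label 1 = a hypothetical CS careers subreddit,
# label 0 = a hypothetical unrelated subreddit.
TEXTS = [
    "how do i prepare for a coding interview",
    "got an offer after my interview loop",
    "resume tips for new grad software roles",
    "my sourdough starter will not rise",
    "best flour for sourdough bread",
    "how long should i proof the dough",
]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train(TEXTS, LABELS)
    print(predict_proba("coding interview resume", w, b))
```

Because each token has its own weight, the learned weights can be inspected directly to see which words push a post toward one subreddit or the other, which is one reason logistic regression is convenient for a messaging use case.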

