Web Scraping Reddit: Natural Language Processing

Photo from http://www.quertime.com/article/15-reddit-user-and-data-analytic-tools/

Scenario

You're fresh out of your Data Science bootcamp and looking to break through in the world of freelance data journalism. Nate Silver and co. at FiveThirtyEight have agreed to hear your pitch for a story in two weeks!

Your piece is going to be on how to create a Reddit post that will get the most engagement from Reddit users. Because this is FiveThirtyEight, you're going to have to get data and analyze it in order to make a compelling narrative.

Project Summary

In this project, I practiced two major skills: collecting data by scraping a website, and building a binary predictor from that data.

There are two components to starting a data science problem: defining the problem statement and acquiring the data.

For this article, the problem statement will be: What characteristics of a post on Reddit are most predictive of the overall interaction on a thread (as measured by number of comments)?

I will acquire the data by scraping the 'hot' threads listed on the Reddit homepage, collecting the following features for each thread:

The title of the thread, the subreddit that the thread belongs to, the length of time it has been up on Reddit, and the number of comments on the thread. Once the data is acquired, I will build a classification model that uses Natural Language Processing and any other relevant features to predict whether a given Reddit post will have a number of comments above or below the median.
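
The feature collection and the above-or-below-median target can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the notebook: it assumes each post arrives as a dictionary shaped like Reddit's public JSON listings (fields `title`, `subreddit`, `created_utc`, `num_comments`), and the sample posts, the function names `extract_features` and `add_median_target`, and the column names `hours_up` and `above_median` are all hypothetical.

```python
import statistics
from datetime import datetime, timezone


def extract_features(listing, now=None):
    """Collect title, subreddit, hours on Reddit, and comment count per post.

    Assumes `listing` follows the shape of Reddit's JSON listings:
    {"data": {"children": [{"data": {...post fields...}}, ...]}}.
    """
    now = now or datetime.now(timezone.utc)
    rows = []
    for child in listing["data"]["children"]:
        post = child["data"]
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        rows.append({
            "title": post["title"],
            "subreddit": post["subreddit"],
            "hours_up": (now - created).total_seconds() / 3600,
            "num_comments": post["num_comments"],
        })
    return rows


def add_median_target(rows):
    """Label each row 1 if its comment count is above the median, else 0."""
    median = statistics.median(r["num_comments"] for r in rows)
    for r in rows:
        r["above_median"] = int(r["num_comments"] > median)
    return median


# Hypothetical sample standing in for scraped data (not real Reddit posts).
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = {"data": {"children": [
    {"data": {"title": "Post A", "subreddit": "aww",
              "created_utc": now.timestamp() - 3600, "num_comments": 10}},
    {"data": {"title": "Post B", "subreddit": "news",
              "created_utc": now.timestamp() - 7200, "num_comments": 250}},
    {"data": {"title": "Post C", "subreddit": "pics",
              "created_utc": now.timestamp() - 10800, "num_comments": 40}},
    {"data": {"title": "Post D", "subreddit": "funny",
              "created_utc": now.timestamp() - 14400, "num_comments": 900}},
]}}

rows = extract_features(sample, now=now)
median = add_median_target(rows)
for r in rows:
    print(r["title"], r["subreddit"], r["hours_up"], r["num_comments"], r["above_median"])
```

Splitting at the median gives two roughly balanced classes, so a classifier that always guesses one class scores about 50% accuracy, which makes a convenient baseline. The `above_median` column is then the target, with the title text and the remaining columns as candidate features.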

Andrew-Carl/Web-scrapping-Reddit-NLP-
