Skip to content

Reddit Comment Karma Prediction and Analysis via Supervised and Unsupervised techniques

Notifications You must be signed in to change notification settings

100/Reddit-Karma-Prediction

Repository files navigation

#Predicting and Analyzing Reddit Comment Karma

##SEE IT LIVE HERE!

###Uses various data-science and NLP methods to analyze comment karma throughout Reddit and provide this information with visualizations and a RESTful API.

##Dependencies:

  • Flask (0.9)
  • Flask-Limiter (0.9.1)
  • WTForms (2.1)
  • gunicorn (0.17.2)
  • textblob (0.11.0)
  • numpy (1.10.1)
  • scikit-learn (0.17)
  • Python 2.7.11
  • [Optional] Pickle files present in pickles folder

##Implements:

  • K-means unsupervised clustering
  • Linear SVM supervised classification
  • RESTful API
    • Documented
    • Rate-limited
  • Extensive JSON parsing
  • Various facets of Natural Language Processing
  • Graph visualization
    • JS (cytoscape.js)
  • Various facets of web-design
    • HTML
    • CSS (Bootstrap)

Inspired by the work done here and here, this application utilizes two classifiers and k-means clustering to provide insights on comment karma on Reddit. The first classifier used n-grams to classify comments as either positive or negative, and the second, which uses the classification from the first as one of its features, classifies comments into one of five bins based on score ranges based on various metadata and calculated NLP-related features. The application also, independent from the aforementioned tasks, parsed the aggregate data to cluster comments by average karma per comment, and created a clustered graph visualization of the top subreddits in the data-set. Lastly, the application implemented a fully-documented and rate-limited RESTful API to allow developers to use the prediction service in their own applications.

About

Reddit Comment Karma Prediction and Analysis via Supervised and Unsupervised techniques

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published