
joshivaibhav/RedditDiscourse


Exploring Discourse in Reddit

Our goal is to build a machine learning model that predicts which subreddit an unlabeled post comes from. We implement several approaches, covering classical classification (binary and multiclass) and deep learning, then compare them to determine which performs best on this task. Our findings are documented in the final report.

Data

Each data record is a Reddit post. We choose two subreddits as the target classes and collect roughly 1000 posts for each class.

The posts are fetched from Reddit's API using PRAW (the Python Reddit API Wrapper).
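As a rough illustration of the collection step, the sketch below gathers labeled posts from a list of subreddits. It assumes `reddit` is an authenticated `praw.Reddit` instance. The helper name `collect_posts`, the choice of the `new` listing, and the decision to join title and body into one text field are assumptions made for this example, not details taken from the repository.

```python
def collect_posts(reddit, subreddit_names, limit=1000):
    """Return a list of {"text", "label"} records, one per fetched post.

    `reddit` is assumed to be a praw.Reddit client (hypothetical setup);
    each subreddit name doubles as the class label.
    """
    records = []
    for name in subreddit_names:
        # subreddit(...).new(limit=...) yields the newest submissions first
        for submission in reddit.subreddit(name).new(limit=limit):
            # Assumption: title and body are concatenated into one document
            text = f"{submission.title} {submission.selftext}".strip()
            records.append({"text": text, "label": name})
    return records
```

A client would typically be created with `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` and passed in together with the two chosen subreddit names.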

Classifiers

  • Naive Bayes
  • SVM
  • Random Forest
  • Logistic Regression
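One possible way to compare the classifiers listed above is sketched here with scikit-learn. The TF-IDF features, the `LinearSVC` variant of SVM, the 75/25 split, and the helper name `compare_classifiers` are all assumptions for illustration; the repository may use different features and settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_classifiers(texts, labels, test_size=0.25, seed=0):
    """Train each classifier on TF-IDF features and return test accuracies."""
    # Stratify so both subreddits appear in the train and test splits
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    models = {
        "Naive Bayes": MultinomialNB(),
        "SVM": LinearSVC(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, clf in models.items():
        # The vectorizer sits inside the pipeline, so it is fit on
        # training text only and never sees the test posts
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(X_train, y_train)
        scores[name] = pipeline.score(X_test, y_test)
    return scores
```

Calling `compare_classifiers(texts, labels)` with the post texts and their subreddit names returns a dictionary mapping each classifier name to its held-out accuracy.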

Neural Networks

  • LSTM (Long Short-Term Memory)

Report and Analysis

  • We compare accuracies and generate ROC curves and learning curves for a deeper analysis.
  • See the report PDF for more details.
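For the ROC analysis mentioned above, a minimal sketch with scikit-learn might look like the following. The helper name `roc_summary` is hypothetical, and the inputs are assumed to be binary labels (0 or 1) with one score per post, such as a predicted probability for the positive subreddit.

```python
from sklearn.metrics import auc, roc_curve


def roc_summary(y_true, scores):
    """Return false positive rates, true positive rates, and the AUC."""
    # roc_curve sweeps the decision threshold over the given scores
    fpr, tpr, _thresholds = roc_curve(y_true, scores)
    # auc integrates the curve with the trapezoidal rule
    return fpr, tpr, auc(fpr, tpr)
```

The returned `fpr` and `tpr` arrays can be passed to a plotting library to draw the curve. Learning curves can be produced in a similar spirit with `sklearn.model_selection.learning_curve`.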

About

Pitting Neural Networks against a gamut of classifiers such as Naive Bayes, Decision Trees, SVM in classifying posts into subreddits.
