notAlex2/reddit_comment_classification

 
 

Comparative Analysis of Machine Learning Algorithms for Reddit Comment Classification

Team

  • Harmanpreet Singh (20164950)
  • Alexander Peplowski (20148127)

We examine multiple classification algorithms, including Naive Bayes with smoothing, Linear Support Vector Classifier (SVC), Logistic Regression, Multi-Layer Perceptron (MLP), and sequence models. We also investigate feature extraction approaches such as term frequency-inverse document frequency (tf-idf) and self-trained word2vec embeddings.
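As an illustration of the first approach, here is a minimal sketch of multinomial Naive Bayes with additive (Laplace) smoothing, written in plain Python. It is not the code from the notebooks: the class name `SmoothedNaiveBayes`, the whitespace tokenizer, the `alpha` parameter, and the toy comments and labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenizer; a stand-in for real preprocessing (assumption).
    return text.lower().split()


class SmoothedNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is classic Laplace smoothing

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # number of documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this example only.
toy_texts = [
    "great goal in the match",
    "the team won the match",
    "new gpu benchmark results",
    "gpu driver update released",
]
toy_labels = ["soccer", "soccer", "hardware", "hardware"]

model = SmoothedNaiveBayes(alpha=1.0).fit(toy_texts, toy_labels)
print(model.predict("the match was great"))
print(model.predict("gpu update"))
```

Smoothing keeps a token that never appears in one class from driving that class's likelihood to zero, which matters for short, noisy text such as comments.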

Score Comparison

Training and test data must be stored at:

  • Train Data: data/data_train.pkl
  • Test Data: data/data_test.pkl

Labelled test data is output as: submission.csv
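The file layout above can be wired up roughly as follows. This is a hedged sketch using only the standard library: the helper names `load_pickle` and `write_submission` are hypothetical, and the `Id` and `Category` column names are an assumption about the submission format, which this README does not specify. The structure of the pickled objects is also not documented here, so the loader simply returns whatever was stored.

```python
import csv
import pickle
from pathlib import Path

# Paths taken from this README.
TRAIN_PATH = Path("data/data_train.pkl")
TEST_PATH = Path("data/data_test.pkl")
SUBMISSION_PATH = Path("submission.csv")


def load_pickle(path):
    # Returns the unpickled object as-is; its structure is not specified here.
    with open(path, "rb") as f:
        return pickle.load(f)


def write_submission(predictions, path=SUBMISSION_PATH):
    # Writes one row per test comment: a running index and the predicted label.
    # The header names "Id" and "Category" are assumed, not confirmed.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Category"])
        for i, label in enumerate(predictions):
            writer.writerow([i, label])
```

A typical run would call `load_pickle(TRAIN_PATH)` and `load_pickle(TEST_PATH)`, fit a classifier, and pass its predicted labels to `write_submission`.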

About

Reddit comment classification Kaggle competition
