Comparative Analysis of Machine Learning Algorithms for Reddit Comment Classification

Team

  • Harmanpreet Singh (20164950)
  • Alexander Peplowski (20148127)

We compare several classification algorithms for Reddit comments, including Naive Bayes with smoothing, a linear Support Vector Classifier (SVC), Logistic Regression, a Multi-Layer Perceptron (MLP), and sequence models. We also investigate feature extraction approaches, including term frequency-inverse document frequency (tf-idf) and self-trained word2vec embeddings.
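To illustrate what "Naive Bayes with smoothing" means, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier with additive (Laplace) smoothing. It is not the project's implementation: the function names `train_naive_bayes` and `predict`, the whitespace tokenizer, the `alpha` parameter, and the toy comments and labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing.

    alpha is the smoothing constant (alpha=1.0 is Laplace smoothing).
    Tokenization is a plain lowercase whitespace split, an assumption
    made to keep the sketch short.
    """
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        # Smoothing adds alpha to every word count, so words unseen in
        # a class still get a small non-zero probability.
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                word: math.log((word_counts[label][word] + alpha) / denom)
                for word in vocab
            },
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior for one comment."""
    tokens = text.lower().split()

    def score(label):
        params = model[label]
        # Words outside the training vocabulary are ignored.
        return params["log_prior"] + sum(
            params["log_likelihood"][token]
            for token in tokens
            if token in params["log_likelihood"]
        )

    return max(model, key=score)


# Toy data, invented for illustration only.
toy_docs = [
    "the goalie made a great save",
    "touchdown pass in the fourth quarter",
    "the goalie blocked the shot",
    "quarterback threw a long pass",
]
toy_labels = ["hockey", "nfl", "hockey", "nfl"]
toy_model = train_naive_bayes(toy_docs, toy_labels)
```

Calling `predict(toy_model, "what a save by the goalie")` scores each class by its log prior plus the smoothed log likelihood of every known word in the comment. Without smoothing, a single word unseen in a class would drive that class's probability to zero.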

Score Comparison

Training and test data must be stored at:

  • Train Data: data/data_train.pkl
  • Test Data: data/data_test.pkl

Predicted labels for the test data are written to: submission.csv
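The file layout above could be wired up roughly as follows. This is a sketch under assumptions, not the project's code: the internal structure of the pickle files is not documented here, so `load_pickle` simply returns whatever object was stored, and the `Id` and `Category` column names in `write_submission` are hypothetical.

```python
import csv
import pickle
from pathlib import Path

# Paths taken from the README.
TRAIN_PATH = Path("data/data_train.pkl")
TEST_PATH = Path("data/data_test.pkl")
SUBMISSION_PATH = Path("submission.csv")


def load_pickle(path):
    """Load one pickled dataset. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def write_submission(predictions, path=SUBMISSION_PATH):
    """Write one predicted label per test comment.

    The header names "Id" and "Category" are assumptions; adjust them
    to whatever format the submission actually requires.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Id", "Category"])
        for index, label in enumerate(predictions):
            writer.writerow([index, label])
```

A typical run would call `load_pickle(TRAIN_PATH)` and `load_pickle(TEST_PATH)`, fit a classifier on the training data, and pass the list of predicted labels to `write_submission`.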