Ensemble Learning for Tweet Classification of Hate Speech and Offensive Language - Winter/Spring Project 2018

These programs are part of a project that uses an ensemble learning model to detect offensive language and hate speech in tweets. The ensemble is composed of:

- A Voting classifier
- An LSTM network
- A Bayesian model
- A Proximity model

Link to full project: https://github.com/quinnbp/WT2018

This repository contains:

Voting classifier for hate-speech and offensive language detection in tweets:

Uses a hard-voting (majority voting) classifier that evaluates the outputs of:
- An SGD Classifier with log loss
- A linear SVM classifier with L1-based feature selection and L2-penalized classification
- A Perceptron

Features:
- TF-IDF matrix
- POS-tag matrix
- Sentiment analysis
- Presence-of-lexicon-terms score
- Word embeddings (random and GloVe)
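
The hard-voting step described above can be sketched in plain Python. This is only an illustration of majority voting over the outputs of three classifiers, not the code in voting_classifier.py. The function name `hard_vote`, the integer label encoding, the sample predictions, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over per-classifier label lists of equal length.

    `predictions` holds one list of labels per classifier.
    """
    voted = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # Tie-break (an assumption): among tied labels, keep the one
        # proposed by the earliest classifier in the list.
        voted.append(next(label for label in votes if counts[label] == best))
    return voted


# Hypothetical outputs for four tweets; the labels 0, 1, 2 are
# placeholder class ids chosen for illustration.
sgd_preds = [0, 1, 2, 1]
svm_preds = [0, 1, 1, 2]
perceptron_preds = [1, 1, 2, 0]

result = hard_vote([sgd_preds, svm_preds, perceptron_preds])
print(result)
```

With three voters and three classes, a three-way tie is possible (the last tweet above), so any real implementation needs some explicit tie-breaking rule.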

TODO:

Try improving word embeddings using a neural network based on [2]

Weighting system for ensemble learning

Has 4 different voting options, three weighted and one unweighted:
- Precision score of the classifiers' confusion matrices
- CEN score
- Precision + CEN score
- Equal voting
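
As a rough sketch of how weighted voting differs from equal voting, the example below adds up one weight per classifier for each label it votes for and picks the label with the largest total. The function name `weighted_vote` and the weight values are hypothetical, and the code does not reproduce weighting.py. How a CEN score is turned into a weight is not shown here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick, per sample, the label with the largest summed classifier weight."""
    voted = []
    for votes in zip(*predictions):
        totals = defaultdict(float)
        for label, weight in zip(votes, weights):
            totals[label] += weight
        voted.append(max(totals, key=totals.get))
    return voted


# Hypothetical predictions from three classifiers on three tweets.
predictions = [
    [0, 1, 2],  # classifier A
    [1, 1, 0],  # classifier B
    [1, 2, 0],  # classifier C
]

# Assumed precision-derived weights: classifier A is trusted the most.
precision_weights = [0.9, 0.4, 0.4]
# Equal voting is the special case where every weight is the same.
equal_weights = [1.0, 1.0, 1.0]

weighted = weighted_vote(predictions, precision_weights)
equal = weighted_vote(predictions, equal_weights)
print(weighted, equal)
```

In this toy data the single high-weight classifier overrules the other two on the first and third tweets, which is the behavior that separates weighted voting from plain majority voting.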

Confusion matrix class

Creates a confusion matrix given the output predictions of a classifier and the set of true labels
Contains operations such as computing the precision score, saving the matrix as a PDF, counting false positives, and computing the CEN score of the matrix.
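
A minimal sketch of such a class is shown below, assuming integer class labels and a rows-are-true, columns-are-predicted layout. The class name, method names, and sample data are hypothetical and do not mirror confusion_matrix.py. PDF export and the CEN score are left out.

```python
class ConfusionMatrix:
    """Confusion matrix with rows as true labels and columns as predictions."""

    def __init__(self, true_labels, predictions, n_classes):
        self.n_classes = n_classes
        self.matrix = [[0] * n_classes for _ in range(n_classes)]
        for true, pred in zip(true_labels, predictions):
            self.matrix[true][pred] += 1

    def false_positives(self, label):
        # Samples predicted as `label` whose true class is something else.
        return sum(
            self.matrix[true][label]
            for true in range(self.n_classes)
            if true != label
        )

    def precision(self, label):
        # True positives divided by everything predicted as `label`.
        predicted = sum(self.matrix[true][label] for true in range(self.n_classes))
        return self.matrix[label][label] / predicted if predicted else 0.0


# Hypothetical labels for six tweets and three classes.
true_labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]

cm = ConfusionMatrix(true_labels, predictions, n_classes=3)
print(cm.matrix, cm.precision(0), cm.false_positives(0))
```

Per-class precision values computed this way are the kind of quantity a precision-based weighting scheme could consume.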

All written by Daniel Firebanks

Inspired by the research of:

[1] Davidson et al. (https://github.com/t-davidson/hate-speech-and-offensive-language)
[2] Badjatiya et al. (https://github.com/pinkeshbadjatiya/twitter-hatespeech)
