Twitter-Hate-Speech-Detection

A repository

Our project analyzed a Kaggle dataset (a CSV file) containing 31,935 tweets. The dataset was heavily skewed: 29,695 tweets (93%) were labeled non-hate and 2,240 tweets (7%) were labeled hate.
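As a quick sanity check on the figures above, the following minimal sketch recomputes the class split from the quoted counts. The 0/1 label encoding and the helper name `class_distribution` are assumptions made for illustration; the actual column names and encoding in the Kaggle CSV may differ.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, percent)} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Rebuild a label column from the counts quoted in this README.
# Assumed encoding: 0 = non-hate, 1 = hate.
labels = [0] * 29_695 + [1] * 2_240
dist = class_distribution(labels)

for label, (n, pct) in sorted(dist.items()):
    print(f"label {label}: {n} tweets ({pct:.1f}%)")
```

On a real run, `labels` would come from the label column of the CSV instead of being rebuilt from the counts.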

The first step in building our model is to balance the number of hate and non-hate tweets.
We then clean the tweets using lemmatization, stemming, stop-word removal, and omissions.
For the preprocessing step, we represent the tweets with Bag of Words and Term Frequency-Inverse Document Frequency (TF-IDF).
On both the Bag of Words and TF-IDF representations, we run five classification algorithms: Logistic Regression, Naive Bayes, Decision Tree, Random Forest, and Gradient Boosting.
The same five algorithms are run again after performing dimensionality reduction on both the TF-IDF and Bag of Words representations.
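The balancing and cleaning steps at the start of this pipeline could be sketched roughly as below, using only the Python standard library. Several things here are assumptions, not the notebooks' actual code: the README does not say how the classes were balanced, so random undersampling of the majority class is used as one plausible option; `crude_stem` is a naive suffix stripper standing in for a real stemmer or lemmatizer (for example from NLTK); and `STOP_WORDS` is a tiny illustrative list.

```python
import random
import re

# Tiny illustrative stop-word list (assumption); a real run would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on"}


def undersample(rows, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    `rows` is a list of (text, label) pairs with labels 0 (non-hate) and 1 (hate).
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for text, label in rows:
        by_label[label].append((text, label))
    n = min(len(by_label[0]), len(by_label[1]))
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced


def crude_stem(token):
    """Stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_tweet(text):
    """Lowercase, drop handles/URLs/non-letters, remove stop words, then stem."""
    text = text.lower()
    text = re.sub(r"@\w+|https?://\S+", " ", text)  # user handles and links
    text = re.sub(r"[^a-z\s]", " ", text)           # digits, punctuation, '#', emoji
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(crude_stem(t) for t in tokens)


# Toy rows for illustration only; not taken from the dataset.
rows = [
    ("@user loving the sunny days!", 0),
    ("great game tonight", 0),
    ("coffee and code", 0),
    ("placeholder hateful tweet", 1),
]
balanced = undersample(rows)
cleaned = [(clean_tweet(text), label) for text, label in balanced]
print(cleaned)
```

Undersampling discards most of the majority class, which keeps the sketch simple; oversampling the minority class is an equally valid reading of "balance" and would keep more data.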

Data Cleaning.ipynb contains the code for cleaning the tweets. Final working code.ipynb contains the code for model building and visualisations.
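For readers who want a feel for the model-building stage before opening the notebook, here is a minimal sketch of the experiment grid described in this README: two text representations, five classifiers, each run with and without dimensionality reduction. It assumes scikit-learn, which the README does not name. The reduction method is also unspecified, so `TruncatedSVD` is used as an assumption. The helper names, hyperparameters, toy corpus, and the use of accuracy as the metric are all illustrative choices.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def make_classifiers(reduced):
    """Return fresh instances of the five classifier families."""
    return {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        # MultinomialNB rejects the negative values SVD can produce, so the
        # reduced runs swap in GaussianNB (an assumption of this sketch).
        "Naive Bayes": GaussianNB() if reduced else MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    }


def evaluate(texts, labels, n_components=2):
    """Fit every vectorizer/reduction/classifier combination; return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    vectorizers = {"Bag of Words": CountVectorizer, "TF-IDF": TfidfVectorizer}
    scores = {}
    for vec_name, vec_cls in vectorizers.items():
        for reduced in (False, True):
            for clf_name, clf in make_classifiers(reduced).items():
                steps = [vec_cls()]
                if reduced:
                    steps.append(
                        TruncatedSVD(n_components=n_components, random_state=0)
                    )
                model = make_pipeline(*steps, clf)
                model.fit(X_train, y_train)
                scores[(vec_name, reduced, clf_name)] = model.score(X_test, y_test)
    return scores


# Toy, already-balanced corpus for illustration only (0 = non-hate, 1 = hate).
texts = [
    "love this sunny day", "great game with friends",
    "happy birthday to you", "what a lovely morning",
    "i hate those people", "they are awful",
    "hate hate hate them", "awful people go away",
] * 3
labels = [0, 0, 0, 0, 1, 1, 1, 1] * 3

scores = evaluate(texts, labels)
for (vec_name, reduced, clf_name), acc in sorted(scores.items()):
    print(f"{vec_name:12s} reduced={reduced!s:5s} {clf_name:20s} {acc:.2f}")
```

Wrapping the vectorizer, the optional reduction step, and the classifier in one pipeline means each is fitted on the training split only, which avoids leaking test-set vocabulary into the features. On the real dataset, `n_components` would need to be far larger than 2 and tuned.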


vedant-95/Twitter-Hate-Speech-Detection
