Gaurav-Van/Toxic-Comment-Web_App

Toxic-Comment-App


Note: This repository is required for deploying the project on Streamlit Cloud.


Web App Link :- https://gaurav-van-toxic-comment-web-app-app-24y37c.streamlitapp.com/
Project Repo: https://github.com/Gaurav-Van/Data_Science__Machine_Learning-Projects

Classifies comments into six toxicity categories, including the neutral case of each, using NLP and ML techniques:

  • Toxic
  • Severe Toxic
  • Threat
  • Obscene
  • Insult
  • Identity Hate

Concept Used

Instead of a single multiclass classifier, a separate binary classifier is trained for each category.

1. Data Collection - From Kaggle: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

2. Data Pre-Processing - Text Pre-Processing Using Regular Expressions

  • Removing \n characters
  • Removing Alpha-Numeric Characters
  • Removing Punctuation
  • Removing Non-ASCII Characters
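
The four cleaning steps above can be sketched with Python's re module. This is a minimal illustration, not the project's actual code: the function name clean_comment is hypothetical, "alpha-numeric" is assumed to mean tokens that contain digits, and the final whitespace collapse is an extra step added for tidiness.

```python
import re
import string


def clean_comment(text: str) -> str:
    """Apply the four listed cleaning steps in order (illustrative sketch)."""
    # 1. Replace newline characters with spaces.
    text = re.sub(r"\n", " ", text)
    # 2. Drop tokens containing digits, e.g. "abc123" or "2021".
    #    Assumption: this is what "alpha-numeric characters" refers to.
    text = re.sub(r"\w*\d\w*", " ", text)
    # 3. Replace ASCII punctuation with spaces.
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    # 4. Replace non-ASCII characters with spaces.
    text = re.sub(r"[^\x00-\x7f]", " ", text)
    # Extra step (assumption): collapse runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together when punctuation sits between them.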

3. EDA - Performing data analysis to discover issues and trends in the data

  • Bar charts of each category showed a class imbalance. Solution: build a separate dataset for each category [ id, comment_text, category ] in which the frequency of 0s equals the frequency of 1s.
  • These per-category datasets resolve the class imbalance and set up the binary classification of each category.
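
One way to build such a balanced per-category dataset is sketched below with pandas. It assumes the balancing is done by undersampling the larger class, which the text above does not confirm. The function name make_balanced_dataset and the seed parameter are hypothetical.

```python
import pandas as pd


def make_balanced_dataset(df: pd.DataFrame, category: str, seed: int = 42) -> pd.DataFrame:
    """Return [id, comment_text, category] with equal counts of 0s and 1s."""
    subset = df[["id", "comment_text", category]]
    positives = subset[subset[category] == 1]
    negatives = subset[subset[category] == 0]
    # Assumption: undersample the larger class down to the smaller one.
    n = min(len(positives), len(negatives))
    balanced = pd.concat([
        positives.sample(n=n, random_state=seed),
        negatives.sample(n=n, random_state=seed),
    ])
    # Shuffle so the two classes are interleaved, then renumber the rows.
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)
```

Calling the function once per category column yields six datasets, each ready for its own binary classifier.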

4. Model Building

  • Vectorization :- TF-IDF with a unigram approach
  • Models tried for each category :- KNN, Logistic Regression, SVM, CNB, BNB, DT and RF
  • Model selected :- Logistic Regression
  • Exporting trained ML models as 6 pickle files [ one for each category ]
  • Exporting fitted vectorizers as 6 pickle files [ one for each category ]
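
The training and export steps for a single category might look like the sketch below, using scikit-learn's TfidfVectorizer and LogisticRegression. The function names, the max_iter value, and the pickle file names are assumptions made for illustration. The project's actual hyperparameters and file names are not stated here.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def train_category_model(texts, labels):
    """Fit a unigram TF-IDF vectorizer and a logistic regression for one category."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 1))  # unigrams only
    features = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
    model.fit(features, labels)
    return vectorizer, model


def export_artifacts(vectorizer, model, category, out_dir="."):
    """Pickle the vectorizer and the model; the file names are hypothetical."""
    out = Path(out_dir)
    vect_path = out / f"{category}_vect.pkl"
    model_path = out / f"{category}_model.pkl"
    with open(vect_path, "wb") as f:
        pickle.dump(vectorizer, f)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return vect_path, model_path
```

Each vectorizer is saved alongside its model because a model can only score text transformed with the same fitted vocabulary it was trained on.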

5. Deployment - Building the web app with Streamlit and deploying it on Streamlit Cloud
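
A rough sketch of how the app could load the twelve pickle files and score a comment is shown below. It is not the repository's actual app code. The pickle file names, the category identifiers, and the helper names are hypothetical. The Streamlit module is passed in as an argument so the helpers can be exercised without Streamlit installed.

```python
import pickle
from pathlib import Path

# Assumed identifiers for the six categories listed above.
CATEGORIES = ["toxic", "severe_toxic", "threat", "obscene", "insult", "identity_hate"]


def load_artifacts(model_dir="."):
    """Load one (vectorizer, model) pair per category; file names are hypothetical."""
    artifacts = {}
    for category in CATEGORIES:
        with open(Path(model_dir) / f"{category}_vect.pkl", "rb") as f:
            vectorizer = pickle.load(f)
        with open(Path(model_dir) / f"{category}_model.pkl", "rb") as f:
            model = pickle.load(f)
        artifacts[category] = (vectorizer, model)
    return artifacts


def classify_comment(comment, artifacts):
    """Return {category: probability of the positive class} for one comment."""
    results = {}
    for category, (vectorizer, model) in artifacts.items():
        features = vectorizer.transform([comment])
        results[category] = float(model.predict_proba(features)[0][1])
    return results


def render_app(st, artifacts):
    """Draw the page using the Streamlit module passed in as `st`."""
    st.title("Toxic Comment Classifier")
    comment = st.text_area("Enter a comment")
    if st.button("Classify") and comment.strip():
        for category, prob in classify_comment(comment, artifacts).items():
            st.write(f"{category}: {prob:.2%}")
```

In an app.py run with `streamlit run app.py`, the wiring would be `import streamlit as st` followed by `render_app(st, load_artifacts())`.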


About

Data Science Project to classify a comment into several toxicity categories. This Repository is used for deployment of the project.
