Cyberbullying Detection in Bangla

This repository develops and evaluates machine learning and deep learning models for detecting cyberbullying in Bangla text. Cyberbullying is a significant problem online, and this project aims to provide practical tools for identifying and addressing it.

Methodology

A. Preprocessing

The first step prepares the data for training and evaluation. The text is tokenized, stemmed or lemmatized, and stripped of stop words and unneeded characters. The processed data is then split into training and testing sets.
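The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `STOP_WORDS` set, the function names, the 80/20 split ratio, and the choice to keep only characters from the Bengali Unicode block are all assumptions, and stemming or lemmatization is left out because it needs a language-specific tool.

```python
import random
import re

# Hypothetical, tiny stop-word list for illustration only; a real list would be much longer.
STOP_WORDS = {"এবং", "ও", "এই", "একটা", "করে"}


def preprocess(text, stop_words=STOP_WORDS):
    """Strip non-Bangla characters, tokenize on whitespace, and drop stop words."""
    # Keep only the Bengali Unicode block (U+0980 to U+09FF) and whitespace.
    cleaned = re.sub(r"[^\u0980-\u09FF\s]", " ", text)
    tokens = cleaned.split()
    return [t for t in tokens if t not in stop_words]


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `preprocess` on a raw comment returns a list of Bangla tokens, and `train_test_split` can then be applied to the list of (tokens, label) pairs. A fixed seed keeps the split reproducible between runs.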

B. Feature Representation

Different algorithms call for different feature representations. Classical algorithms such as Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR) use a bag-of-words or TF-IDF representation. Deep learning models such as BERT, RNN, ANN, CNN, and BiLSTM work better with word embeddings or contextual embeddings, which capture semantic relationships within the text.
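To make the TF-IDF representation concrete, here is a small pure-Python sketch. It uses one common smoothed form of the inverse document frequency, idf = ln((1 + N) / (1 + df)) + 1, without any vector normalization; the README does not say which variant the project uses, so the formula and the function name `tfidf_vectors` are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, vectors) with one TF-IDF weight per vocabulary term."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    # Smoothed inverse document frequency, so every term gets a positive weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)  # raw term frequency within this document
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors
```

Terms that appear in every document get the minimum weight of 1 per occurrence, while rarer terms are weighted up. Dropping the `idf` factor from the last step would give a plain bag-of-words count vector.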

C. Models

Recurrent Neural Network (RNN): Captures sequential information in text; variants such as LSTM and BiLSTM mitigate the vanishing-gradient problem.
Artificial Neural Network (ANN): A feed-forward network, the basic building block of deep learning, adaptable to various tasks.
Convolutional Neural Network (CNN): Commonly used for image recognition, and also effective for text classification.
Support Vector Machine (SVM): A robust classification technique that seeks the hyperplane separating the classes with the largest margin.
Logistic Regression (LR): A linear model used for binary or multiclass classification.

D. Evaluation

The performance of each algorithm is assessed using metrics such as accuracy, precision, recall, and F1-Score.
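The sketch below shows how these four metrics can be computed for a multiclass problem. The README does not state how per-class scores are averaged; support-weighted averaging is assumed here, and the function name `classification_metrics` is illustrative.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / total  # assumed weighting: class support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With support weighting, the averaged recall always equals the accuracy, because each class's recall is weighted by its share of the true labels. Macro averaging (an unweighted mean over classes) would instead treat rare classes as equally important.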

Bangla Dataset

The Bangla dataset labels each text with one of four cyberbullying categories, "Troll," "Sexual," "Religious," and "Threat," or with "Not Bully" for material that fits none of them. This labeling supports focused study and model development for multiple forms of cyberbullying in the Bangla language.
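A classifier works with integer class ids rather than label strings, so the five categories need an encoding. The sketch below is one possible mapping; the alphabetical ordering and the names `LABELS`, `LABEL_TO_ID`, and `decode_prediction` are assumptions, not taken from the repository.

```python
# Assumed alphabetical ordering of the five dataset categories.
LABELS = ["Not Bully", "Religious", "Sexual", "Threat", "Troll"]
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}


def decode_prediction(scores):
    """Map a list of per-class scores to the label with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return ID_TO_LABEL[best]
```

Whatever ordering is chosen, the same mapping has to be used at training time and at prediction time, which is why it is usually saved alongside the model.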

Classification Report for Bangla Dataset

Algorithm	Accuracy	Precision	Recall	F1-Score
RNN	0.75	0.75	0.75	0.74
ANN	0.64	0.63	0.64	0.63
CNN	0.63	0.63	0.63	0.62
SVM	0.57	0.56	0.57	0.55
LR	0.56	0.55	0.56	0.54

Streamlit GUI

Link to Cyberbullying Detection App (Bangla)

Usage:

Enter Bangla text for cyberbullying detection.
The app provides predictions, cyberbullying type, bad words, and filtered text.
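The "bad words" and "filtered text" outputs could be produced by a lexicon lookup along these lines. This is only a sketch under assumptions: the README does not describe how the app does it, and the function name `filter_bad_words`, the caller-supplied `bad_words` set, and the masking character are all hypothetical.

```python
def filter_bad_words(text, bad_words, mask="*"):
    """Return (found, filtered_text): lexicon words found, and the text with them masked."""
    found = []
    filtered_tokens = []
    for token in text.split():
        if token in bad_words:
            found.append(token)
            # Mask length is counted in Unicode code points, not visible glyphs.
            filtered_tokens.append(mask * len(token))
        else:
            filtered_tokens.append(token)
    return found, " ".join(filtered_tokens)
```

Exact token matching misses words with attached punctuation or inflected forms, so a real filter would likely run after the same cleaning and stemming used in preprocessing.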

Sample Texts

ভেবেছিলাম তুই একটা ছেলে!!! এখন দেখি এটা একটা হিজরা?
প্রতিটি নাটক কয়েকবার করে দেখা হয়ে গেছে

Link to the Application

Link to the Bangla Cyberbullying Detection App
