An Efficient BERT Aided Pipeline to Detect Aggression and Misogyny

Try it Out!

About

Social media is bustling with ever-growing cases of trolling, aggression and hate. The volume of data generated each day is far too large to inspect manually.

In this work, we propose an efficient and fast pipeline to detect aggression and misogyny in social media texts. We use data from the Second Workshop on Trolling, Aggression and Cyberbullying for our task.

We first use a BERT-based pipeline to augment our data. We then use a TF-IDF and XGBoost based pipeline to detect aggression and misogyny.
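
To illustrate the feature step, here is a minimal pure-Python sketch of TF-IDF weighting. It is not the repository's code: the function name `tfidf_vectors`, the whitespace tokenization, the smoothed IDF formula and the toy sentences are all assumptions made for illustration. The XGBoost classifier that consumes such features is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    # Assumption: lowercase whitespace tokenization, chosen for simplicity.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed IDF, so a term present in every document keeps a nonzero weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


# Toy sentences for illustration only; not data from the workshop.
docs = ["you are great", "you are awful awful", "have a great day"]
vectors = tfidf_vectors(docs)
```

A term that is frequent within one text but rare across the corpus (here "awful") receives a higher weight than a term shared by many texts (here "you").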

Our model achieves weighted F1 scores of 0.73 and 0.85 on the two prediction tasks, which is very close to the state of the art.

However, training time, model size and resource requirements are drastically reduced compared to state-of-the-art models, which makes the proposed pipeline useful for fast inference. We describe the pipeline, examine the results and conduct an error analysis to understand the shortcomings of our model.

Paper Link

ACL Anthology Link

Model Pipeline

Training and Inference

Create a virtual environment. See here
Clone the repository. See here
Navigate to the cloned repository
Install the requirements with pip install -r requirements.txt
Navigate to the /core directory and set it as your current working directory
Run bash ./run.sh for training, validation and inference

Results (Weighted F1 Score)

Team Name (cited in paper)	Score (Sub-task A)	Score (Sub-task B)
Julian	0.802	0.851
abaruah	0.728	0.870
sdhanshu	0.759	0.857
Our Model	0.735	0.852
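
For readers unfamiliar with the metric, the sketch below shows one way to compute a weighted F1 score: the F1 of each class is averaged, weighted by the number of true examples of that class. The function name `weighted_f1` and the toy labels are assumptions for illustration and are not taken from the repository.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: examples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += (n_true / total) * f1
    return score


# Toy labels for illustration only; not data from the paper.
y_true = ["NAG", "NAG", "NAG", "CAG", "OAG", "OAG"]
y_pred = ["NAG", "NAG", "CAG", "CAG", "OAG", "NAG"]
score = weighted_f1(y_true, y_pred)
```

Because the weights follow class frequency, a large class such as NAG influences the score more than a rare one.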

Analysis

Task A (Aggression Detection) Confusion Matrix

Classes are (left to right and top to bottom):

OAG (Overtly Aggressive): explicitly aggressive terms
CAG (Covertly Aggressive): covertly aggressive content such as sarcasm
NAG (Non Aggressive): non-aggressive texts
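
As a reading aid, the sketch below builds a confusion matrix with the class order listed above. It assumes rows are true labels and columns are predicted labels, which the text does not state. The function name `confusion_matrix` and the toy labels are illustrative only.

```python
LABELS = ["OAG", "CAG", "NAG"]  # class order as listed above


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels (assumed), columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy labels for illustration only; not results from the paper.
y_true = ["OAG", "OAG", "CAG", "NAG", "NAG", "NAG"]
y_pred = ["OAG", "CAG", "CAG", "NAG", "NAG", "CAG"]
matrix = confusion_matrix(y_true, y_pred, LABELS)
```

Under that assumption, diagonal entries count correct predictions, and an off-diagonal entry such as row OAG, column CAG counts overtly aggressive texts predicted as covertly aggressive.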

Task B (Misogyny Detection) Confusion Matrix

Classes are (left to right and top to bottom):

NGEN: Neutral Texts
GEN: Contains misogynistic connotations

Repository Details

assets - Images for report
core - Code related to training and testing after augmentation
input - Data Input. Contains train, test and gold data
models - Serialized Model Files
notebooks - Notebooks created in Google Colab. notebooks/Data_Augmentation_Aggression_Detection.ipynb contains the detailed code for the augmentation process
reports - .tex files
test_results - Test CSV file

Dutta-SD/AggDetect
