
Estimating Ground Truth in a Low-labelled Data Regime: A Study of Racism Detection in Spanish


This repository contains the code for the paper submitted to the 1st Workshop on Novel Evaluation Approaches for Text Classification Systems on Social Media (NEATClasS 2022).

Our paper investigates the impact of different ground truth estimation methods on a model for racism detection. The experiments are organised into three notebooks, described under File descriptions below.

Most annotators may lack first-hand experience of racism, as only three of them belong to the Black community. Our empirical results show better performance at lower thresholds for classifying messages as racist, which may be because the annotators' permissiveness in labelling racist messages propagates to the model.

Installation

The BcnAnalytics team is working on the release of the data. We will add the link to the data repository here ⌛

Please copy evaluation_sample.csv and labels_racism.csv into the data folder!

To run our code, install the packages listed in requirements.txt (pip install -r requirements.txt).

Then you are ready to generate all the analysis outputs from our paper using the Jupyter notebooks. We describe the resulting project directory tree below 👇

File descriptions

  • data: Folder containing the data used in this work.

    • predictions: Folder with the predictions of all models at all epochs, generated in Notebook 3.
    • predictions_orig: Folder with the predictions of the best epoch of each model (evaluation_sample_m_vote_nonstrict.csv, evaluation_sample_raw.csv, evaluation_sample_w_m_vote_nonstrict.csv).
    • toxicity_scores: Evaluation sample with Perspective toxicity scores and English translations, generated with the Perspective API notebook (evaluation_sample_translated.csv).
    • evaluation_sample.csv: Evaluation data sample.
    • labels_racism.csv: Raw data.
    • labels_racism_aggregated.csv: Training data with the aggregated labels (m_vote and w_m_vote) from Notebook 1 (see the aggregation sketch after this list).
    • labels_racism_preproc.csv: Raw data with message IDs and numeric labels, from Notebook 1.
    • ids_validation_set.json: List of the IDs of the samples that belong to the validation set, used in Notebook 3.
  • model: Folder with performance results of models at different thresholds for predicting racist labels.

    • thr_analysis_w_m_vote.csv: F1 scores using different thresholds on the weighted majority vote (see the threshold sweep sketch after this list).
    • thr_analysis_w_m_vote.png: Plot of the F1 scores.
  • models: Folder with trained models at each epoch.

  • src: Folder with other functions and notebooks used in this work.

    • perspective_api.ipynb: Jupyter notebook that queries the Google Perspective API to obtain the toxicity of the messages (see the Perspective API sketch after this list).
    • huggingface.py: Python script with functions to load, train, and evaluate the models using the Hugging Face Transformers library (see the fine-tuning sketch after this list).
    • utils.py: Python script with utility functions for loading the dataset or binarizing the labels.
  • plots: Folder with plots from data exploration in notebook 2.

    • agreement_annotators_nonstrict.png: Non-strict agreement plot.
    • agreement_annotators_strict.png: Strict agreement plot.
  • 1_annotations_aggregation.ipynb: First notebook to be executed. It computes the aggregated labels and saves them in the data folder.

  • 2_annotation_assessment.ipynb: Second notebook to be executed. It analyses the aggregated data to compute the agreement between annotators (see the agreement sketch after this list).

  • 3_racism_detection_model.ipynb: Third notebook to be executed. It analyses the impact of the classification threshold and evaluates model performance.
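
Code sketches

The exact aggregation logic lives in Notebook 1. As a rough illustration of the two strategies, the sketch below computes a plain majority vote (m_vote) and a weighted majority vote (w_m_vote) in which each annotator is weighted by how often they agree with the plain majority; the table layout and column names (message_id, annotator, label) are hypothetical, not the repository's actual schema.

```python
import pandas as pd

# Hypothetical per-annotation table: one row per (message, annotator) pair.
df = pd.DataFrame({
    "message_id": [1, 1, 1, 2, 2, 2],
    "annotator":  ["a", "b", "c", "a", "b", "c"],
    "label":      [1, 1, 0, 0, 0, 1],  # 1 = racist, 0 = not racist
})

# m_vote: plain majority vote, i.e. mean label per message thresholded at 0.5.
m_vote = df.groupby("message_id")["label"].mean().ge(0.5).astype(int)

# Weight each annotator by how often they agree with the plain majority vote.
majority = df["message_id"].map(m_vote)
weights = (df["label"] == majority).groupby(df["annotator"]).mean()

# w_m_vote: weighted average of the labels per message (a score in [0, 1],
# to be thresholded downstream, as in the threshold analysis of Notebook 3).
def weighted_vote(group: pd.DataFrame) -> float:
    w = group["annotator"].map(weights)
    return (group["label"] * w).sum() / w.sum()

w_m_vote = df.groupby("message_id").apply(weighted_vote)
print(m_vote, w_m_vote, sep="\n")
```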
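
Notebook 2 computes the inter-annotator agreement; its precise strict and non-strict definitions are in the notebook itself. A minimal sketch, assuming (our assumption, not the paper's definition) that strict agreement requires an exact match on the full label scale and non-strict agreement only a match after binarising the labels:

```python
import itertools
import pandas as pd

def pairwise_agreement(labels: pd.DataFrame, strict: bool = True) -> pd.DataFrame:
    """Fraction of messages on which each pair of annotators gives the same label.

    `labels` is a messages x annotators matrix; NaN marks missing annotations.
    """
    if not strict:
        # Assumed binarisation: any positive label counts as racist.
        labels = (labels > 0).astype(float).where(labels.notna())
    annotators = list(labels.columns)
    out = pd.DataFrame(1.0, index=annotators, columns=annotators)
    for a, b in itertools.combinations(annotators, 2):
        both = labels[[a, b]].dropna()  # messages both annotators labelled
        out.loc[a, b] = out.loc[b, a] = (both[a] == both[b]).mean()
    return out
```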
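
src/huggingface.py contains the actual load/train/evaluate helpers. The sketch below is only a generic example of fine-tuning a pre-trained Spanish model for binary classification with the Hugging Face Transformers Trainer; the checkpoint name and hyperparameters are illustrative, not necessarily those used in the paper.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "dccuchile/bert-base-spanish-wwm-cased"  # illustrative choice

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)

# Toy training data; in the repository the texts and aggregated labels
# would come from labels_racism_aggregated.csv.
train = Dataset.from_dict({"text": ["mensaje uno", "mensaje dos"], "label": [1, 0]})
train = train.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="models", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train,
)
trainer.train()  # checkpoints land in the models folder
```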
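
The threshold analysis of Notebook 3 (saved to model/thr_analysis_w_m_vote.csv) sweeps the decision threshold and records the F1 score at each value. A minimal sketch of such a sweep, assuming the model outputs a probability for the racist class:

```python
import numpy as np
from sklearn.metrics import f1_score

def threshold_sweep(y_true, y_prob, thresholds=np.arange(0.1, 1.0, 0.1)):
    """F1 of the racist class when predicting racist iff p >= threshold."""
    return {round(t, 2): f1_score(y_true, (y_prob >= t).astype(int))
            for t in thresholds}

# Toy example: gold binary labels and model scores for the racist class.
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.80, 0.40, 0.35, 0.60, 0.20])
print(threshold_sweep(y_true, y_prob))
```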
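
src/perspective_api.ipynb holds the actual querying code. The call below follows the Perspective API's public quickstart for requesting a TOXICITY score (you must supply your own API key):

```python
from googleapiclient import discovery  # pip install google-api-python-client

API_KEY = "YOUR_PERSPECTIVE_API_KEY"

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

request = {
    "comment": {"text": "an English translation of a message"},
    "requestedAttributes": {"TOXICITY": {}},
}
response = client.comments().analyze(body=request).execute()
print(response["attributeScores"]["TOXICITY"]["summaryScore"]["value"])
```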

Authors

Do not hesitate to contact us with any ideas or reproducibility problems!
