D3code

This repository contains data resources for the following papers:

Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates (accepted at FAccT 2024).
D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation (under review).

D3code is a large-scale cross-cultural dataset of parallel annotations for offensive language detection by over 4k annotators, balanced across gender and age, from across 21 countries, representing eight geo-cultural regions.

Dataset Description

The repo contains the data card for the D3code dataset, following the format proposed by Pushkarna et al. The data card includes details of the dataset such as intended usage, field names and meanings, and annotator recruitment and payments. The dataset folder contains the following three files:

raters.csv: each row represents a participant of the study, along with their unique anonymous id, age, gender, self-reported socio-economic status, and country and region of residence.
items.csv: each row represents an item, selected from the Jigsaw dataset, along with its textual content, its category (one of moral, random, or social group) and its subcategory. These two fields point to the strategy used for collecting the item.
ratings.csv: each row represents a rating assigned to an item by a rater. Two columns show (1) the raw rating, a value from 0 to 4, with 0 being not offensive at all and 4 being extremely offensive; a value of -1 in this column means that the rater did not understand the message; and (2) the binary rating, which is 0 for a raw rating of 0 or 1, and 1 for a raw rating of 2 or higher. An na in the binary column corresponds to a raw rating of -1.
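
The relationship between the raw and binary ratings, and the way ratings link back to raters, can be illustrated with a minimal sketch. It uses only the Python standard library and a tiny made-up sample in place of the real files. The column names (rater_id, item_id, region, raw_rating) are assumptions made for illustration and are not confirmed by this README; check the actual CSV headers and the data card before reusing it.

```python
import csv
import io
from collections import defaultdict
from typing import Optional


def binarize(raw: int) -> Optional[int]:
    """Map a raw rating to the binary rating as described in this README."""
    if raw == -1:      # the rater did not understand the message
        return None    # corresponds to "na" in the binary column
    return 0 if raw <= 1 else 1


def offensive_rate_by_region(raters_csv, ratings_csv):
    """Share of understood ratings labelled offensive, per rater region.

    Column names (rater_id, region, raw_rating) are hypothetical.
    """
    region_of = {r["rater_id"]: r["region"] for r in csv.DictReader(raters_csv)}
    counts = defaultdict(lambda: [0, 0])  # region -> [offensive, total]
    for row in csv.DictReader(ratings_csv):
        label = binarize(int(row["raw_rating"]))
        if label is None:
            continue  # skip "did not understand" ratings
        tally = counts[region_of[row["rater_id"]]]
        tally[0] += label
        tally[1] += 1
    return {region: off / total for region, (off, total) in counts.items()}


# Made-up sample standing in for raters.csv and ratings.csv.
raters = io.StringIO("rater_id,region\nr1,A\nr2,B\n")
ratings = io.StringIO(
    "rater_id,item_id,raw_rating\n"
    "r1,i1,0\nr1,i2,3\nr2,i1,-1\nr2,i2,4\n"
)
rates = offensive_rate_by_region(raters, ratings)
print(rates)
```

With the real data, the two StringIO objects would be replaced by open file handles for the files in the dataset folder, after adjusting the column names to match the headers.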

Citation

@article{davani2024d3code,
title={D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation},
author={Davani, Aida Mostafazadeh and D{\'\i}az, Mark and Baker, Dylan and Prabhakaran, Vinodkumar},
journal={arXiv preprint arXiv:2404.10857},
year={2024}
}
