Re-contextualizing Fairness in NLP: The Case of India

This repository contains data resources for the paper "Re-contextualizing Fairness in NLP: The Case of India", accepted to AACL-IJCNLP 2022.

The paper provides a holistic research agenda for re-contextualizing fairness research in the specific geo-cultural context of India. It also presents empirical evidence that India-specific biases are present in NLP corpora and models. The data released here allows our analysis of biases in corpora and models to be reproduced along the dimensions relevant to the Indian context.

The dataset contains tuples of the form (identity term, attribute), e.g., (gujarati, entrepreneur). Human raters annotated each tuple for whether the attribute is commonly associated with the identity term as a stereotype. The tuples were created with a combination of dictionary-driven (relying on previous literature for lists of characteristics and identity terms) and corpus-driven (filtering based on occurrence in IndicCorp-en) approaches. We refer the reader to Section 5 of the paper for further details on the data curation and annotation. We also retain individual annotations with anonymized annotator IDs and self-identified gender and geographic region, following Prabhakaran et al., 2021. Along with the annotated tuples, we release the lists of identity terms and proxy identity terms (first names with prototypical gender associations, as obtained from Wikipedia) and the list of templates used to perform the analysis of NLP models in the paper.
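As a rough illustration of how the per-annotator files might be consumed, the sketch below reads a tab-separated file and counts how often each label was assigned to each (identity term, attribute) tuple. It assumes the `.tsv` files start with a header row. The column names (`identity_term`, `attribute`, `annotator_id`, `label`) and the label values `X` and `Y` are hypothetical placeholders, not the actual schema; check the real headers and `datacard.pdf` before adapting it.

```python
import csv
import io
from collections import Counter, defaultdict


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_labels(rows, identity_col, attribute_col, label_col):
    """Count how often each label was given to each (identity term, attribute) tuple."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row[identity_col], row[attribute_col])][row[label_col]] += 1
    return counts


# Toy stand-in for a per-annotator file. The column names and the
# labels "X" and "Y" are made up for illustration only.
toy = io.StringIO(
    "identity_term\tattribute\tannotator_id\tlabel\n"
    "gujarati\tentrepreneur\ta1\tX\n"
    "gujarati\tentrepreneur\ta2\tX\n"
    "gujarati\tentrepreneur\ta3\tY\n"
)
rows = read_tsv(toy)
counts = count_labels(rows, "identity_term", "attribute", "label")
print(dict(counts[("gujarati", "entrepreneur")]))
```

To try this on a released file, open it with `open(path, newline="", encoding="utf-8")`, pass the handle to `read_tsv`, and inspect `rows[0].keys()` to find the real column names to pass to `count_labels`.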

Files in this repository:

LICENSE
README.md
caste_idterms.tsv
datacard.pdf
gender_idterms.tsv
gender_proxy_idterms.tsv
region_annotations.tsv
region_idterms.tsv
region_individual_annotation.tsv
religion_annotations.tsv
religion_idterms.tsv
religion_individual_annotation.tsv
templates.tsv
