Nuanced Arabic Dialect Identification Shared Task Series (NADI)

This repository lists information relevant to the Nuanced Arabic Dialect Identification Shared Task Series (NADI).

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

We present the findings and results of the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain. A total of 53 teams from 23 countries registered to participate in the tasks, reflecting the interest of the community in this area. We received 16 submissions for Subtask 1.1 from five teams, 27 submissions for Subtask 1.2 from eight teams, 12 submissions for Subtask 2.1 from four teams, and 13 submissions for Subtask 2.2 from four teams.

Official website

Access the official website via this link.

Download the data

To download the data, you need to fill out the registration form. link

Sub-tasks

(1) Subtask 1 (Country Level)

Subtask 1.1: Country-level MSA identification: A total of 21,000 tweets, covering 21 Arab countries. CODALAB link
Subtask 1.2: Country-level DA identification: A total of 21,000 tweets, covering 21 Arab countries. CODALAB link

(2) Subtask 2 (Province level)

Subtask 2.1: Province-level MSA identification: A total of 21,000 tweets, covering 100 provinces. CODALAB link
Subtask 2.2: Province-level DA identification: A total of 21,000 tweets, covering 100 provinces. CODALAB link
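Once the data has been downloaded, a quick check of the label distribution can be useful. The following is a minimal sketch only: it assumes a tab-separated file with a header row and hypothetical column names `tweet` and `label`, which may not match the actual files distributed after registration. The rows shown are placeholders, not real data.

```python
import csv
import io
from collections import Counter


def count_labels(tsv_file, label_column="label"):
    """Return a Counter mapping each label to its number of rows.

    `label_column` is a hypothetical column name; adjust it to the
    header of the file you actually received.
    """
    # QUOTE_NONE keeps quote characters inside tweets as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for a downloaded file (placeholder rows).
sample = io.StringIO(
    "tweet\tlabel\n"
    "placeholder text a\tEgypt\n"
    "placeholder text b\tIraq\n"
    "placeholder text c\tEgypt\n"
)

counts = count_labels(sample)
for label, n in counts.most_common():
    print(label, n)
```

To run this on a real file, replace the `io.StringIO` object with `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded file.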

Please cite NADI 2021 as follows:

@inproceedings{mageed:2021:nadi,
    author = {Abdul-Mageed, Muhammad and Zhang, Chiyu and Elmadany, AbdelRahim and Bouamor, Houda and Habash, Nizar}, 
    title = {{NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task}},
    booktitle ={Proceedings of the Sixth {A}rabic Natural Language Processing Workshop (WANLP 2021)},
    year = {2021},
}

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This Shared Task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain. As such, NADI is the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.

Official website

Access the official website via this link.

Download the data

To download the data, you need to fill out the registration form. link

Sub-tasks

(1) Subtask 1: Country-level dialect identification: A total of 21,000 tweets, covering all 21 Arab countries. This is a new dataset created for this shared task. CODALAB link
(2) Subtask 2: Province-level dialect identification: A total of 21,000 tweets, covering 100 provinces from all 21 Arab countries. This is the same dataset as in Subtask 1, but with province labels. CODALAB link

Please cite NADI 2020 as follows:


@inproceedings{mageed:2020:nadi,
  title={{NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task}},
  author={Abdul-Mageed, Muhammad and Zhang, Chiyu and Bouamor, Houda and Habash, Nizar},
  booktitle={Proceedings of the Fifth Arabic Natural Language Processing Workshop},
  pages={97--110},
  year={2020}
}
