Skip to content

abdelrahman-wael/Arabic-Dialect-Classification-Nadi-Shared-Task

Repository files navigation

NADI Shared Task 2020 for Arabic dialect classification site

Arabic has a widely varying collection of dialects. Many of these dialects remain under-studied due to the rarity of resources. The goal of the shared task is to alleviate this bottleneck in the context of fine-grained Arabic dialect identification. Dialect identification is the task of classifying the dialect of the tweet writer given the tweet itself.

We present our model for Arabic dialect classification that ranked fourth in WANLP 2020 leaderboard

By running the train.py file you are able to start the training process.

There are multiple params that can be changed in config_train.txt, a detailed explanation will be provided later on

Summary

Using pre-trained AraBert, we first proceeded by fine-tuning the model applying masked language modeling on Arabic tweets as shown in the image below. This is also known as domain adaptation.
then we added a classification layer and retrained our fine-tuned model to distinguish different Arabic dialects.

credit

hugging face repo

farasa seg rwdepo repo

About

using AraBert to classify different Arabic dialects. ranked fourth in WANLP2020 workshop.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages