Names transliteration

This repository contains:

a dataset (and the code to build it) pairing names written in Arabic characters with the same names in Latin characters (English),
a Google Colab notebook that trains a seq2seq-based Neural Machine Translation (NMT) model. The goal of the model is to transliterate names from the Arabic alphabet to the Latin alphabet, a task also called romanization.

The model is trained on Google Colab, which provides a free GPU.

The model is based on the TensorFlow tutorial "NMT with attention".
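For a name transliteration task, a seq2seq model can read the source name one character at a time and emit the target name one character at a time. The sketch below shows one way such character-level encoding could look. The `<pad>`/`<start>`/`<end>` token names, the `build_vocab` and `encode` helpers, and the padding scheme are assumptions made for illustration; they are not taken from the notebook.

```python
from typing import Dict, List

# Hypothetical special tokens (assumed for this example).
PAD, START, END = "<pad>", "<start>", "<end>"


def build_vocab(names: List[str]) -> Dict[str, int]:
    """Map every character seen in `names` to an integer id; id 0 is padding."""
    vocab = {PAD: 0, START: 1, END: 2}
    for name in names:
        for char in name:
            if char not in vocab:
                vocab[char] = len(vocab)
    return vocab


def encode(name: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap the name in start/end markers and right-pad to `max_len` ids."""
    ids = [vocab[START]] + [vocab[c] for c in name] + [vocab[END]]
    if len(ids) > max_len:
        raise ValueError("encoded name exceeds max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


latin_vocab = build_vocab(["adel", "dalal"])
encoded = encode("adel", latin_vocab, max_len=8)
print(encoded)  # [1, 3, 4, 5, 6, 2, 0, 0]
```

The same two helpers would be applied separately to the Arabic side and the Latin side, giving the encoder and decoder their own vocabularies.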

Data

We use 3 datasets:

Google transliteration data. Example: عادل; adel
ANETAC dataset. Example: PERSON; Adel; اديل. For this file we keep only the rows tagged PERSON.
NETransliteration COLING 2018.

Together, these 3 datasets give us a clean dataset of names in Arabic paired with the corresponding names in the Latin alphabet (English).
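As a rough illustration of how such sources could be merged, the sketch below parses lines in the two formats shown in the examples above, keeps only PERSON rows from the ANETAC-style lines, and removes duplicate pairs. The `;` separator is inferred from those examples; the function names and the lowercasing of Latin names are assumptions for this example, and the repository's own build code may differ.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, str]  # (arabic_name, latin_name)


def parse_google_line(line: str) -> Optional[Pair]:
    """Parse 'arabic; latin' (format assumed from the example above)."""
    parts = [p.strip() for p in line.split(";")]
    if len(parts) != 2 or not all(parts):
        return None
    arabic, latin = parts
    return arabic, latin.lower()


def parse_anetac_line(line: str) -> Optional[Pair]:
    """Parse 'TYPE; latin; arabic' and keep PERSON rows only."""
    parts = [p.strip() for p in line.split(";")]
    if len(parts) != 3 or not all(parts):
        return None
    entity_type, latin, arabic = parts
    if entity_type != "PERSON":
        return None
    return arabic, latin.lower()


def merge_pairs(*sources: Iterable[Optional[Pair]]) -> List[Pair]:
    """Concatenate sources, skipping unparsable rows and exact duplicates."""
    seen = set()
    merged = []
    for source in sources:
        for pair in source:
            if pair is not None and pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


google = [parse_google_line(l) for l in ["عادل; adel", "عادل; adel"]]
anetac = [parse_anetac_line(l) for l in ["PERSON; Adel; اديل", "LOCATION; Cairo; القاهرة"]]
pairs = merge_pairs(google, anetac)
print(pairs)
```

Here the duplicated Google row and the LOCATION row are dropped, leaving two (Arabic, Latin) pairs.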

Pre-trained models

A pre-trained model (Arabic to Latin characters) is stored on Dropbox.

Colab notebook

A Jupyter notebook is provided to train the model used for transliteration.

Web application - Streamlit

A Streamlit app is provided. You can find a deployed version here.

Library

Install the library:

python setup.py install

CLI

get-data: Fetch data from the 3 sources and build the training dataset.
get-pretrained-model: Download the pre-trained model for the task.
train-nmt-model: Train an NMT model.
transliterate-name: Transliterate a name from Arabic to Latin characters.

Python environment

Please refer to the environment.yml file for the conda environment.

To create the environment with conda:

conda env create -f environment.yml
