Arabic-Text-Diacritization

Introduction

Diacritics are marks that denote short vowels of constant length; they are pronounced when the text is read aloud. The same Arabic word can have different meanings and pronunciations depending on how it is diacritized.

In this project, we implement a pipeline to predict the diacritic of each character in an Arabic text using Natural Language Processing techniques.

Project Pipeline

Project Phases

Data Processing

Split the text into sentences at punctuation marks.
Split long sentences into chunks of at most 500 characters (not counting diacritics).
Remove all non-Arabic characters.
Remove diacritics.
Start each sentence with <s> and end it with </s>; both tokens are assigned the "no diacritic" class ('').
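
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`strip_diacritics`, `chunk`, `preprocess`), the punctuation set, and the Unicode ranges (U+0621 to U+064A for letters, U+064B to U+0652 for diacritics) are assumptions chosen for the example. Extracting the diacritic labels themselves is not shown.

```python
import re

# Assumed ranges: Arabic letters U+0621-U+064A, diacritics U+064B-U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")
# Assumed punctuation set: Latin marks plus Arabic comma, semicolon, question mark.
PUNCTUATION = re.compile(r"[.!?:;,\u060C\u061B\u061F]+")


def strip_diacritics(text):
    """Return text with all diacritic marks removed."""
    return DIACRITICS.sub("", text)


def chunk(sentence, max_len=500):
    """Greedily group words so that each chunk has at most max_len
    characters, not counting diacritics. A single word longer than
    max_len is kept whole."""
    chunks, current, length = [], [], 0
    for word in sentence.split():
        n = len(strip_diacritics(word))
        extra = n + (1 if current else 0)  # +1 for the joining space
        if current and length + extra > max_len:
            chunks.append(" ".join(current))
            current, length, extra = [], 0, n
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def preprocess(text, max_len=500):
    """Return a list of character sequences wrapped in <s> ... </s>."""
    sequences = []
    for sentence in PUNCTUATION.split(text):
        for piece in chunk(sentence, max_len):
            cleaned = " ".join(NON_ARABIC.sub("", piece).split())
            if not cleaned:
                continue  # nothing Arabic left in this piece
            chars = list(strip_diacritics(cleaned))
            sequences.append(["<s>"] + chars + ["</s>"])
    return sequences
```

Chunking on word boundaries is a design choice made here so that no word is cut in half; the original implementation may split differently.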

Feature extraction

One-hot encoding at the character level
Trainable embeddings at the character level
Word2vec embeddings + one-hot encoding at the word and character levels
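
As an illustration of the first option, character-level one-hot encoding could look like the sketch below. The helper names `build_vocab` and `one_hot` are hypothetical, and the vocabulary here is built from whatever symbols appear in the given sequences; the project's own vocabulary and ordering may differ.

```python
def build_vocab(sequences):
    """Map every distinct symbol (characters and <s>/</s>) to an index."""
    symbols = sorted({symbol for seq in sequences for symbol in seq})
    return {symbol: index for index, symbol in enumerate(symbols)}


def one_hot(sequence, vocab):
    """Encode each symbol as a vector with a single 1 at its vocab index.
    Raises KeyError for symbols that are not in the vocabulary."""
    vectors = []
    for symbol in sequence:
        row = [0] * len(vocab)
        row[vocab[symbol]] = 1
        vectors.append(row)
    return vectors
```

Each character becomes a vector whose length equals the vocabulary size, which is why trainable embeddings are often preferred when a denser representation is wanted.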

Model

BLSTM
RNN
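
For orientation, a BLSTM tagger with a character embedding layer might be structured as in the sketch below. PyTorch is an assumption (this README does not name the framework), and the class name `CharBiLSTMTagger`, the layer sizes, and the number of diacritic classes are placeholders rather than the project's actual settings.

```python
import torch
from torch import nn


class CharBiLSTMTagger(nn.Module):
    """Predicts one diacritic class per input character."""

    def __init__(self, vocab_size, num_classes,
                 embed_dim=64, hidden_dim=128, pad_idx=0):
        super().__init__()
        # Trainable character embeddings; pad_idx rows stay at zero.
        self.embedding = nn.Embedding(vocab_size, embed_dim,
                                      padding_idx=pad_idx)
        # Bidirectional LSTM reads the sequence in both directions.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, char_ids):
        embedded = self.embedding(char_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)      # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(outputs)       # (batch, seq_len, num_classes)
```

The output holds unnormalized scores for every character position, so training would typically apply a per-character cross-entropy loss.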

Evaluation

Diacritic Error Rate (DER) = 1 - Accuracy
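
A minimal sketch of this metric, assuming predictions and gold labels are given as equal-length sequences with one diacritic class per character (the function name `diacritic_error_rate` is illustrative):

```python
def diacritic_error_rate(predicted, gold):
    """DER = 1 - accuracy, computed over per-character diacritic labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute DER on an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    accuracy = correct / len(gold)
    return 1 - accuracy
```

For example, one wrong label out of four characters gives an accuracy of 0.75 and a DER of 0.25.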

Results

Final model used for the Kaggle test set submission: BLSTM with a character embedding layer

Team: The Powerpuff Girls

Demo video of the deployed model

nlp.mp4

Contributors

Asmaa Adel

Samaa Hazem

Norhan Reda

Hoda Gamal
