hazemhosny/ArabicDialectClassification

Arabic Dialect Classification

Introduction

  1. A classical machine learning approach: a TF-IDF vectorizer with Logistic Regression and other machine learning models, combined with preprocessing techniques for Arabic-language tweets, to classify the Arabic dialect of a text.
  2. A deep learning approach: the AraBERT version 2 model, with results compared against the machine learning approach using the confusion matrix and F1-score.
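
The first approach can be sketched roughly as follows. This is a minimal illustration and not the code from the notebooks: the cleaning rules in `normalize_tweet`, the character n-gram settings, the four toy sentences, and the labels `EG` and `LB` are all assumptions chosen for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Assumed cleaning rules for illustration; the repository's preprocessing may differ.
DIACRITICS = re.compile("[\u064B-\u0652]")       # tashkeel (short-vowel marks)
TATWEEL = "\u0640"                               # elongation character
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")   # Latin text, digits, emoji, punctuation


def normalize_tweet(text: str) -> str:
    """Strip diacritics and tatweel, unify alef forms, drop non-Arabic symbols."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)
    text = NON_ARABIC.sub(" ", text)
    return " ".join(text.split())


# Toy training data, made up for this sketch (hypothetical labels).
texts = [
    "عايز اروح البيت دلوقتي",
    "ازيك عامل ايه النهارده",
    "بدي روح عالبيت هلق",
    "كيفك شو عم تعمل اليوم",
]
labels = ["EG", "EG", "LB", "LB"]

# TF-IDF over character n-grams feeding a Logistic Regression classifier.
model = make_pipeline(
    TfidfVectorizer(preprocessor=normalize_tweet, analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

prediction = model.predict(["عايز اشوفك دلوقتي"])[0]
print(prediction)
```

Character n-grams are used here because dialect cues often sit inside word prefixes and suffixes; a word-level vectorizer would be an equally reasonable starting point.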

For more information about the repo, see the PDF slides; results are in the Models directory.

Deployment

1. Logistic Regression Model

BotExample

2. AraBERTv2 Model

BotExample

Run FastAPI Server

  1. Install the required packages in a conda environment:
PyTorch
Pandas
matplotlib
scikit-learn
transformers
pyarabic
emoji
nltk
  2. Activate the environment where the packages are installed.
  3. Run ModelTraining_ML.ipynb and ModelPrediction-AraBert.ipynb to produce the model pickle files.
  4. Copy (or move) the saved pickle files into the static folder of the FastAPI server, using one subfolder for the ML models and another for the AraBERT model:
static
├───ML_models
└───output_dir
  5. Run python main.py
  6. Once the server is running, open localhost:5000/docs in a browser and try the different POST methods with different input text.
  7. Enjoy your server app.
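
Besides the interactive docs page, the server can also be called from a script. The sketch below uses only the Python standard library. The route `/predict` and the JSON body `{"text": ...}` are hypothetical placeholders, since the README does not list the endpoints; the real route names and request schema are shown at localhost:5000/docs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # host and port taken from the steps above
ENDPOINT = "/predict"               # hypothetical route; check /docs for the real ones


def build_request(text, endpoint=ENDPOINT, base_url=BASE_URL):
    """Build a JSON POST request carrying the text to classify."""
    payload = json.dumps({"text": text}).encode("utf-8")  # hypothetical body schema
    return urllib.request.Request(
        base_url + endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text):
    """Send the request to the running server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(text), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("...")` returns whatever JSON the chosen endpoint produces; adjust `ENDPOINT` and the payload keys to match the schema shown on the docs page.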