Using SANAD (Single-label Arabic News Articles Dataset) to automate text categorization.

MuhammadHelmyOmar/SANAD-Text-Classification

SANAD_Text_Classification

SANAD is a Single-label Arabic News Articles Dataset for automatic text categorization.
NLP pipeline:
1- Collecting raw data from files
2- Exploring Data
3- Data Preprocessing including:
- Dropping empty records
- Removing Stopwords
- Normalizing Text
- Removing Punctuation and noise
4- Applying Different ML Models and Comparing them by reporting Categorization Accuracy:
- Simple Logistic Regression
- Naive Bayes
- Random Forest
- Neural Network
- Vectorizing Text
- Encoding Labels
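
The preprocessing, vectorizing and label-encoding steps listed above can be sketched in plain Python. This is only an illustrative sketch, not the code used in this repository: the function names (`normalize`, `preprocess`, `encode_labels`, `vectorize`), the tiny stopword list, the normalization rules (stripping diacritics and tatweel, unifying alef and alef maqsura variants) and the sample records are all assumptions. A real run would likely use a full Arabic stopword list and a library vectorizer before fitting the models.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption, not from the repo).
STOPWORDS = {"في", "من", "على", "إلى", "عن"}

DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")     # punctuation, digits, Latin, other noise


def normalize(text):
    """Strip diacritics and unify common letter variants (one possible convention)."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # alef with hamza/madda -> bare alef
    return text.replace("\u0649", "\u064A")                # alef maqsura -> yeh


# Stopwords are normalized too, so they still match normalized tokens.
NORMALIZED_STOPWORDS = {normalize(w) for w in STOPWORDS}


def preprocess(text):
    """Normalize, replace non-Arabic characters with spaces, drop stopwords."""
    text = NON_ARABIC.sub(" ", normalize(text))
    return [tok for tok in text.split() if tok not in NORMALIZED_STOPWORDS]


def encode_labels(labels):
    """Map each category name to an integer id (ids follow sorted class order)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def vectorize(token_lists):
    """Build a bag-of-words count matrix as a list of rows, plus the vocabulary."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    column = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            row[column[tok]] = count
        rows.append(row)
    return rows, vocab


# Made-up sample records (text, category) for illustration only.
records = [
    ("فاز الفريق في المباراة!", "Sports"),
    ("", "Finance"),  # empty record, dropped below
    ("ارتفعت أسعار النفط، اليوم.", "Finance"),
]

records = [(text, label) for text, label in records if text.strip()]  # drop empty records
tokens = [preprocess(text) for text, _ in records]
X, vocab = vectorize(tokens)
y, classes = encode_labels([label for _, label in records])
```

The resulting `X` and `y` are in the shape most classifiers expect (one row of features per article, one integer label per article), so the same arrays could be passed to each model in step 4 and the accuracies compared on a held-out split.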
