SANAD is a Single-label Arabic News Articles Dataset for automatic text categorization.
NLP pipeline:
1- Collecting raw data from files
2- Exploring Data
3- Data Preprocessing including:
- Dropping empty records
- Removing Stopwords
- Normalizing Text
- Removing Punctuation and Noise
4- Applying Different ML Models and Comparing them by reporting Categorization Accuracy:
- Simple Logistic Regression
- Naive Bayes
- Random Forest
- Neural Network
- Vectorizing Text
- Encoding Labels
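
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper names (`normalize`, `clean`, `prepare`, `compare_models`), the tiny stop-word list, the specific normalization rules, the use of TF-IDF for vectorizing, and all model hyperparameters are assumptions made for the example. The preprocessing helpers use only the standard library; the model comparison assumes scikit-learn is installed.

```python
import re

DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")   # tashkeel marks and dagger alif
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")      # punctuation, digits, Latin text, other noise


def normalize(text):
    """Normalize Arabic text (one common rule set, assumed here)."""
    text = DIACRITICS.sub("", text)
    text = text.replace("\u0640", "")               # tatweel (elongation character)
    text = re.sub("[إأآ]", "ا", text)               # unify alif variants
    return text.replace("ى", "ي").replace("ة", "ه")


# Tiny illustrative stop-word list (assumption); normalized so it matches cleaned tokens.
STOPWORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن", "أن", "هذا", "التي"]}


def clean(text):
    """Normalize, strip punctuation and noise, then remove stop words."""
    text = NON_ARABIC.sub(" ", normalize(text))
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


def prepare(records):
    """Clean (text, label) pairs and drop records that end up empty."""
    cleaned = [(clean(text), label) for text, label in records]
    kept = [(text, label) for text, label in cleaned if text]
    return [t for t, _ in kept], [l for _, l in kept]


def compare_models(texts, labels, seed=0):
    """Vectorize text, encode labels, and return test accuracy per model."""
    # Imported here so the preprocessing helpers above stay dependency-free.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import LabelEncoder

    y = LabelEncoder().fit_transform(labels)        # category names -> integer ids
    x_train, x_test, y_train, y_test = train_test_split(
        texts, y, test_size=0.2, random_state=seed
    )
    vectorizer = TfidfVectorizer()                  # fit on the training split only
    x_train_vec = vectorizer.fit_transform(x_train)
    x_test_vec = vectorizer.transform(x_test)

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Neural Network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train_vec, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test_vec))
    return scores
```

Typical usage would be `texts, labels = prepare(records)` followed by `compare_models(texts, labels)`, where `records` is a list of `(article_text, category)` pairs read from the dataset files. Fitting the vectorizer on the training split only keeps test-set vocabulary out of the features.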
MuhammadHelmyOmar/SANAD-Text-Classification