
This repository focuses on automatic document categorization: assigning categories to texts based on their content. It uses two pretrained deep learning models, ParsBERT and LaBSE, for this purpose.


faezeh-gholamrezaie/Persian-News-Article-Classification


Persian News Article Classification

Persian news classification based on article content

ITRC, Tehran, Iran


Corpus

This project involves the collection of two datasets: one from Fars News and the other from Tasnim.

Description of the Fars News dataset:

| Category | Number of articles |
|----------|--------------------|
| culture  | 6000               |
| sports   | 5999               |
| politics | 5994               |
| economy  | 5992               |
| social   | 5991               |

Description of the Tasnim dataset:

| Category | Number of articles |
|----------|--------------------|
| culture  | 5750               |
| sports   | 5347               |
| politics | 5483               |
| economy  | 6095               |
| social   | 5161               |

Aggregated dataset used:

[Figure: datasets]
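As a quick check on the class balance, the per-category counts from the two tables above can be summed. This is only a sketch: it assumes the aggregated dataset is a plain concatenation of the Fars News and Tasnim articles, which the README does not state explicitly, and the variable names are illustrative.

```python
# Per-category article counts copied from the Fars News and Tasnim tables.
fars = {"culture": 6000, "sports": 5999, "politics": 5994, "economy": 5992, "social": 5991}
tasnim = {"culture": 5750, "sports": 5347, "politics": 5483, "economy": 6095, "social": 5161}

# Assumption: the aggregated dataset simply concatenates both sources.
combined = {category: fars[category] + tasnim[category] for category in fars}
total = sum(combined.values())

for category, count in sorted(combined.items()):
    # Share of each category in the combined corpus, as a percentage.
    print(f"{category:<9} {count:>6} {100 * count / total:5.1f}%")
print(f"{'total':<9} {total:>6}")
```

Under that assumption the five classes stay roughly balanced, so plain accuracy remains a reasonable headline metric.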


Deep Learning Models

Two pretrained deep learning models were evaluated:

| Model | Precision | Accuracy |
|-------|-----------|----------|
| ParsBERT: Transformer-based Model for Persian Language Understanding | 0.85 | 0.83 |
| Language-agnostic BERT Sentence Embedding (LaBSE) | 0.85 | 0.84 |
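One way to set up either model for this five-class task is with the Hugging Face `transformers` library. The sketch below is not the authors' training code: the checkpoint ids in `CHECKPOINTS` and the helper name `load_classifier` are assumptions, since the README does not say which checkpoints or framework were used. The label ids follow the category/label table in the Evaluation section.

```python
# Label ids as listed in this README's category/label table.
LABEL2ID = {"culture": 0, "economy": 1, "politics": 2, "social": 3, "sports": 4}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}

# Assumed Hugging Face Hub checkpoint ids; the README does not name them.
CHECKPOINTS = {
    "ParsBERT": "HooshvareLab/bert-base-parsbert-uncased",
    "LaBSE": "sentence-transformers/LaBSE",
}


def load_classifier(model_key):
    """Return (tokenizer, model) with an untrained 5-way classification head."""
    # Imported lazily so the label maps above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = CHECKPOINTS[model_key]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name,
        num_labels=len(LABEL2ID),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Calling `load_classifier("ParsBERT")` downloads the weights on first use. The classification head is randomly initialized, so the model still has to be fine-tuned on the news articles before its predictions mean anything.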

Evaluation (Confusion matrix)

Confusion matrix for the best model, ParsBERT:

[Figure: confusion matrix]

| Category | Label |
|----------|-------|
| culture  | 0     |
| economy  | 1     |
| politics | 2     |
| social   | 3     |
| sports   | 4     |
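For reference, accuracy and precision can both be read off a confusion matrix indexed by the label ids above. The sketch below assumes rows are true labels, columns are predicted labels, and precision is macro-averaged over the five classes. The README does not say which averaging was used, and `toy_cm` holds made-up numbers for illustration only, not the project's results.

```python
LABELS = ["culture", "economy", "politics", "social", "sports"]  # ids 0..4


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_precision(cm):
    """For each class j: correct predictions of j / all predictions of j."""
    size = len(cm)
    precisions = []
    for j in range(size):
        predicted_as_j = sum(cm[i][j] for i in range(size))
        precisions.append(cm[j][j] / predicted_as_j if predicted_as_j else 0.0)
    return precisions


def macro_precision(cm):
    """Unweighted mean of the per-class precisions."""
    values = per_class_precision(cm)
    return sum(values) / len(values)


# Hypothetical toy matrix (rows: true label, columns: predicted label).
toy_cm = [
    [9, 0, 0, 1, 0],
    [0, 9, 1, 0, 0],
    [0, 1, 9, 0, 0],
    [1, 0, 0, 9, 0],
    [0, 0, 0, 0, 10],
]

for name, value in zip(LABELS, per_class_precision(toy_cm)):
    print(f"precision[{name}] = {value:.2f}")
print(f"accuracy = {accuracy(toy_cm):.2f}, macro precision = {macro_precision(toy_cm):.2f}")
```

Off-diagonal cells show which categories get confused with each other. In the toy matrix, for instance, `culture` and `social` are occasionally swapped.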

Team members

Marjan Godarzi

Elham Ghasemi

Gholshid Ranjbaran

Alireza Parvaresh

Faezeh Gholamrezaie
