
State-of-the-art tokenizer, language model and classifier for Nepali, the official language of Nepal and one of the languages with official status in India

BijayOCT25/nlp-for-nepali


NLP for Nepali

This repository contains a state-of-the-art tokenizer, language model and classifier for Nepali, the official language of Nepal and one of the languages with official status in India.

Dataset

Results

Language Model

Evaluated on a 30% validation set:

  • Perplexity of language model: ~32
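Perplexity is the exponential of the average per-token negative log-likelihood (cross-entropy) on held-out text. The following is a minimal stdlib sketch, assuming you already have the probability the model assigned to each observed token; the function name `perplexity` is illustrative and not part of this repository.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity from the probabilities a language model assigned to each
    observed next token: exp of the mean negative log-likelihood."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# A model that spreads its mass uniformly over 32 candidates assigns
# p = 1/32 to every observed token, which gives a perplexity of 32.
print(round(perplexity([1 / 32] * 10), 6))  # prints 32.0
```

Read this way, a perplexity of ~32 means the model is, on average, about as uncertain as a uniform choice among 32 tokens.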

Classifier

  • Accuracy of classification model: ~97%
  • Kappa score of classification model: ~96
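The README does not say which kappa variant was used. Assuming Cohen's kappa, which corrects raw agreement for the agreement expected by chance, both metrics can be sketched with the standard library as below. Cohen's kappa lies in [-1, 1], so the ~96 above is presumably that value expressed as a percentage (about 0.96). The labels in the example are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the two marginal label frequencies, summed over labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels, only to show the arithmetic.
y_true = ["A", "B", "A", "C"]
y_pred = ["A", "B", "C", "C"]
print(accuracy(y_true, y_pred), round(cohen_kappa(y_true, y_pred), 4))  # prints 0.75 0.6364
```

Here 3 of 4 predictions match (accuracy 0.75), but chance agreement is 5/16, so kappa drops to 7/11 ≈ 0.64. This is why kappa is reported alongside accuracy.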

Pretrained Language Model

Download the pretrained language model from here

Classifier

Download the classifier from here

Tokenizer

The tokenizer was trained using Google's SentencePiece.

Download the trained model and vocabulary from here
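SentencePiece treats whitespace as an ordinary symbol, writing it as U+2581 ("▁") at the start of a piece, so tokenization is losslessly reversible. With the real `sentencepiece` package, `SentencePieceProcessor.encode(text, out_type=str)` returns such pieces and `decode` inverts them. The stdlib sketch below only illustrates that whitespace convention. It does not load this repository's model, and the example segmentation is hypothetical, since the actual pieces depend on the trained vocabulary.

```python
from typing import List

# SentencePiece marks a preceding space with U+2581 ("▁", LOWER ONE EIGHTH BLOCK).
WORD_BOUNDARY = "\u2581"


def detokenize(pieces: List[str]) -> str:
    """Rebuild text from SentencePiece pieces: concatenate them, turn each
    boundary marker back into a space, and drop the leading space."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").lstrip(" ")


# Hypothetical segmentation of "नेपाली भाषा" into subword pieces.
pieces = ["\u2581नेपाल", "ी", "\u2581भाषा"]
print(detokenize(pieces))  # prints नेपाली भाषा
```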


Languages

  • Jupyter Notebook 100.0%