NLP for Nepali

This repository contains state-of-the-art language models and a classifier for Nepali, the official language of Nepal and one of the officially recognized languages of India.

The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).

Dataset

The following datasets were created as part of this project:

  1. Nepali Wikipedia Articles

  2. Nepali News Dataset

Results

Language Model Perplexity

| Architecture/Dataset | Nepali Wikipedia Articles |
| --- | --- |
| ULMFiT | 31.5 |
| TransformerXL | 29.3 |
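
The README does not say how the perplexity figures were computed. As a reminder of what the metric means, here is a minimal sketch of the standard definition: perplexity is the exponential of the average negative log-likelihood per token. The function name and the example probabilities are illustrative assumptions, not part of this repository.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given the natural-log probability of each token.

    Standard definition: exp of the mean negative log-likelihood.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Illustrative input (not real model output): a model that assigns
# probability 1/4 to every token is as uncertain as a uniform choice
# among 4 options, so its perplexity is approximately 4.
example_log_probs = [math.log(0.25)] * 8
print(perplexity(example_log_probs))
```

Lower is better: a perplexity of about 30 means that, on average, the model is roughly as uncertain as if it were choosing uniformly among 30 tokens at each step.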

Classification Metrics

ULMFiT
| Dataset | Accuracy | Kappa Score |
| --- | --- | --- |
| Nepali News Dataset | 98.5 | 97.7 |
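
The README does not define the Kappa Score. Assuming it refers to Cohen's kappa (reported here scaled by 100), the sketch below shows how that statistic corrects raw accuracy for agreement expected by chance. The function name and the toy labels are hypothetical and are not taken from the Nepali News Dataset.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the agreement
    expected by chance from the marginal label frequencies.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (hypothetical): 3 of 4 predictions are correct.
y_true = ["sports", "sports", "politics", "politics"]
y_pred = ["sports", "sports", "politics", "sports"]
print(cohen_kappa(y_true, y_pred))
```

In this toy case accuracy is 0.75 but chance agreement is 0.5, so kappa is lower than accuracy. A kappa close to the accuracy, as in the table above, suggests the classifier's agreement is not explained by class imbalance.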

Visualizations

Embedding Space
| Architecture | Visualization |
| --- | --- |
| ULMFiT | Embeddings projection |
| TransformerXL | Embeddings projection |

Pretrained Language Model

Download pretrained Language Models from here

Classifier

Download classifier from here

Tokenizer

The tokenizer was trained using Google's SentencePiece.

Download the trained model and vocabulary from here
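
Once the model file is downloaded, it can be loaded with the `sentencepiece` Python package. The following is a minimal sketch, not the repository's own code: the helper name `tokenize_nepali` and the file name `nepali_tok.model` are assumptions, so substitute the actual path of the downloaded model.

```python
def tokenize_nepali(text, model_path="nepali_tok.model"):
    """Split text into subword pieces with a trained SentencePiece model.

    model_path is a hypothetical file name; point it at the downloaded
    model. Requires the sentencepiece package (pip install sentencepiece).
    """
    import sentencepiece as spm  # imported lazily so the helper can be defined without it

    processor = spm.SentencePieceProcessor(model_file=model_path)
    # out_type=str returns the subword strings instead of integer ids.
    return processor.encode(text, out_type=str)
```

Calling `tokenize_nepali("नेपाल एक सुन्दर देश हो")` returns a list of subword strings; the exact pieces depend on the vocabulary learned during training.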