prabormukherjee/Language_classifier

Language_classifier

Here I perform language classification using sklearn (scikit-learn), a machine learning library for Python. The languages are:

  • Slovak
  • Czech
  • English
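
A minimal sketch of this kind of classifier, assuming a bag-of-words representation (`CountVectorizer`) feeding scikit-learn's `MultinomialNB`. The sentences and the `en`/`sk`/`cs` labels are hypothetical toy data, not the repository's dataset, and the notebook's exact preprocessing may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy sentences; labels: en = English, sk = Slovak, cs = Czech.
train_texts = [
    "the cat sat on the mat",
    "this is a simple english sentence",
    "toto je jednoduchá slovenská veta",
    "mačka sedí na rohožke",
    "tohle je jednoduchá česká věta",
    "kočka sedí na rohožce",
]
train_labels = ["en", "en", "sk", "sk", "cs", "cs"]

# Word counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# "dog" is unseen and ignored; "the", "sat", "on", "mat" only occur in English.
print(model.predict(["the dog sat on the mat"]))  # expected: ['en']
```
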

The dataset has a class imbalance problem: the Czech class has only a few examples. I first applied the Naive Bayes algorithm, which is based on Bayes' theorem, but it performed poorly on the test data, more specifically on the CS (Czech) class. After I adjusted two parameters, performance improved: the F1 score increased from 0.61 to 0.83.
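
The README does not say which two parameters were adjusted. As an assumption, the sketch below varies two `MultinomialNB` parameters that commonly matter under class imbalance: `alpha` (additive smoothing) and `fit_prior` (learn class priors from the skewed label counts, or use a uniform prior). It scores both settings with macro-averaged F1, which weights the rare Czech class as much as the others; the averaging used in the notebook is not stated. The toy data is hypothetical, so its scores say nothing about the reported 0.61 and 0.83.

```python
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical imbalanced toy data: Czech ("cs") has a single training example.
train_texts = [
    "the weather is nice today", "i like reading books",
    "where is the train station", "she drinks coffee every morning",
    "dnes je pekné počasie", "rád čítam knihy",
    "kde je železničná stanica", "každé ráno pije kávu",
    "dnes je hezké počasí",
]
train_labels = ["en"] * 4 + ["sk"] * 4 + ["cs"]
print(Counter(train_labels))  # makes the imbalance visible

# Hypothetical test set containing every class.
test_texts = ["rád čtu knihy", "kde je nádraží", "i like coffee", "pije kávu každé ráno"]
test_labels = ["cs", "cs", "en", "sk"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)


def evaluate(alpha, fit_prior):
    """Fit MultinomialNB with the given settings and return macro F1 on the test set."""
    clf = MultinomialNB(alpha=alpha, fit_prior=fit_prior)
    clf.fit(X_train, train_labels)
    predictions = clf.predict(X_test)
    # zero_division=0 scores an ill-defined precision/recall as 0 instead of warning.
    return f1_score(test_labels, predictions, average="macro", zero_division=0)


print("default :", evaluate(alpha=1.0, fit_prior=True))
# alpha < 1 smooths less; fit_prior=False replaces the skewed class prior with a uniform one.
print("adjusted:", evaluate(alpha=0.1, fit_prior=False))
```

In practice these values would be chosen on a validation split or with cross-validation rather than read off the test set.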
Finally, I used subwords to shift perspective (Check here). Performance improved slightly: the F1 score rose to 0.84.
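
The subword method used in the notebook is not described here. As a stand-in, the sketch below uses character n-grams restricted to word boundaries (`CountVectorizer(analyzer="char_wb")`), which lets the model share evidence between inflected word forms, a common trait of Czech and Slovak. This is a swapped-in technique, not necessarily the one in the notebook, and the data is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Inspect the character n-grams ("subwords") produced for a single word.
# char_wb pads each word with spaces, so n-grams never cross word boundaries.
analyzer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)).build_analyzer()
grams = analyzer("kniha")
print(grams)

# Hypothetical toy data; labels: en = English, sk = Slovak, cs = Czech.
train_texts = [
    "i like reading books", "where is the station",
    "rád čítam knihy", "kde je stanica",
    "rád čtu knihy", "kde je nádraží",
]
train_labels = ["en", "en", "sk", "sk", "cs", "cs"]

# Subword features: character 2- to 4-grams within word boundaries.
subword_model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(alpha=0.1),
)
subword_model.fit(train_texts, train_labels)

# "knihami" never appears in training, but it shares n-grams such as " kn" and "nih"
# with "knihy", so it is scored on shared subword evidence instead of being ignored.
print(subword_model.predict(["knihami"]))
```

With whole-word features, an unseen form like "knihami" would contribute nothing to the prediction; with subword features it still carries signal.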

I have added all the important files related to the project. All source code is available in the notebook file, which also includes a little theory. The dataset is also available, as is the helper function imported in the notebook.

About

Classifying English, Slovak, and Czech text using Naive Bayes