Using NLTK's Bayesian Classification to classify any roman string as English, Spanish or Japanese

vinsis/word-classification

word-classification

Accuracy: ~80%

Overview

A piece of code that classifies any given word as sounding Japanese, English or Spanish based on its structure.

For example, it classifies:

  • "mirodo" as Japanese
  • "mirado" as Spanish
  • "mirror" as English

It can detect subtle differences between words. For example, it classifies "Texas" as English but "Tejas" as Spanish; "Texas" is in fact an anglicized form of the Spanish word "Tejas". It also classifies "Colorado" as Spanish.
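To illustrate what "based on its structure" can mean in practice, here is a minimal sketch that breaks a word into character bigrams. It is an assumption about the kind of feature such a classifier might use, not the feature extraction in classifier.py; the function name char_ngrams and the "^" and "$" boundary markers are made up for this example.

```python
def char_ngrams(word, n=2):
    """Return the character n-grams of a word (hypothetical helper).

    The word is lowercased and padded with "^" and "$" so that the
    first and last letters also produce distinctive n-grams.
    """
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Compare the two spellings mentioned above.
texas = set(char_ngrams("Texas"))
tejas = set(char_ngrams("Tejas"))

only_texas = sorted(texas - tejas)  # bigrams that appear only in "Texas"
only_tejas = sorted(tejas - texas)  # bigrams that appear only in "Tejas"

print(only_texas)
print(only_tejas)
```

The two words share most of their bigrams and differ only around the "x" or "j", which is the sort of small structural cue a statistical classifier can pick up on.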

How to use it

Start a Python interpreter in the directory that contains classifier.py and run the following commands:

import classifier
classifier.trainData()
classifier.classifyWord('hello')
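For readers curious about what a train-then-classify flow like the one above might look like internally, the following is a rough, self-contained sketch of a naive Bayes classifier over character bigrams. It is hand-rolled with the standard library rather than NLTK, and it is not the code in classifier.py; the class TinyNaiveBayes, the helper bigrams, and the nine training words are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def bigrams(word):
    """Character bigrams of a lowercased word with boundary markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class TinyNaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()               # training words per label
        self.feature_counts = defaultdict(Counter)  # bigram counts per label
        self.vocab = set()                          # every bigram seen in training

    def train(self, labeled_words):
        for word, label in labeled_words:
            self.label_counts[label] += 1
            for gram in bigrams(word):
                self.feature_counts[label][gram] += 1
                self.vocab.add(gram)

    def classify(self, word):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for gram in bigrams(word):
                score += math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy training set; a real classifier needs far more data.
TOY_DATA = [
    ("mirror", "English"), ("window", "English"), ("street", "English"),
    ("mirado", "Spanish"), ("tejado", "Spanish"), ("ciudad", "Spanish"),
    ("sakura", "Japanese"), ("midori", "Japanese"), ("tokoro", "Japanese"),
]

model = TinyNaiveBayes()
model.train(TOY_DATA)
print(model.classify("sakura"))
print(model.classify("mirror"))
```

With only nine training words this sketch reliably recovers the words it was trained on, but its guesses for unseen words are not meaningful; it is meant to show the mechanics, not to reproduce the accuracy stated above.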
