brynne8/ccnorm

ccnorm

Lua Unicode normalization data. It is similar to the skeleton algorithm from Unicode UTS #39 (tr39), but it also takes readability and letter case into account.

Latin letters

Any Unicode character that looks like a Latin letter is normalized to that Latin letter, even if it is a digit or a punctuation mark. Latin letters are matched by shape, so the Greek letter ν (lowercase nu) is normalized to the Latin letter V.
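The shape-based mapping described above can be sketched as a lookup table applied character by character. This is only an illustration: `sample_map`, `UTF8_CHAR` and `normalize` are hypothetical names, and the layout of the real ccnorm.lua data is not documented here and may differ.

```lua
-- Hypothetical sample table; the real ccnorm.lua data may be shaped differently.
local sample_map = {
  ["ν"] = "V",  -- Greek small letter nu, the example given in the text
}

-- Matches one UTF-8 encoded character (a lead byte plus its continuation bytes).
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Replace every character that has an entry in `map`; characters without
-- an entry are left unchanged (gsub keeps the match when the lookup is nil).
local function normalize(text, map)
  return (text:gsub(UTF8_CHAR, map))
end

print(normalize("νote", sample_map))  -- prints "Vote"
```

The pattern is written with decimal byte escapes so the sketch does not depend on the `utf8` library, which only exists in Lua 5.3 and later. The source file has to be saved as UTF-8 for the table key to match.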

Chinese characters

Chinese characters (a.k.a. kanji) are normalized to Simplified Chinese wherever possible. A normalized Chinese sentence should remain readable to native Chinese readers.
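One possible use of this normalization is to check whether a Traditional and a Simplified spelling refer to the same text. The sketch below assumes a small hand-written table; `sample_map`, `normalize` and `same_after_norm` are hypothetical names, and the two entries are illustrative rather than taken from ccnorm.lua.

```lua
-- Hypothetical sample entries (Traditional -> Simplified); not taken from ccnorm.lua.
local sample_map = {
  ["漢"] = "汉",
  ["語"] = "语",
}

-- Matches one UTF-8 encoded character (a lead byte plus its continuation bytes).
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Replace mapped characters; unmapped characters are left unchanged.
local function normalize(text, map)
  return (text:gsub(UTF8_CHAR, map))
end

-- Two strings are treated as equivalent when their normalized forms match.
local function same_after_norm(a, b, map)
  return normalize(a, map) == normalize(b, map)
end

print(same_after_norm("漢語", "汉语", sample_map))  -- prints true
print(same_after_norm("漢語", "日语", sample_map))  -- prints false
```

Because the output is ordinary Simplified Chinese rather than an opaque key, the normalized string can be shown to a reader as well as compared.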

Contributing

ccnorm.lua is generated automatically, so please report bugs in Issues rather than sending pull requests.
