emorynlp/elit-tokenizer

ELIT Tokenizer

ELIT (Emory Information and Language Technology) features an English tokenizer that splits text into a sequence of tokens and segments them into sentences using lexicon-based heuristics. This project is led by the Emory NLP Research Laboratory and is released under the Apache 2.0 license.

  • Latest release: 1.0 (10/15/2021)
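
To give a rough feel for what "lexicon-based heuristics" can mean, here is a toy sketch in plain Python. It is not ELIT's implementation and does not use the `elit_tokenizer` package. The abbreviation lexicon and the `tokenize` and `segment` functions are hypothetical names made up for this illustration, and the rules are far simpler than those of a real tokenizer.

```python
import re

# Hypothetical mini-lexicon of abbreviations whose trailing period
# should stay attached to the token. Not taken from ELIT.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e."}

# First pass: every run of non-space characters is a candidate token.
CHUNK_RE = re.compile(r"\S+")


def tokenize(text):
    """Split text into tokens, keeping lexicon abbreviations intact."""
    tokens = []
    for match in CHUNK_RE.finditer(text):
        chunk = match.group()
        if chunk.lower() in ABBREVIATIONS:
            # Lexicon hit: keep the period as part of the token.
            tokens.append(chunk)
            continue
        # Otherwise peel trailing punctuation off into separate tokens.
        trailing = []
        while chunk and chunk[-1] in ".,!?;:":
            trailing.append(chunk[-1])
            chunk = chunk[:-1]
        if chunk:
            tokens.append(chunk)
        tokens.extend(reversed(trailing))
    return tokens


def segment(tokens):
    """Group tokens into sentences, closing one at '.', '!' or '?'."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        # Keep any trailing tokens that lack final punctuation.
        sentences.append(current)
    return sentences
```

For example, `segment(tokenize("Dr. Choi teaches NLP. It is fun!"))` yields two sentences, because the lexicon entry stops the period in "Dr." from being split off and treated as a sentence boundary.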

Installation

Python 3.7 or higher is recommended:

pip install elit_tokenizer

Documentation

Contact