Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Updated May 31, 2024 - Python
A grammar describes the syntax of a programming language and is often defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning raw text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST); the parser is concerned with context: does the sequence of tokens fit the grammar? A compiler builds on a lexer and parser for a specific grammar, then goes further, performing semantic analysis and generating code for a target language.
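The lexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from any of the projects listed here; the token names and the toy expression language are assumptions for the example.

```python
import re

# A minimal lexer sketch: turns source text into (kind, text) tokens.
# The token categories below define a toy arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),          # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"), # identifiers
    ("OP",     r"[+\-*/=()]"),   # single-character operators
    ("SKIP",   r"\s+"),          # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind != "SKIP":
            tokens.append((kind, m.group()))
    return tokens

print(lex("x = 42 + y"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```

A parser would then consume this token stream and check it against the grammar, building an AST node for each rule that matches.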
Web tool to count LLM tokens (GPT, Claude, Llama, ...)
Oxide is a hybrid database and streaming messaging system (think Kafka + MySQL), supporting data access via REST and SQL.
DOM-aware tokenization for Hugging Face language models
[READ ONLY] Locate available classes by parent, interface or trait. Subtree split of the Spiral Tokenizer component (see spiral/framework)
⛄ Possibly the smallest Lua compiler ever
Byte-Pair Encoding tokenizer for large language models
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
🎤 vibrato: Viterbi-based accelerated tokenizer
Sentiment analysis models built with NLP, covering NLP fundamentals and subword tokenization.
Morphologically biased byte-pair encoding
Python package: a tokenizer based on the BPE algorithm for LLMs (supports regex patterns and special tokens).
A multilingual morphological analysis library.
Lexical analysis for tokenizing a basic programming language
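Several of the projects above involve Byte-Pair Encoding. The core BPE training loop can be sketched as follows; this is a generic illustration of the algorithm, not the API of any listed library, and the sample words are made up for the example.

```python
from collections import Counter

# Minimal Byte-Pair Encoding sketch: repeatedly merge the most frequent
# adjacent symbol pair across the corpus, recording each merge rule.
def bpe_merges(words, num_merges):
    corpus = [list(w) for w in words]  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = best[0] + best[1]
        for i, seq in enumerate(corpus):
            out, j = [], 0
            while j < len(seq):
                if j + 1 < len(seq) and (seq[j], seq[j + 1]) == best:
                    out.append(merged)  # apply the merge in place
                    j += 2
                else:
                    out.append(seq[j])
                    j += 1
            corpus[i] = out
    return merges, corpus

merges, corpus = bpe_merges(["lower", "lowest", "low"], 2)
print(merges)   # [('l', 'o'), ('lo', 'w')]
print(corpus)   # [['low', 'e', 'r'], ['low', 'e', 's', 't'], ['low']]
```

The learned merge rules are then replayed in order to tokenize unseen text, which is what distinguishes BPE tokenizers from the rule-based lexers described earlier.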