Parser Building Toolkit for JavaScript
Updated May 12, 2024 - TypeScript
A grammar describes the syntax of a programming language, and might be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A compiler typically combines a lexer and a parser, built for a specific grammar, with later stages such as semantic analysis and code generation.
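The lexer, parser, and AST roles described above can be sketched for a toy arithmetic grammar. This is a minimal illustration, not the approach of any toolkit listed here: the grammar, the `Token` and `Ast` types, and the functions `lex`, `parse`, and `evaluate` are all hypothetical names chosen for the example.

```typescript
// Assumed toy grammar in BNF-like notation (illustrative only):
//   expr   ::= term   (("+" | "-") term)*
//   term   ::= factor (("*" | "/") factor)*
//   factor ::= NUMBER | "(" expr ")"

type Op = "+" | "-" | "*" | "/";
type Token =
  | { kind: "num"; value: number }
  | { kind: "op"; value: Op }
  | { kind: "paren"; value: "(" | ")" };
type Ast =
  | { type: "Num"; value: number }
  | { type: "BinOp"; op: Op; left: Ast; right: Ast };

// Lexer: turns text into tokens, skipping whitespace.
function lex(text: string): Token[] {
  const src = text.trim();
  const pattern = /^\s*(?:(\d+(?:\.\d+)?)|([-+*\/])|([()]))/;
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < src.length) {
    const m = pattern.exec(src.slice(pos));
    if (m === null) throw new SyntaxError("Unexpected character at " + pos);
    if (m[1] !== undefined) tokens.push({ kind: "num", value: Number(m[1]) });
    else if (m[2] !== undefined) tokens.push({ kind: "op", value: m[2] as Op });
    else tokens.push({ kind: "paren", value: m[3] as "(" | ")" });
    pos += m[0].length;
  }
  return tokens;
}

// Parser: checks that the token sequence fits the grammar and builds an AST.
function parse(tokens: Token[]): Ast {
  let i = 0;

  function factor(): Ast {
    const t = tokens[i++];
    if (t === undefined) throw new SyntaxError("Unexpected end of input");
    if (t.kind === "num") return { type: "Num", value: t.value };
    if (t.kind === "paren" && t.value === "(") {
      const inner = expr();
      const close = tokens[i++];
      if (close === undefined || close.kind !== "paren" || close.value !== ")") {
        throw new SyntaxError("Expected ')'");
      }
      return inner;
    }
    throw new SyntaxError("Unexpected token '" + t.value + "'");
  }

  // Parses a left-associative chain: next (op next)* for the given operators.
  function binary(next: () => Ast, ops: Op[]): Ast {
    let left = next();
    while (true) {
      const t = tokens[i];
      if (t === undefined || t.kind !== "op" || ops.indexOf(t.value) === -1) return left;
      i++;
      left = { type: "BinOp", op: t.value, left: left, right: next() };
    }
  }

  function term(): Ast { return binary(factor, ["*", "/"]); }
  function expr(): Ast { return binary(term, ["+", "-"]); }

  const ast = expr();
  if (i !== tokens.length) throw new SyntaxError("Unexpected trailing token");
  return ast;
}

// Walks the AST to compute a value, showing that the tree encodes precedence.
function evaluate(node: Ast): number {
  if (node.type === "Num") return node.value;
  const l = evaluate(node.left);
  const r = evaluate(node.right);
  switch (node.op) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    default: return l / r;
  }
}

console.log(evaluate(parse(lex("1 + 2 * (3 - 1)")))); // 5
```

The lexer knows nothing about nesting or precedence; it only classifies characters. The parser is where the grammar is enforced, so an input such as `1 +` lexes without error but is rejected by `parse`.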
A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Persian NLP Toolkit
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Focused on explainable NLP techniques: An NLP Toolset With A Focus on Explainable Inference
The fast scanner generator for Java™ with full Unicode support
Solves basic Russian NLP tasks, API for lower level Natasha projects
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Its fully modular implementation can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
Open Korean Text Processor - An Open-source Korean Text Processor
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
🌿 NodeJS PHP Parser - extract AST or tokens
Fast and customizable text tokenization library with BPE and SentencePiece support
Colin's ALM Corner Custom Build Tasks
Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Python port of Moses tokenizer, truecaser and normalizer
Self-contained Japanese Morphological Analyzer written in pure Go
High-Performance Stemmer, Tokenizer, and Spell Checker for R