Cantonese Linguistics and NLP
Updated May 23, 2024 - Python
The "colorizer_UI" app colorizes text based on part-of-speech tagging, offering an interactive UI for inputting text, selecting a language, and viewing colorized output.
Project that aims to sentenize all the open data of Riksdagen and other sources, creating an easily linkable dataset of sentences that can be referred to from Wikidata lexemes and other resources.
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
A state-of-the-art Arabic part-of-speech tagger leveraging the XLMR transformer model, with a testing accuracy of 97.49% and a testing F1-score of 96.44% on the Arabic UD Treebank.
State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
Error detection tool for finding unlikely word sequences in Estonian texts based on the words' part-of-speech.
USC CSCI544 - Applied Natural Language Processing - Fall 2023 - Prof Mohammad Rostami
A Natural Language Processing toolkit for sequence labeling in its simplest form.
Extreme Extractive Text Summarization and Topic Modeling (using LSA and LDA techniques) over Reddit Posts from TLDRHQ dataset.
Part-of-Speech Tagging using a Hidden Markov Model and the Viterbi Algorithm
Script used for POS-tagging the C-CLAMP corpus
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
A PyTorch Library for Sequence Labeling Tasks such as Named-entity Recognition or Part-of-speech Tagging
Part-of-Speech tagging in Polish with a fine-tuned RoBERTa model
This repository contains the implementation of a Part-of-Speech (POS) tagging system using Hidden Markov Models (HMMs) along with various decoding techniques and adversarial training strategies for sequence labeling tasks. Project for course DSCI 599 - Optimization Techniques for Data Science, Fall 2023
Part-of-speech tagging / Text normalization: reducing all words in a Russian-language text to their dictionary form
Part-of-Speech Tagging / Normalization of words in English texts
POSIT aims to segment and tag mixed text that contains English and C-like code, so that the user knows both what each token is and what role it serves within the language it is used in, such as an AST tag or PoS tag.