Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Updated Apr 30, 2024 (Python)
Context-sensitive word embeddings with subwords. In Rust.
Finalfusion embeddings in Python
This repository contains the code to learn subword embeddings from the arXiv dataset of 1.7M+ scholarly papers.
🕵️ RNN-based language model for generating Sherlock Holmes stories.
Properly handle position-dependent phones in a subword lexicon FST
Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
Semantic role labeling with subwords (character, character-ngram and morphology)
Morfessor EM+Prune
Classifies sentences as Slovak, Czech, or English. Implements the relevant preprocessing steps, addresses class imbalance in the training set by applying the theory of Naive Bayes models, and uses subword units.
A Python package that builds a corpus vocabulary using byte pair encoding and provides a tokenizer that splits input text according to the built vocabulary.
A tool for generating sub-word (phone or grapheme) level embeddings from an HTK-style MLF ASR corpus
Cognate-aware morphological segmentation
Morfessor FlatCat
Subword Neural Machine Translation
Morfessor demonstration
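Several of the entries above rely on byte pair style subword vocabularies. As a rough illustration of the general idea, and not the code of any repository listed here, the following minimal sketch learns merge rules from a toy corpus and applies them to a word. The function names (learn_bpe, merge_pair, tokenize) and the "</w>" end-of-word marker are assumptions chosen for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of whitespace-separated sentences."""
    # Each word starts as a tuple of characters plus a hypothetical end-of-word marker.
    word_freqs = Counter()
    for sentence in corpus:
        for word in sentence.split():
            word_freqs[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair merged.
        new_freqs = Counter()
        for symbols, freq in word_freqs.items():
            new_freqs[merge_pair(symbols, best)] += freq
        word_freqs = new_freqs
    return merges


def tokenize(word, merges):
    """Split a single word into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low low low lower"], num_merges=2)
    print(rules)
    print(tokenize("lowest", rules))
```

Replaying merges in the order they were learned keeps tokenization deterministic. Real tools typically add refinements such as vocabulary thresholds or caching that this sketch leaves out.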