# subword

Here are 14 public repositories matching this topic...

scarletcho / KoLM

Korean text normalization and language preparation package for LM in a Kaldi-based ASR system

lm korean asr subword pseudomorpheme

Updated Apr 23, 2020
Python

lallubharteja / KWS-Scripts

Keyword Search Recipe for Subword ASR

keyword-spotting subword kaldi-asr kws

Updated Jul 12, 2019
Shell

cooelf / subMrc

Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)

question-answering reading-comprehension subword bpe bpe-segmentation

Updated Nov 6, 2018
Python

cooelf / subword_seg

Effective Subword Segmentation for Text Comprehension (TASLP 2019)

morphology question-answering segmentation reading-comprehension subword

Updated Nov 6, 2018
C++

scarletcho / subword-mikolov

An implementation of the subword division algorithm proposed in T. Mikolov (2012).

english language-model subword

Updated Sep 25, 2019
HTML

andreasgrv / johnny

johnny - a neural network graph based DEPendency Parser

nlp parsing chainer nlp-machine-learning dependency-parsing subword

Updated Mar 25, 2021
Python

wang-h / FMDL

Unsupervised Word Segmentation using Minimum Description Length for Neural Machine Translation (NMT)

unsupervised segmentation subword

Updated Dec 21, 2018
C++

kkaryl / AI6127-Deep_NLP

This repository contains source code implementations of the assignments for NTU's MSAI course AI6127, Deep Neural Networks for Natural Language Processing (2019 Sem 2).

nlp ner language-model subword msai

Updated Dec 11, 2020
Jupyter Notebook

zouharvi / tokenization-scorer

Simple-to-use scoring function for arbitrarily tokenized texts.

segmentation tokenization subword bpe

Updated Mar 8, 2024
Python

TiMauzi / dawg

The concept of DAWGs is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.

nlp tree parsing tree-structure theoretical-computer-science dawg subword subword-segmentation subwords

Updated Sep 13, 2022
Java

Ishan-Kotian / Tokenizer_NLP

Tokenization splits a piece of text into smaller units called tokens, which can be words, characters, or subwords. It can therefore be broadly classified into three types: word, character, and subword (character n-gram) tokenization.

cat nlp count tensorflow tokenizer natural-language character sentence keras-classification-models subword nerual-network imdb-dataset deep-learning-architectures rnn-keras smaller-units tokenizer-nlp

Updated Jun 30, 2021
Jupyter Notebook
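
The three tokenization types named in the Tokenizer_NLP description above can be illustrated with a minimal Python sketch. This code is not taken from that repository. The function names are hypothetical, and the subword variant assumes plain fixed-length character n-grams, as the description suggests. Learned schemes such as BPE or WordPiece build their vocabularies from data instead.

```python
# Illustrative sketch only: hypothetical helpers, not code from any listed repository.

def word_tokens(text):
    """Word tokenization: split on runs of whitespace."""
    return text.split()


def char_tokens(text):
    """Character tokenization: every character (spaces included) is a token."""
    return list(text)


def char_ngram_tokens(word, n=3):
    """Subword tokenization as overlapping character n-grams of length n.

    Words no longer than n are returned whole.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    sentence = "subword units"
    print(word_tokens(sentence))
    print(char_tokens(sentence))
    print([char_ngram_tokens(w) for w in word_tokens(sentence)])
```

Character n-grams need no training step, which keeps the sketch short, but they do not adapt to the frequency of units in a corpus the way learned subword vocabularies do.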

jluo41 / NLPText

corpus subword textpreprocessing field-grains granularity

Updated Jan 8, 2023
Jupyter Notebook

burcgokden / BERT-Subword-Tokenizer-Wrapper

A framework for generating subword vocabulary from a tensorflow dataset and building custom BERT tokenizer models.

machine-learning deep-learning tensorflow machine-translation vocabulary-builder bert subword wordpiece berttokenizer tensorflow-text

Updated Jul 6, 2021
Python

Scitator / subword-nmt

Subword Neural Machine Translation

deep-learning seq2seq neural-machine-translation language-model subword

Updated Jun 20, 2017
Python
