Neural Punctuator

Seq2Seq model that restores punctuation on English input text.

Setup

No dependencies are needed besides Python 3.7.4, virtualenv, and TensorFlow.

virtualenv env
source env.sh
pip install tensorflow  # or tensorflow-gpu / custom wheel

For more information on the project structure, see the README in the tensorflow-boilerplate repository.

Datasets

  • Google News Word2Vec: place the .bin file in the data directory, run python -m scripts.install_word2vec, and optionally delete the .bin file afterwards (a rough sketch of what the conversion involves follows this list).

  • WikiText: download, unzip, and place both of the word-level datasets in the data directory. Clean the data with python -m scripts.clean_wikitext, and optionally delete the original *.tokens and *.raw files.
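
The internals of scripts.install_word2vec and scripts.clean_wikitext aren't shown here. As a rough idea of what the Word2Vec step involves, the sketch below reads the standard word2vec binary layout (a "vocab_size dim" header line, then each word followed by dim float32 values) into Python and NumPy structures; the function name and the exact output format are assumptions, not the repo's.

import numpy as np

def load_word2vec_bin(path):
    """Read a word2vec .bin file into (words, vectors). Written for clarity, not speed."""
    words, vectors = [], []
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            # Each word is stored as raw bytes terminated by a space; a newline
            # left over from the previous vector may precede it.
            chars = []
            while True:
                c = f.read(1)
                if c == b" ":
                    break
                if c != b"\n":
                    chars.append(c)
            words.append(b"".join(chars).decode("utf-8", errors="replace"))
            # The vector itself is dim consecutive float32 values.
            vectors.append(np.frombuffer(f.read(4 * dim), dtype=np.float32))
    return words, np.stack(vectors)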

Usage

The following trains a seq2seq model on WikiText with a non-default batch size and learning rate, saving its weights in experiments/myexperiment0:

source env.sh
run fit myexperiment0 seq2seq wikitext --batch_size=32 --learning_rate=0.001

Modify other hyperparameters similarly with --name=value. To see all supported hyperparameters, check the main classes in models/seq2seq.py and data_loaders/wikitext.py.
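
The run entry point comes from tensorflow-boilerplate, so the actual flag handling lives there; the sketch below only illustrates the --name=value override pattern, with placeholder names and defaults rather than the repo's real ones.

import argparse

# Placeholder defaults; the real ones live in the model and data loader classes.
DEFAULT_HPARAMS = {"batch_size": 64, "learning_rate": 1e-3}

def parse_hparams(argv=None):
    """Build --name=value flags from the defaults and return the merged dict."""
    parser = argparse.ArgumentParser()
    for name, default in DEFAULT_HPARAMS.items():
        parser.add_argument(f"--{name}", type=type(default), default=default)
    return vars(parser.parse_args(argv))

# parse_hparams(["--batch_size=32", "--learning_rate=0.001"])
# -> {"batch_size": 32, "learning_rate": 0.001}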

To evaluate the trained model with beam search, scored by gold-normalized edit distance:

run evaluate myexperiment0 --beam_width=5
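
Gold-normalized edit distance is presumably the Levenshtein distance between the decoded output and the gold reference, divided by the gold length; whether it is computed over characters or tokens isn't shown here, so the sketch below works on any pair of sequences.

def edit_distance(pred, gold):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(gold) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, g in enumerate(gold, start=1):
            curr.append(min(
                prev[j] + 1,             # delete from pred
                curr[j - 1] + 1,         # insert into pred
                prev[j - 1] + (p != g),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def gold_normalized_edit_distance(pred, gold):
    return edit_distance(pred, gold) / max(len(gold), 1)

# gold_normalized_edit_distance("how are you".split(), "how are you?".split()) -> 0.333...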

To interact with the trained model in the console by typing input sentences:

run interact myexperiment0
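
For example, typing an unpunctuated sentence such as "how are you doing today" should return it with punctuation restored, along the lines of "how are you doing today?".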