Neural Punctuator

Seq2Seq model that restores punctuation on English input text.

Setup

No dependencies are needed besides Python 3.7.4, virtualenv, and TensorFlow.

virtualenv env
source env.sh
pip install tensorflow  # or tensorflow-gpu / custom wheel

For more information on the project structure, see the README in the tensorflow-boilerplate repository.

Datasets

  • Google News Word2Vec: place the .bin file in the data directory, run python -m scripts.install_word2vec, and optionally delete the .bin file afterwards (a rough sketch of what the conversion involves follows this list).

  • WikiText: download, unzip, and place both of the word-level datasets in the data directory. Clean the data with python -m scripts.clean_wikitext, and optionally delete the original *.tokens and *.raw files.
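
The internals of scripts.install_word2vec and scripts.clean_wikitext aren't shown here. As a rough idea of what the Word2Vec step involves, the sketch below reads the standard word2vec binary layout (a "vocab_size dim" header line, then each word followed by dim float32 values) into Python and NumPy structures; the function name and the exact output format are assumptions, not the repo's.

import numpy as np

def load_word2vec_bin(path):
    """Read a word2vec .bin file into (words, vectors). Written for clarity, not speed."""
    words, vectors = [], []
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            # Each word is stored as raw bytes terminated by a space; a newline
            # left over from the previous vector may precede it.
            chars = []
            while True:
                c = f.read(1)
                if c == b" ":
                    break
                if c != b"\n":
                    chars.append(c)
            words.append(b"".join(chars).decode("utf-8", errors="replace"))
            # The vector itself is dim consecutive float32 values.
            vectors.append(np.frombuffer(f.read(4 * dim), dtype=np.float32))
    return words, np.stack(vectors)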

Usage

The following trains a seq2seq model on WikiText with a non-default batch size and learning rate, saving its weights in experiments/myexperiment0:

source env.sh
run fit myexperiment0 seq2seq wikitext --batch_size=32 --learning_rate=0.001

Modify other hyperparameters similarly with --name=value. To see all supported hyperparameters, check the main classes in models/seq2seq.py and data_loaders/wikitext.py.
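
The run entry point comes from tensorflow-boilerplate, so the actual flag handling lives there; the sketch below only illustrates the --name=value override pattern, with placeholder names and defaults rather than the repo's real ones.

import argparse

# Placeholder defaults; the real ones live in the model and data loader classes.
DEFAULT_HPARAMS = {"batch_size": 64, "learning_rate": 1e-3}

def parse_hparams(argv=None):
    """Build --name=value flags from the defaults and return the merged dict."""
    parser = argparse.ArgumentParser()
    for name, default in DEFAULT_HPARAMS.items():
        parser.add_argument(f"--{name}", type=type(default), default=default)
    return vars(parser.parse_args(argv))

# parse_hparams(["--batch_size=32", "--learning_rate=0.001"])
# -> {"batch_size": 32, "learning_rate": 0.001}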

To evaluate the trained model with beam search, scored by gold-normalized edit distance:

run evaluate myexperiment0 --beam_width=5
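
Gold-normalized edit distance is presumably the Levenshtein distance between the decoded output and the gold reference, divided by the gold length; whether it is computed over characters or tokens isn't shown here, so the sketch below works on any pair of sequences.

def edit_distance(pred, gold):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(gold) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, g in enumerate(gold, start=1):
            curr.append(min(
                prev[j] + 1,             # delete from pred
                curr[j - 1] + 1,         # insert into pred
                prev[j - 1] + (p != g),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def gold_normalized_edit_distance(pred, gold):
    return edit_distance(pred, gold) / max(len(gold), 1)

# gold_normalized_edit_distance("how are you".split(), "how are you?".split()) -> 0.333...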

To interact with the trained model in the console by typing input sentences:

run interact myexperiment0
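
For example, typing an unpunctuated sentence such as "how are you doing today" should return it with punctuation restored, along the lines of "how are you doing today?".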