
Lattice Recurrent Unit

Implementation of the Lattice Recurrent Unit as described in the paper "Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling". I encourage you to check out the website to get an overview of the results and observations described in the paper.

The code has been written in PyTorch and has two key components:

  1. Language Model: Given a corpus of sentences, the model learns to predict the next character (word or, more generally, token) conditioned on all the characters up to the current time step. (Check class langModel in src/model.py)
  2. Lattice: A Lattice Network (unlike LSTM and GRU) supports distinct outputs along depth and time. Hence we implemented class Lattice (in src/model.py), which supports recurrent units with two different outputs. Batches of variable-length sequences are allowed if they are converted to torch.nn.utils.rnn.PackedSequence(*); see the sketch after this list. In addition, Lattice also supports multiple layers within an RNN cell (class LRUxCell and class HIGHWAYxCell in src/model.py)

(*) Currently this only works for batch_first=True -- I will make it batch_first-independent as soon as I get time
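
As an illustration of the packing requirement above, here is a minimal sketch (not from the repository; the tensor contents are made up) of how a padded batch of integer-encoded characters can be turned into a PackedSequence with batch_first=True. With a recent PyTorch this runs as-is; in v0.2 the tensor would additionally need to be wrapped in a Variable.

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical batch of 3 character sequences, already mapped to integer ids
# and zero-padded to the longest length; batch_first=True means shape [batch, time].
padded = torch.LongTensor([[5, 2, 7, 1],
                           [3, 9, 0, 0],
                           [4, 0, 0, 0]])
lengths = [4, 2, 1]  # true lengths, sorted in decreasing order as older PyTorch requires

# The PackedSequence stores only the non-padding steps, so the recurrent
# unit never sees the padded positions.
packed = pack_padded_sequence(padded, lengths, batch_first=True)
print(packed.data)         # flattened non-padding entries
print(packed.batch_sizes)  # number of active sequences at each time step

The packed batch can then be passed to class Lattice in place of a plain padded tensor.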


[Figure: Lattice Language Model]


Requirements

This code is written in Python 3.6 and requires PyTorch (>=v0.2). I would suggest using the anaconda environment provided in env/ as this would save the hassle of installing all the requirements.

conda env create -n <env-name> -f env/torch-0.2.0-cuda80-py36-pandas.yaml

To activate the environment, run

source activate <env-name>

Alternatively, if you would like to save space or be adventurous, you could cherry-pick and install the missing requirements.

Usage

All the source files are in the sub-directory src:

cd src

Dataset

I would suggest storing every dataset in its own directory, as the code creates multiple meta-data files that are essential to the training process.

Training

A language model using various RNN units can be trained using char.py.

python char.py -data <path-to-data> \
               -model <model-name> \
               -rnn_size <rnn-size> \
               -num_layers <num-layers> \
               -num_unrolling <backprop-steps> \
               -save_dir <path-to-results> \
               -cpk <checkpoint-name> 

The models supported in this implementation are lru, rglru, pslru, gru, lstm, highway, and glstm.

For example, the training script can be called with the following command

python char.py -data ../dataset/ptb/ptb.txt -model lru -rnn_size 500 -num_layers 2 -num_unrolling 50 -save_dir save/ptb/lru -cpk m

Arguments

It would be useful to check out all the arguments of the training code by running

python char.py -h

Sampling characters from a model

Weights for the best model (based on validation loss) are saved as a pickle file at the end of training (or after every epoch if -greedy_save 1 is used). These weights are used to initialize a language model from which characters can be sampled.

python generate.py -load <path-to-weights> -num_sample 1000

Note: Weights are stored in -save_dir with a suffix of _weights.p
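
If you want to inspect or reuse the saved weights outside generate.py, something along these lines should work; this is a sketch, the path is hypothetical, and it assumes the pickle file holds a mapping of parameter tensors:

import pickle

# Hypothetical path: checkpoint name 'm' saved under save/ptb/lru
with open('save/ptb/lru/m_weights.p', 'rb') as f:
    weights = pickle.load(f)

# Assuming the file contains a state_dict-style mapping, it can be loaded
# into a freshly constructed model of the same architecture:
# model.load_state_dict(weights)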

GPU support

Using -cuda <gpu-id> with char.py and generate.py gives the option of choosing a device on a multi-GPU machine. If you wish to train the model on a CPU, use <gpu-id> = -1.

Note: Currently multi-gpu training is not supported.
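
For reference, the -cuda flag presumably maps to device selection along the following lines (a sketch of the typical PyTorch pattern, not the repository's exact code):

import torch

gpu_id = 0  # value passed via -cuda; -1 means run on the CPU
if gpu_id >= 0 and torch.cuda.is_available():
    torch.cuda.set_device(gpu_id)  # subsequent .cuda() calls use this device
    # model = model.cuda()
# otherwise the model and tensors simply stay on the CPU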

Other Implementations

  • A nice gist of the LRU Cell in TensorFlow by @simonnanty.
