
Pyramidal Recurrent Units (PRU) for Language Modeling

This repository contains the source code of our paper, Pyramidal Recurrent Units for Language Modeling, which was accepted for publication at EMNLP'18.

NOTE: Though we tested our module (PRU) on the highly competitive task of language modeling, it is generic and can be used as a drop-in replacement wherever RNNs (such as LSTMs and GRUs) are currently used, e.g., question answering, text classification, and machine translation.

Block diagram of PRU

Downloading Language Modeling Datasets

You can download the datasets by running the following script:

bash getdata.sh

This script will download the data and place it in a directory named data.

Training PRU on the Penn Treebank dataset

You can train PRU on the Penn Treebank dataset (PTB) by using the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --model PRU --g 4 --k 2 --emsize 400 --nhid 1400 --data ./data/penn 

where

 --model specifies the recurrent unit (either LSTM or PRU)
 --g specifies the number of groups used in the grouped linear transformation (see the sketch below)
 --k specifies the number of pyramidal levels in the pyramidal transformation
 --emsize specifies the embedding layer size
 --nhid specifies the hidden layer size
 --data specifies the location of the data

Please see main.py for details about other supported command line arguments.
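For intuition about what --g controls, below is a minimal, illustrative sketch of a grouped linear transformation in PyTorch. It is not the repository's implementation (please see the repository code for that), and the pyramidal transformation controlled by --k, which additionally transforms sub-sampled copies of the input, is omitted here.

import torch
import torch.nn as nn


class GroupedLinear(nn.Module):
    """Illustrative grouped linear transformation: the input vector is split
    into `groups` chunks and each chunk gets its own smaller weight matrix,
    which reduces the parameter count roughly by a factor of `groups`."""

    def __init__(self, in_features, out_features, groups):
        super(GroupedLinear, self).__init__()
        assert in_features % groups == 0 and out_features % groups == 0
        self.groups = groups
        self.layers = nn.ModuleList(
            [nn.Linear(in_features // groups, out_features // groups)
             for _ in range(groups)]
        )

    def forward(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        chunks = torch.chunk(x, self.groups, dim=1)
        return torch.cat([layer(c) for layer, c in zip(self.layers, chunks)], dim=1)


if __name__ == '__main__':
    # With groups=4, each sub-layer is 350x350 instead of one 1400x1400 layer.
    glt = GroupedLinear(in_features=1400, out_features=1400, groups=4)
    out = glt(torch.randn(20, 1400))
    print(out.shape)  # torch.Size([20, 1400])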

If you want to train a language model using an LSTM on the PTB dataset, you can do so with the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --model LSTM --emsize 400 --nhid 1000 --data ./data/penn

NOTE: Our implementation currently supports training on a single GPU. However, you can update it to support multiple GPUs by using the DataParallel module in PyTorch.
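As a starting point, a minimal multi-GPU sketch (not part of this repository) could wrap the model with PyTorch's DataParallel; handling of the recurrent hidden state across replicas may need additional changes.

import torch
import torch.nn as nn

def to_multi_gpu(model, device_ids=(0, 1)):
    # Illustrative sketch: split batches across the visible GPUs.
    # The device ids above are placeholders for your own setup.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model, device_ids=list(device_ids))
    return model.cuda()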

Testing PRU on the Penn Treebank dataset

You can test the trained models using the following command:

CUDA_VISIBLE_DEVICES=0 python test.py --data ./data/penn --weightFile <trained model file> --batchSize 1

Please see test.py for details about command line arguments.
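If you want to inspect a trained checkpoint outside of test.py, a sketch along these lines should work, assuming the model was serialized as a whole with torch.save (the file name model.pt is a placeholder for your checkpoint):

import torch

# Load the serialized model and switch it to evaluation mode so that
# dropout is disabled while measuring perplexity.
with open('model.pt', 'rb') as f:
    model = torch.load(f)
model.eval()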

Pretrained models on the Penn Treebank dataset

The table below compares the perplexity scores of language models with either an LSTM or a PRU as the recurrent unit (with standard dropout only). PRU generalizes better than the LSTM and can efficiently learn representations in a very high-dimensional space.

| Model | g | k | emsize | nhid | # Params (in million) | Perplexity (val) | Perplexity (test) | Model Size (in MB) | Model Link |
|-------|---|---|--------|------|-----------------------|------------------|-------------------|--------------------|------------|
| LSTM | NA | NA | 400 | 1000 | 19.87 | 67.8 | 66.05 | 159 | Link |
| LSTM | NA | NA | 400 | 1200 | 25.79 | 69.29 | 67.17 | 206 | Link |
| LSTM | NA | NA | 400 | 1400 | 32.68 | 70.23 | 68.32 | 261 | Link |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| PRU | 1 | 2 | 400 | 1000 | 18.97 | 69.99 | 68.06 | 151 | Link |
| PRU | 2 | 2 | 400 | 1200 | 18.51 | 66.39 | 64.30 | 148 | Link |
| PRU | 4 | 2 | 400 | 1400 | 18.90 | 64.40 | 62.62 | 151 | Link |

NOTE:

  • The performance of PRU can be further improved by using advanced methods such as weight dropout and dynamic evaluation. If you evaluate the above pretrained models with dynamic evaluation, you should see an improvement in perplexity of about 6-8% for both LSTM- and PRU-based models.
  • By replacing the LSTM with the PRU in AWD-LSTM, we obtained the best results while using fewer parameters. See our paper for more details.
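For reference, the perplexity numbers above are simply the exponential of the average per-token cross-entropy (in nats). The snippet below is an illustrative sketch, not part of the repository:

import math

def perplexity(avg_nll_nats):
    # Perplexity is exp of the average negative log-likelihood per token.
    return math.exp(avg_nll_nats)

# Example: an average validation loss of about 4.165 nats per token
# corresponds to a perplexity of roughly 64.4 (cf. the best PRU row above).
print(round(perplexity(4.165), 1))  # 64.4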

Pre-requisites

To run this code, you need the following libraries:

  • PyTorch - We tested with v0.3.1
  • Python - We tested our code with Python v3. If you are using Python v2, please feel free to make the necessary changes to the code.

We recommend using Anaconda. We have tested our code on Ubuntu 16.04.

Acknowledgements

A large portion of this repo is borrowed from the AWD-LSTM repository.

Citation

If PRU is useful for your research, please cite our paper.

@inproceedings{mehta2018pru,
  title={Pyramidal Recurrent Unit for Language Modeling},
  author={Sachin Mehta and Rik Koncel-Kedziorski and Mohammad Rastegari and Hannaneh Hajishirzi},
  booktitle={EMNLP},
  year={2018}
}

References

  1. Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
  2. Merity, Stephen, Nitish Shirish Keskar, and Richard Socher. "Regularizing and optimizing LSTM language models." arXiv preprint arXiv:1708.02182 (2017).