Efficient Softmax Approximation

Implementations of BlackOut and Adaptive Softmax for efficiently computing the output word distribution in language models with very large vocabularies.

The LSTM language models are derived from rnnlm_chainer.

The following output layers are available:

Linear + softmax with cross-entropy loss: the usual output layer.
--share-embedding: A variant in which the output layer reuses the word embedding matrix of the input layer.
--adaptive-softmax: Adaptive Softmax
--blackout: BlackOut (note that BlackOut is not faster on GPU)
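
The idea behind --share-embedding can be illustrated with a minimal, dependency-free sketch. It is not the code in nets.py: the function names, the toy matrix, and the use of the embedding itself as a stand-in for the LSTM output are all assumptions made for illustration. The point is only that one matrix serves both as the input lookup table and as the output projection, which requires the hidden state to have the same dimension as the embeddings.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tied_output_probs(embedding, hidden):
    """Score each word by the dot product of its embedding row with the hidden state."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in embedding]
    return softmax(logits)

# Toy values (illustrative only): 4-word vocabulary, 3-dimensional embeddings.
embedding = [
    [0.1, 0.2, -0.1],
    [0.0, -0.3, 0.4],
    [0.5, 0.1, 0.0],
    [-0.2, 0.3, 0.2],
]

input_vector = embedding[2]   # input layer: look up the row for word id 2
hidden = input_vector         # hypothetical stand-in for the LSTM output
probs = tied_output_probs(embedding, hidden)  # output layer: same matrix again
loss = -math.log(probs[1])    # cross-entropy loss if the target is word id 1
```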

Adaptive Softmax

Efficient softmax approximation for GPUs
Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou, ICML 2017
paper
authors' Lua code
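
As a rough illustration of the idea in the paper, the sketch below computes log-probabilities with a two-cluster adaptive softmax in plain Python. It is a simplified sketch, not the implementation in adaptive_softmax.py: the function names and toy weights are hypothetical, word ids are assumed to be sorted by decreasing frequency, and further refinements from the paper (such as using a smaller dimension for rare-word clusters) are left out. Frequent words are scored directly in a small "head" softmax, which also contains one extra entry representing the whole tail; rare words cost a second, tail-only softmax.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def adaptive_log_prob(word_id, hidden, head_weights, tail_weights, cutoff):
    """Log-probability of word_id under a two-cluster adaptive softmax.

    head_weights has cutoff + 1 rows: one per frequent word, plus a final
    row for the tail cluster. tail_weights has one row per rare word.
    """
    head = log_softmax(matvec(head_weights, hidden))
    if word_id < cutoff:
        return head[word_id]  # frequent word: only the head is evaluated
    tail = log_softmax(matvec(tail_weights, hidden))
    # rare word: log P(tail cluster) + log P(word | tail cluster)
    return head[cutoff] + tail[word_id - cutoff]

# Toy setup (illustrative only): 5 words, the 2 most frequent in the head.
hidden = [0.3, -0.1]
head_weights = [[0.2, 0.1], [-0.4, 0.3], [0.0, 0.5]]   # words 0, 1 and the tail entry
tail_weights = [[0.1, -0.2], [0.3, 0.3], [-0.1, 0.0]]  # words 2, 3, 4

total = sum(
    math.exp(adaptive_log_prob(w, hidden, head_weights, tail_weights, cutoff=2))
    for w in range(5)
)
```

Here `total` sums the probabilities of all five words, so it serves as a check that the two-level factorization still defines a proper distribution.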

BlackOut

BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies
Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey, ICLR 2016
paper
authors' C++ code
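
The BlackOut objective described in the paper can be sketched for a single prediction as follows. This is an illustrative sketch, not the code in black_out.py: the function name, argument names, and toy numbers are hypothetical, and the negative samples are fixed by hand rather than drawn from the proposal distribution. Each score is reweighted by the inverse of its proposal probability, a softmax is taken over the target and the sampled words only, and the loss rewards a high probability for the target and low probabilities for the samples.

```python
import math

def blackout_loss(logits, target, negatives, proposal):
    """Negative BlackOut objective for one prediction.

    logits[w]   : unnormalized score of word w
    target      : id of the correct word
    negatives   : ids of the sampled words, none equal to target
    proposal[w] : probability of w under the sampling distribution Q
    """
    ids = [target] + list(negatives)
    # Importance-weighted scores in log space: log((1 / Q(w)) * exp(logit_w)).
    log_weighted = [logits[w] - math.log(proposal[w]) for w in ids]
    m = max(log_weighted)
    weighted = [math.exp(x - m) for x in log_weighted]
    z = sum(weighted)
    probs = [x / z for x in weighted]  # softmax over target + samples only

    loss = -math.log(probs[0])         # push the target probability up
    for p in probs[1:]:
        loss -= math.log(1.0 - p)      # push each sampled word's probability down
    return loss

# Toy values (illustrative only): 5-word vocabulary, 2 hand-picked negatives.
logits = [2.0, 0.5, -1.0, 0.0, 1.0]
proposal = [0.4, 0.3, 0.15, 0.1, 0.05]
loss = blackout_loss(logits, target=1, negatives=[0, 3], proposal=proposal)
```

Because only the target and the sampled words enter the normalization, the cost per prediction depends on the number of samples rather than on the vocabulary size.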

How to Run

python -u train.py -g 0
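
To select one of the output layers listed earlier, the corresponding flag is presumably appended to this command. The helper below is a hypothetical convenience sketch, not part of the repository: it only builds and prints the command lines, assuming that the flags are accepted by train.py as written and that -g 0 selects the first GPU.

```python
BASE_COMMAND = ["python", "-u", "train.py", "-g", "0"]  # command from this README

# Output-layer flags named in this README; None stands for the plain softmax layer.
OUTPUT_LAYER_FLAGS = [None, "--share-embedding", "--adaptive-softmax", "--blackout"]

def build_command(flag=None):
    """Return the training command as an argument list, optionally with one flag."""
    return BASE_COMMAND + ([flag] if flag else [])

for flag in OUTPUT_LAYER_FLAGS:
    print(" ".join(build_command(flag)))
```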

Datasets

PennTreeBank
Wikitext-2
Wikitext-103

For the Wikitext datasets, run prepare_wikitext.sh to download them.

