BlackOut and Adaptive Softmax for language models by Chainer

soskek/efficient_softmax

Efficient Softmax Approximation

Implementations of BlackOut and Adaptive Softmax, two methods for efficiently computing the output word distribution in language models with very large vocabularies.

The LSTM language models are derived from rnnlm_chainer.

The following output layers are available:

  • Linear + softmax with cross-entropy loss: the usual output layer.
  • --share-embedding: A variant in which the output layer shares the word embedding matrix with the input layer.
  • --adaptive-softmax: Adaptive softmax
  • --blackout: BlackOut (note that BlackOut is not faster on GPU)
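
To illustrate the shared-embedding variant listed above, here is a minimal pure-Python sketch. It is not the repository's Chainer code: `make_embedding`, `tied_logits`, and the toy sizes are hypothetical names and values chosen for illustration. It assumes the hidden state has the same dimension as the word embeddings, which weight sharing of this kind generally requires.

```python
import math
import random


def make_embedding(vocab_size, dim, seed=0):
    """Randomly initialised embedding matrix, one row per word (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def tied_logits(embedding, hidden):
    """Score every word by the dot product of its embedding row with the hidden state."""
    return [sum(e * h for e, h in zip(row, hidden)) for row in embedding]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


E = make_embedding(vocab_size=5, dim=3)
x = E[2]        # input side: embedding lookup for word id 2
hidden = x      # stand-in for the LSTM output (assumed to have the embedding dimension)
log_probs = log_softmax(tied_logits(E, hidden))  # output side reuses the same matrix E
```

Because the same matrix serves both lookups, the model stores one vocabulary-sized matrix instead of two.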

Adaptive Softmax

  • Efficient softmax approximation for GPUs
  • Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou, ICML 2017
  • paper
  • authors' Lua code
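
The core idea of adaptive softmax can be sketched in a few lines of pure Python. This is a simplified illustration, not the repository's implementation: the class name `AdaptiveSoftmaxSketch`, the `cutoffs` values, and the random weights are assumptions, and the reduced-dimension projections that the paper applies to tail clusters are omitted. Word ids are assumed to be sorted by decreasing frequency, so frequent words fall in the head and rare words in tail clusters.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def linear(weights, h):
    """Matrix-vector product: one score per row of `weights`."""
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


class AdaptiveSoftmaxSketch:
    """Two-level softmax: a head over frequent words plus one entry per tail cluster."""

    def __init__(self, dim, cutoffs, seed=0):
        # cutoffs=[4, 7, 10]: ids 0..3 in the head, 4..6 in cluster 0, 7..9 in cluster 1
        rng = random.Random(seed)

        def rand(rows):
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(rows)]

        self.cutoffs = cutoffs
        self.n_clusters = len(cutoffs) - 1
        self.head = rand(cutoffs[0] + self.n_clusters)
        self.tails = [rand(cutoffs[i + 1] - cutoffs[i]) for i in range(self.n_clusters)]

    def log_prob(self, h, word_id):
        head_lp = log_softmax(linear(self.head, h))
        if word_id < self.cutoffs[0]:
            return head_lp[word_id]  # frequent word: only the small head is evaluated
        for i in range(self.n_clusters):
            lo, hi = self.cutoffs[i], self.cutoffs[i + 1]
            if lo <= word_id < hi:
                tail_lp = log_softmax(linear(self.tails[i], h))
                # log p(w) = log p(cluster | h) + log p(w | cluster, h)
                return head_lp[self.cutoffs[0] + i] + tail_lp[word_id - lo]
        raise ValueError("word_id outside the vocabulary")


model = AdaptiveSoftmaxSketch(dim=3, cutoffs=[4, 7, 10])
h = [0.5, -0.2, 0.1]
total = sum(math.exp(model.log_prob(h, w)) for w in range(10))
```

The saving comes from the fact that most tokens in a corpus are frequent words, so most predictions touch only the head matrix rather than the full vocabulary.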

BlackOut

  • BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies
  • Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey, ICLR 2016
  • paper
  • authors' C++ code
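
As a rough illustration of the BlackOut objective, the following pure-Python sketch computes the loss for a single prediction. It follows the formulation in the paper as I understand it: negatives are drawn from a proposal distribution proportional to unigram counts raised to a power `alpha`, a softmax weighted by the inverse proposal probability is taken over the target and the sampled words only, and the loss rewards a high probability for the target and low probabilities for the negatives. The function names, the toy counts, and `alpha=0.5` are assumptions for illustration, not the repository's code.

```python
import math
import random


def make_proposal(counts, alpha):
    """Proposal Q(w) proportional to count(w) ** alpha (assumed form)."""
    powered = [c ** alpha for c in counts]
    z = sum(powered)
    return [p / z for p in powered]


def sample_negatives(proposal, target, k, rng):
    """Draw k negative word ids from the proposal, never the target itself."""
    candidates = [j for j in range(len(proposal)) if j != target]
    weights = [proposal[j] for j in candidates]
    return rng.choices(candidates, weights=weights, k=k)


def blackout_loss(scores, target, samples, proposal):
    """Negative BlackOut objective for one position, given output scores per word."""
    ids = [target] + list(samples)
    # Importance-weighted logits: exp(s_j) / Q(j) becomes s_j - log Q(j) in log space.
    weighted = [scores[j] - math.log(proposal[j]) for j in ids]
    m = max(weighted)
    exps = [math.exp(x - m) for x in weighted]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[0])                          # pull the target up
    loss -= sum(math.log(1.0 - p) for p in probs[1:])   # push the negatives down
    return loss


rng = random.Random(0)
proposal = make_proposal(counts=[50, 30, 10, 5, 3, 2], alpha=0.5)
target = 3
samples = sample_negatives(proposal, target, k=3, rng=rng)

flat_scores = [0.0] * 6
good_scores = [0.0] * 6
good_scores[target] = 5.0  # a model that scores the target highly

loss_flat = blackout_loss(flat_scores, target, samples, proposal)
loss_good = blackout_loss(good_scores, target, samples, proposal)
```

Only the target and the `k` sampled rows of the output layer are evaluated, rather than the whole vocabulary, which is where the training-time saving is meant to come from.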

How to Run

python -u train.py -g 0

Datasets

  • PennTreeBank
  • Wikitext-2
  • Wikitext-103

For the Wikitext datasets, run prepare_wikitext.sh to download them.
