PyTorch 1.1 implementation of MoS: https://arxiv.org/pdf/1711.03953.pdf
🚩 Note that this is not the official code; please refer to https://github.com/zihangdai/mos for more details.
This code is based on the paper
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang*, Zihang Dai*, Ruslan Salakhutdinov, William W. Cohen (*: equal contribution)
Preprint 2017
Requirements: Python 3.6, PyTorch 1.1.0
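For readers new to the method: the paper replaces the usual single softmax output layer with a mixture of several softmaxes whose mixture weights are predicted from the hidden state, which lifts the rank restriction of a single softmax. Below is a minimal, hypothetical sketch of that output layer (it is not the repository's model.py); the --nhidlast, --emsize, and --n_experts flags used in the commands further down map to the corresponding constructor arguments.

```python
# Minimal sketch of a mixture-of-softmaxes (MoS) output layer. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    def __init__(self, nhidlast, emsize, vocab_size, n_experts):
        super().__init__()
        self.n_experts = n_experts
        self.emsize = emsize
        # Mixture weights ("prior") over the experts, predicted from the hidden state.
        self.prior = nn.Linear(nhidlast, n_experts, bias=False)
        # One projection per expert from the hidden state into the embedding space.
        self.latent = nn.Sequential(nn.Linear(nhidlast, n_experts * emsize), nn.Tanh())
        # Shared decoder over the vocabulary (typically tied with the input embedding).
        self.decoder = nn.Linear(emsize, vocab_size)

    def forward(self, hidden):  # hidden: (batch, nhidlast)
        prior = F.softmax(self.prior(hidden), dim=-1)              # (batch, n_experts)
        latent = self.latent(hidden).view(-1, self.n_experts, self.emsize)
        probs = F.softmax(self.decoder(latent), dim=-1)            # one softmax per expert
        # Mixing the expert distributions lifts the rank limit of a single softmax.
        mixed = (prior.unsqueeze(-1) * probs).sum(dim=1)           # (batch, vocab_size)
        return torch.log(mixed + 1e-8)                             # log-probabilities
```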
Below are results of the current version on Penn Treebank, as reported in zihangdai/mos#9. Further tuning may be needed to match the original results.
MoS w/o finetune: Valid 58.34, Test 56.18
MoS: Valid 56.83, Test 54.64
MoS + dynamic evaluation: Valid 49.03, Test 48.43
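These figures are validation/test perplexities (lower is better); for example, a test perplexity of 54.64 corresponds to an average cross-entropy of about ln(54.64) ≈ 4.0 nats per token.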
Download the data
./get_data.sh
To reproduce the Penn Treebank results: first, train the model
python main.py --data data/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 20.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu
Second, finetune the model
python finetune.py --data data/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
where PATH_TO_FOLDER is the folder created by the first step (a concatenation of PTB and a timestamp).
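For example, if training created a directory named something like PTB-20190101-123456 (the exact timestamp format depends on main.py; this name is only illustrative), pass --save PTB-20190101-123456 to finetune.py.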
Third, run dynamic evaluation
python dynamiceval.py --model PATH_TO_FOLDER/finetune_model.pt --lamb 0.075
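For context, dynamic evaluation keeps adapting the model on the test stream with gradient steps while pulling the parameters back toward the globally trained weights. The sketch below shows only this basic idea, with hypothetical names (segments, model.loss, lr, lamb); the repository's dynamiceval.py implements a more elaborate update, which the --lamb and --epsilon flags tune.

```python
# Hedged sketch of the idea behind dynamic evaluation; not the actual
# dynamiceval.py update rule. `segments` and `model.loss` are hypothetical.
import torch

def dynamic_eval(model, segments, lr=0.002, lamb=0.075):
    # Snapshot of the globally trained parameters to decay back toward.
    global_params = [p.detach().clone() for p in model.parameters()]
    total_loss, total_tokens = 0.0, 0
    for inputs, targets in segments:
        loss = model.loss(inputs, targets)            # evaluate the segment first ...
        total_loss += loss.item() * targets.numel()
        total_tokens += targets.numel()
        model.zero_grad()
        loss.backward()                               # ... then adapt on that same segment
        with torch.no_grad():
            for p, g in zip(model.parameters(), global_params):
                if p.grad is not None:
                    p -= lr * p.grad                  # gradient step on the test stream
                p += lamb * (g - p)                   # decay back toward the global weights
    return torch.exp(torch.tensor(total_loss / total_tokens))  # perplexity
```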
To reproduce the WikiText-2 results: first, train the model
python main.py --epochs 1000 --data data/wikitext-2 --save WT2 --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu
Second, finetune the model
python finetune.py --epochs 1000 --data data/wikitext-2 --save PATH_TO_FOLDER --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --emsize 300 --batch_size 15 --lr 20.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu
Third, run dynamic evaluation
python dynamiceval.py --data data/wikitext-2 --model PATH_TO_FOLDER/finetune_model.pt --epsilon 0.002
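A note on --small_batch_size, used in the WikiText-2 commands above: it appears to split each batch of --batch_size sequences into chunks of that size and accumulate gradients before each optimizer step, trading speed for GPU memory while keeping the effective batch size unchanged. A minimal sketch of that gradient-accumulation pattern, with a hypothetical model.loss interface:

```python
# Hedged sketch of gradient accumulation as --small_batch_size appears to use it.
import torch

def accumulate_step(model, optimizer, data, targets, batch_size, small_batch_size):
    optimizer.zero_grad()
    for start in range(0, batch_size, small_batch_size):
        chunk = data[:, start:start + small_batch_size]
        chunk_targets = targets[:, start:start + small_batch_size]
        loss = model.loss(chunk, chunk_targets)          # mean loss over the chunk
        (loss * chunk.size(1) / batch_size).backward()   # weight by the chunk's share of the batch
    optimizer.step()
```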
To train with multiple GPUs, remove the --single_gpu flag and expose several devices via CUDA_VISIBLE_DEVICES, as in the WikiText-2 commands below. This will yield the same results as using a single GPU, but will be faster.
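Multi-GPU training here presumably follows PyTorch's standard data-parallel pattern: the module is replicated on each visible device and every batch is scattered across them, which is why the results match single-GPU training. Whether main.py uses nn.DataParallel exactly like this is an assumption; the snippet is illustrative only.

```python
# Illustrative only: the standard PyTorch data-parallel pattern, not main.py's code.
import torch
import torch.nn as nn

model = nn.Linear(300, 1150)                # stand-in for the language model
if torch.cuda.device_count() > 1:           # GPUs exposed via CUDA_VISIBLE_DEVICES
    model = nn.DataParallel(model.cuda())   # replicate the module, scatter each batch
```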
First, train the model
CUDA_VISIBLE_DEVICES=0,1,2 python main.py --epochs 1000 --data data/wikitext-2 --save WT2 --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 15 --max_seq_len_delta 20 --dropouti 0.55
Second, finetune the model
CUDA_VISIBLE_DEVICES=0,1,2 python finetune.py --epochs 1000 --data data/wikitext-2 --save PATH_TO_FOLDER --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --emsize 300 --batch_size 15 --lr 20.0 --dropoutl 0.29 --small_batch_size 15 --max_seq_len_delta 20 --dropouti 0.55
Third, run dynamic evaluation
python dynamiceval.py --data data/wikitext-2 --model PATH_TO_FOLDER/finetune_model.pt --epsilon 0.002