No Train No Gain

Code for the paper "No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models" (NeurIPS 2023) by Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, and Matt J. Kusner.

Running the code

See the respective READMEs for the BERT experiments and the T5 experiments.

Citation and license

We use two excellent open source codebases to implement our experiments:

  • The BERT experiments are forked from Cramming
  • The T5 experiments are forked from NanoT5

If you find this repository useful, please consider citing both our work and these original codebases.

To cite our work, we suggest the following BibTeX:

@misc{kaddourNoTrainNo2023,
	title = {No {Train} {No} {Gain}: {Revisiting} {Efficient} {Training} {Algorithms} {For} {Transformer}-based {Language} {Models}},
	url = {http://arxiv.org/abs/2307.06440},
	doi = {10.48550/arXiv.2307.06440},
	urldate = {2023-07-17},
	publisher = {arXiv},
	author = {Kaddour, Jean and Key, Oscar and Nawrot, Piotr and Minervini, Pasquale and Kusner, Matt J.},
	month = jul,
	year = {2023},
	note = {arXiv:2307.06440 [cs]},
}

We provide separate licenses for the BERT experiments and the T5 experiments.

Contact

Feel free to open an issue, or email us, with any questions.
