
checkpoint_b500000_e4_complete.pth

andrewschreiber released this on 11 Jun 22:42

Transformer on full dataset
500k batches, batch size 1024
Dropout = 0.1
Learning rate (lr) = 6e-6
No gradient clipping
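The settings above can be captured as a small config object. This is a minimal sketch, not the repository's actual config: the class and field names are hypothetical, and the optimizer, schedule, and dataset details are not stated in these notes, so they are left out. The derived property spells out the training scale. 500k batches at 1024 examples each means about 512M examples processed, which is not necessarily the same as a whole number of epochs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the release notes (field names are hypothetical)."""
    num_batches: int = 500_000
    batch_size: int = 1024
    dropout: float = 0.1
    learning_rate: float = 6e-6
    # None means gradient clipping is disabled ("no gradient clipping").
    grad_clip_norm: Optional[float] = None

    @property
    def samples_seen(self) -> int:
        """Total training examples processed: batches times batch size."""
        return self.num_batches * self.batch_size


cfg = TrainConfig()
print(cfg.samples_seen)  # 512000000
```

Using `None` for `grad_clip_norm` keeps "no clipping" explicit. A training loop would then skip the clipping step entirely rather than clip to some arbitrary large value.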