Releases: andrewschreiber/hs-math-nlp

checkpoint_b500000_e4_complete.pth

11 Jun 22:42

Transformer on full dataset
500k batches, 1024 batch size.
Dropout = 0.1
lr = 6e-6
no gradient clipping
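
As a rough illustration, the settings above can be collected into a small configuration object. This is only a sketch: the `TrainConfig` class and its field names are hypothetical and are not taken from the repository's code; only the values come from the notes above. Multiplying batches by batch size gives the number of training examples processed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed in this release."""

    num_batches: int = 500_000       # "500k batches"
    batch_size: int = 1024           # "1024 batch size"
    dropout: float = 0.1             # "Dropout = 0.1"
    learning_rate: float = 6e-6      # "lr = 6e-6"
    grad_clip_norm: Optional[float] = None  # None stands for "no gradient clipping"

    def examples_seen(self) -> int:
        """Total training examples processed: batches times batch size."""
        return self.num_batches * self.batch_size


config = TrainConfig()
print(config.examples_seen())  # 512000000
```

With these values the run processes 500,000 × 1024 = 512,000,000 examples.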

Ablation model - 5,375,000 batches, 128 batch size, 5 epochs

23 Feb 06:29

Created to host the ~500 MB model file, since release assets are not subject to GitHub's 100 MB file size limit.