Releases: andrewschreiber/hs-math-nlp
checkpoint_b500000_e4_complete.pth
Transformer on full dataset
500k batches, 1024 batch size.
Dropout = 0.1
lr = 6e-6
no gradient clipping
Ablation model - 5,375,000 batches, 128 batch size, 5 epochs
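As a rough sanity check on the scale of the two runs, the batch counts and batch sizes above can be multiplied to estimate how many training samples each model processed. This is only a sketch: it assumes every batch is full, and that the 5,375,000 figure for the ablation model is the total across all 5 epochs rather than a per-epoch count. The function name `samples_processed` is hypothetical and not part of the repository.

```python
def samples_processed(num_batches: int, batch_size: int) -> int:
    """Total training samples seen, assuming every batch is full."""
    return num_batches * batch_size


# Transformer on full dataset: 500k batches at batch size 1024.
full = samples_processed(500_000, 1024)

# Ablation model: 5,375,000 batches at batch size 128
# (assumed here to be the total over all 5 epochs).
ablation = samples_processed(5_375_000, 128)

print(f"full-dataset run: {full:,} samples")
print(f"ablation run:     {ablation:,} samples ({ablation // 5:,} per epoch)")
```

Under these assumptions the full-dataset run processed about 512 million samples and the ablation run about 688 million.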
Created to host the ~500 MB model file, since release assets are not subject to GitHub's 100 MB file size limit for files in a repository.