bigscience/train at master · bigscience-workshop/bigscience


tr1-13B-base
tr10-13B-ml
tr11-176B-ml
tr12-1B3-oscar
tr13-mtf
tr14-mup
tr2
tr3-1B3-baseline
tr4-1B3-rotary
tr5-1B3-multilingual
tr6-1B3-prefix-lm
tr7-alibi
tr8-104B-wide
tr8b-104B
tr9-glu
README.md
arch-and-scaling-template.slurm
fixes.md
lessons-learned.md
memory.md
sanity-checks.md
tflops_optimization.md

README.md

Training scripts

This folder gathers the training scripts for the various architecture/scaling and engineering experiments. The naming convention is tr<number>-<short-description>. The current baseline that architecture and scaling experiments are compared against is tr3d. To launch a new experiment, it is best to start from the arch-and-scaling template (arch-and-scaling-template.slurm).
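The naming convention can be made concrete with a small parser. This is a minimal sketch, not part of the repository: `parse_experiment_name` and `ExperimentName` are hypothetical names, and treating the optional lowercase letter after the number (as in tr6b, tr8b, tr3d) as a "variant" is an assumption inferred from the folder and checkpoint names listed here.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: "tr", a number, an optional lowercase variant letter
# (assumed from names like tr6b, tr8b), then an optional "-<short-description>".
_NAME_RE = re.compile(r"^tr(?P<number>\d+)(?P<variant>[a-z]?)(?:-(?P<description>.+))?$")


class ExperimentName(NamedTuple):
    number: int
    variant: str                # "" when there is no letter suffix
    description: Optional[str]  # None for bare names such as "tr2"


def parse_experiment_name(name: str) -> ExperimentName:
    """Split a training-run name like 'tr6b-350M-prefix-lm' into its parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow tr<number>-<short-description>")
    return ExperimentName(
        number=int(match["number"]),
        variant=match["variant"],
        description=match["description"],
    )


print(parse_experiment_name("tr6b-350M-prefix-lm"))
# ExperimentName(number=6, variant='b', description='350M-prefix-lm')
```

Bare names such as tr2 or tr3d are accepted with `description=None`, since they appear in the repository alongside fully described names.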

Some tips:

TFLOPs optimization: how to make sure you get the highest training throughput out of a given set of hardware.
Instrumentation: how to sync with the Hub.
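Throughput is usually compared as model TFLOPs achieved per GPU. The sketch below uses the common rule of thumb of about 6 FLOPs per parameter per token for a forward and backward pass, plus about 2 more when activations are fully recomputed (roughly 8 in total). The function name and the example numbers are hypothetical, and attention-score FLOPs are ignored, so treat the result as an estimate rather than the exact figure any particular training log reports.

```python
def achieved_tflops_per_gpu(
    params_billion: float,
    seq_len: int,
    global_batch_size: int,
    secs_per_iteration: float,
    n_gpus: int,
    activation_checkpointing: bool = True,
) -> float:
    """Rough model TFLOPs achieved per GPU for one training iteration.

    Assumes ~6 FLOPs per parameter per token (forward + backward), or ~8 when
    activation recomputation adds an extra forward pass. Ignores attention-score
    FLOPs, so the estimate is slightly low for long sequences.
    """
    flops_per_param_per_token = 8 if activation_checkpointing else 6
    tokens_per_iteration = seq_len * global_batch_size
    total_flops = flops_per_param_per_token * params_billion * 1e9 * tokens_per_iteration
    return total_flops / (secs_per_iteration * n_gpus * 1e12)


# Hypothetical run: 1.3B params, seq 2048, global batch 512, 10 s/iteration, 64 GPUs.
print(round(achieved_tflops_per_gpu(1.3, 2048, 512, 10.0, 64), 2))
```

For the hypothetical run above this gives about 17 TFLOPs per GPU; comparing that figure against the hardware's peak is what tells you whether a change in parallelism or batch size actually helped.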

Stored checkpoints

Locations of the trained models' checkpoints, plus logs and anything else of importance (e.g. eval harness results):

tr1-13B: gs://bigscience-backups/tr1-13B/
tr3m-1B3-emb-norm-pile: $six_ALL_CCFRSTORE/checkpoints/tr3m-1B3-emb-norm-pile
tr4-1B3-rotary: $six_ALL_CCFRSTORE/checkpoints/
tr4b-350M-rotary: $six_ALL_CCFRSTORE/checkpoints/
tr4c-1B3-rotary-oscar: $six_ALL_CCFRSTORE/checkpoints/tr4c-1B3-rotary-oscar
tr6-1B3-prefix-lm: $six_ALL_CCFRSTORE/checkpoints/tr6-1B3-prefix-lm
tr6-1B3-prefix-lm-unbiased-loss: $six_ALL_CCFRSTORE/checkpoints/tr6-1B3-prefix-lm-unbiased-loss
tr6b-350M-prefix-lm: $six_ALL_CCFRSTORE/checkpoints/tr6b-350M-prefix-lm
tr6b-350M-prefix-lm-PP2: $six_ALL_CCFRSTORE/checkpoints/tr6b-350M-prefix-lm-PP2
tr6b-350M-prefix-lm-unbiased-loss: $six_ALL_CCFRSTORE/checkpoints/tr6b-350M-prefix-lm-unbiased-loss
tr6c-350M-prefix-lm-reset-attention-mask: $six_ALL_CCFRSTORE/checkpoints/tr6c-350M-prefix-lm-reset-attention-mask
tr6c-350M-prefix-lm-reset-attention-mask.backup: $six_ALL_CCFRSTORE/checkpoints/tr6c-350M-prefix-lm-reset-attention-mask.backup
tr6d-350M-prefix-lm-pile: $six_ALL_CCFRSTORE/checkpoints/tr6d-350M-prefix-lm-pile
tr6e-1B3-pile: $six_ALL_CCFRSTORE/checkpoints/tr6e-1B3-pile
tr6f-1B3-oscar-no-loss-on-targets-only: $six_ALL_CCFRSTORE/checkpoints/tr6f-1B3-oscar-no-loss-on-targets-only
tr6g-1B3-oscar-loss-reweighting: $six_ALL_CCFRSTORE/checkpoints/tr6g-1B3-oscar-loss-reweighting
tr7a-1B3-alibi (not a real ALiBi positional embedding experiment: the ALiBi matrices were not used in this run): $six_ALL_CCFRSTORE/checkpoints/tr7a-1B3-alibi
tr7b-350M-alibi (not a real ALiBi positional embedding experiment: the ALiBi matrices were not used in this run): $six_ALL_CCFRSTORE/checkpoints/tr7b-350M-alibi
tr7d-1B3-alibi: $six_ALL_CCFRSTORE/checkpoints/tr7d-1B3-alibi
tr9b-350M-swiglu: $six_ALL_CCFRSTORE/checkpoints/tr9b-350M-swiglu
tr9c-1B3-swiglu-pile: $six_ALL_CCFRSTORE/checkpoints/tr9b-1B3-swiglu-pile
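Most of the locations above are relative to the `$six_ALL_CCFRSTORE` environment variable. A minimal sketch of resolving one of them from a script follows; `checkpoint_dir` is a hypothetical helper, and it assumes the variable is exported in the shell where the script runs.

```python
import os
import posixpath
from typing import Mapping, Optional

# Environment variable holding the shared storage root used in the list above.
STORE_VAR = "six_ALL_CCFRSTORE"


def checkpoint_dir(experiment: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the path $six_ALL_CCFRSTORE/checkpoints/<experiment>."""
    env = os.environ if environ is None else environ
    root = env.get(STORE_VAR)
    if not root:
        # Fail loudly: without the variable the path would silently become relative.
        raise KeyError(f"${STORE_VAR} is not set")
    return posixpath.join(root, "checkpoints", experiment)


print(checkpoint_dir("tr6e-1B3-pile", {"six_ALL_CCFRSTORE": "/store"}))
# /store/checkpoints/tr6e-1B3-pile
```

Reading the root from the environment rather than hard-coding it keeps scripts working if the store is mounted elsewhere. The helper does not cover the gs:// bucket used for tr1-13B.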
