Releases: allenai/OLMo

v0.3.0

25 Apr 19:23

What's new

Added 🎉

  • Added support for Grouped Query Attention (see the sketch after this list).
  • Added commonsense_qa and social_iqa downstream evaluation tasks.
  • Made it possible to read from http/https URLs the same way we read from s3/r2.
  • Added MMLU multiple choice (A/B/C/D) 5-shot variant downstream tasks.
  • Patched the tokenizer.
  • Added option to specify number of model replicas when using hybrid sharding.
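
For context on the Grouped Query Attention item above, here is a minimal, hedged sketch of the technique itself (not OLMo's actual implementation): a small number of key/value heads is shared across groups of query heads by repeating K/V before standard scaled-dot-product attention. Shapes and head counts below are illustrative.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (batch, seq, n_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)."""
    n_heads, n_kv_heads = q.shape[2], k.shape[2]
    assert n_heads % n_kv_heads == 0
    # Repeat each KV head so it lines up with its group of query heads.
    group_size = n_heads // n_kv_heads
    k = k.repeat_interleave(group_size, dim=2)
    v = v.repeat_interleave(group_size, dim=2)
    # scaled_dot_product_attention expects (batch, heads, seq, head_dim).
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2)  # back to (batch, seq, n_heads, head_dim)

# Illustrative shapes: 8 query heads sharing 2 KV heads.
q = torch.randn(1, 16, 8, 64)
k = torch.randn(1, 16, 2, 64)
v = torch.randn(1, 16, 2, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 8, 64])
```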

Changed ⚠️

  • Renamed Olmo to OLMo everywhere in the codebase.
  • Disabled automatic garbage collection during training; instead, we run garbage collection manually at regular intervals so that ranks don't get out of sync with their own GC (see the sketch after this list).
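
The manual-GC change follows a common pattern in multi-rank training. A minimal sketch, not the actual trainer code, with a hypothetical step function and interval (the commit list notes the interval is configurable):

```python
import gc
import time

def train_step() -> None:
    time.sleep(0.01)  # stand-in for a real forward/backward/optimizer step

max_steps = 1_000   # hypothetical run length
gc_interval = 100   # hypothetical interval; in OLMo this is configurable

gc.disable()  # turn off automatic collection for this process
for step in range(1, max_steps + 1):
    train_step()
    if step % gc_interval == 0:
        # Every rank collects at the same step, so no rank stalls mid-iteration
        # waiting on another rank that happened to trigger a collection.
        gc.collect()
```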

Removed 👋

  • Removed AMDLayerNorm, since the original layer norm bug has been fixed and we don't need this workaround anymore.
  • Removed OLMoParallelBlock.

Fixed ✅

  • Don't log garbage on nodes that aren't rank 0.
  • Don't crash in the HF code when referring to a tokenizer in a local file.
  • Point official training scripts to publicly available URLs.
  • Corrected the resize_token_embeddings method in the OLMoForCausalLM class to properly update the token embeddings when resizing the vocabulary (see the sketch after this list).
  • Changed the tie_weights method to a no-op, as weight tying is handled in olmo/model.py.
  • Fixed the size calculation for QK layer norm.
  • Fixed a pipeline test failure that occurs due to a bug in transformers version 4.39.1.
  • Made hf_olmo compatible with transformers versions >= 4.40.0.
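
The resize_token_embeddings fix matters for the usual add-tokens-then-resize workflow. A minimal sketch using the standard transformers API follows; the model ID and added token are placeholders, and trust_remote_code may not be needed on newer transformers versions where OLMo is supported natively.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-1B"  # placeholder checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Add a token, then grow the embedding matrix so it matches the new vocab size.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|custom|>"]})
model.resize_token_embeddings(len(tokenizer))
```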

Commits

3b16e21 Merge pull request #556 from allenai/shanea/make-hf-olmo-support-new-transformers
ccf7bf0 Merge pull request #555 from allenai/shanea/wandb-cancel-failure-bypass
7be71cd use correct PG when collecting metrics with HYBRID shard (#551)
06786a7 Merge pull request #548 from allenai/shanea/fix-olmo-name-hf
4ed135e Merge pull request #540 from allenai/shanea/hybrid-sharding-num-groups-2
2eae988 Merge pull request #546 from allenai/shanea/add-olmo-1.7-7b-checkpoints
d2afcaa Add cfg option --scheduler.warmup_min_lr (#542)
9d40898 Merge pull request #537 from allenai/AkshitaB-tokenizer-patch
62c7954 Merge pull request #536 from allenai/shanea/storage-cleaner-wandb-path-from-checkpoint
657a55e Merge pull request #494 from allenai/shanea/storage-cleaner-move-entry
9a0a84a Merge pull request #527 from allenai/PublicTrainingData
0de5fdc Merge pull request #501 from djliden/dl/fix-embedding-resize
4792f94 Adds a new experimental sharded checkpointer from OLMo-core (#532)
1c12980 make garbage collection interval configurable (#533)
db2dee2 Merge pull request #503 from djliden/dl/hf-weight-tying
8fad649 Merge pull request #534 from allenai/shanea/fix-transformer-cache-position-regression
71f7014 Merge pull request #528 from allenai/add-mmlu-mc-5shot
8472d0b Merge pull request #521 from allenai/davidbrandfonbrener-patch-1
194012a Merge pull request #523 from allenai/davidbrandfonbrener-patch-2
8949bd8 Added deprecation for memmap (#517)
83cc8b1 Merge pull request #464 from allenai/olmo7-ablations
f8aef84 Merge pull request #509 from allenai/epwalsh/manual-gc
0ac82a9 Merge pull request #508 from allenai/RunDataloader
74de51d Merge pull request #414 from allenai/mitchish65-2
417af0e Merge pull request #504 from allenai/add-csqa-siqa
666da70 Patch other S3 methods with 404 detection fix
0b6e28c Fix checking HTTP status code for boto3 responses
0b835a8 Merge pull request #500 from allenai/shanea/expose-official-checkpoints
50da7a4 Add work-arounds for new-style checkpointing issues
6d42d7a Fix hang when training is canceled
7eb7f3d Merge pull request #455 from gahdritz/main
ed47c29 Merge pull request #453 from hxdtest/only_rank0_log_metrics
ad8198e Merge pull request #495 from allenai/add-basic-math
1511fed Merge pull request #487 from allenai/fix-mmlu-prompt-bug
c2840e4 Merge pull request #493 from allenai/shanea/storage-cleaner-move-improvements
658f7cc Merge pull request #466 from allenai/rename
eb5b2da Merge pull request #490 from allenai/RemoveAMDLN
752353b Merge pull request #488 from allenai/shanea/optimize-unsharding-2

v0.2.5

07 Mar 00:31

What's new

Fixed ✅

  • Fixed the default value of the --tokenizer argument to scripts/prepare_tulu_data.py to be an absolute path rather than a relative path, so the script can be run from other directories (see the sketch after this list).
  • Added the option to directly pass input embeddings to OLMo and OLMoForCausalLM.
  • Added support for Python 3.8.
  • Added code to throw an error if output_attentions is set to True in a forward call to OLMoForCausalLM. This functionality hasn't been implemented yet.
  • Fixed running with data loading workers on LUMI.
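
The --tokenizer default fix follows the standard pattern of resolving a default path relative to the source file rather than the current working directory; a minimal sketch (with a placeholder tokenizer filename) is:

```python
import argparse
from pathlib import Path

# Resolve the default relative to this file, not the caller's working directory,
# so the script behaves the same no matter where it is launched from.
DEFAULT_TOKENIZER = Path(__file__).parent / "tokenizers" / "tokenizer.json"  # placeholder filename

parser = argparse.ArgumentParser()
parser.add_argument("--tokenizer", type=str, default=str(DEFAULT_TOKENIZER))
args = parser.parse_args()
print(args.tokenizer)
```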

Added 🎉

  • Added output_hidden_states argument and associated functionality to OLMo and OLMoForCausalLM to return model intermediate hidden states (see the sketch after this list).
  • Added MMLU downstream evaluation tasks, with prompt variations.
  • Added support for PyTorch v2.2.
  • Added the ability to show logs from all ranks.
  • Added option for QKV clipping.
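
The output_hidden_states addition mirrors the usual Hugging Face interface. A minimal sketch assuming the hf_olmo wrapper, with a placeholder model ID:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-1B"  # placeholder checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

inputs = tokenizer("Language modeling is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# One tensor per transformer layer (plus the embedding output), each (batch, seq, hidden).
print(len(out.hidden_states), out.hidden_states[-1].shape)
```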

Changed ⚠️

  • Refactored torch.load monkey patching for legacy checkpoint unsharding in anticipation of a change to the unsharding implementation (sketched below).
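
The refactor above wraps the torch.load patching in a context manager; a minimal, generic sketch of that pattern (not the exact OLMo code) is:

```python
import contextlib
import functools
import torch

@contextlib.contextmanager
def patched_torch_load(**overrides):
    """Temporarily override keyword defaults of torch.load (e.g. map_location)."""
    original = torch.load
    torch.load = functools.partial(original, **overrides)
    try:
        yield
    finally:
        torch.load = original  # always restore, even if unsharding raises

# Hypothetical usage: force CPU loading while unsharding a legacy checkpoint.
with patched_torch_load(map_location="cpu"):
    state = torch.load("legacy-checkpoint.pt")  # placeholder path
```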

Commits

c499632 Add option for QKV clipping (#489)
31d8528 Pull checkpoint patch from mitchish-gqa-2
03d7643 Merge pull request #486 from allenai/shanea/monkey-patch-ctx-manager
fd3a57b Merge pull request #483 from allenai/shanea/storage-cleaner-unshard-improvements
1d264e4 Merge pull request #481 from allenai/WorkersOnLumi
70ad30c Merge pull request #480 from allenai/Firehose
493c0b8 Add MMLU prompt variants (#484)
cb711e2 Add support for PyTorch v2.2 (#476)
67d24f5 Merge pull request #468 from allenai/mmlu-downstream
0c58bee Fix bug when clipping is disabled
922db6a Only run the profiler through a single cycle (#463)
37ca789 Merge pull request #462 from allenai/epwalsh/fsdp-wrap-patch
cc36709 Add attn bias arg to HF wrapper (#458)
7f7abbb Merge pull request #451 from sarahwie/main
9fd9130 Add support for Python 3.8 (#448)
d9c0993 Require Python>=3.9 for now
97296e6 Merge pull request #442 from allenai/shanea/add-input-embedding-arg
3be4c1e add link to W&B logs for 1B run
d7d4de4 Add link to OLMo-7B-Twin-2T W&B logs
cf12108 Update README.md (#429)
15af668 freeze official configs for reproductions (#421)
7739fe1 Add link to W&B logs for OLMo-7B
80db5e3 Fix default value of --tokenizer
6765317 Add link to paper in README badge

v0.2.4

02 Feb 18:40

What's new

Fixed ✅

  • Fixed an issue with the HuggingFace integration where we were inadvertently using a feature that was introduced in Python 3.10, causing an error for older Python versions.

Commits

8a3f2d8 Fix HF integration for Python < 3.10 (#426)
49c8647 Use temp branding GIF for logo (for now) (#419)

v0.2.3

31 Jan 18:36

What's new

Commits

98c115c Bump version to v0.2.3 for release
0e53b33 specify dependencies in pyproject.toml (#418)
18e5dad update PyPI release process
141cc94 Merge pull request #415 from allenai/readme-inf
2587240 Merge pull request #417 from allenai/Muennighoff/ckpt
a5a01a2 Merge pull request #416 from allenai/nol_rdme
98425a5 Merge pull request #413 from allenai/shanea/storage-cleaner-s3-upload-cleanup
3053bfa Update install instructions in README
f36ac42 Merge pull request #410 from allenai/epwalsh/fine-tune-with-label-masking
dcae8e8 Merge pull request #411 from allenai/epwalsh/lr-schedule-tokens
45ed078 Add more mcli configs
905359e fix bug with saving unsharded checkpoint
3e3df71 Merge pull request #409 from allenai/epwalsh/tulu-fine-tune
a2e1d13 Merge pull request #368 from allenai/mitchish-lumi
5a735dd Merge pull request #350 from allenai/mitchish
df19554 Merge pull request #388 from allenai/mitchish65
23eb949 Train a few steps after time limit reached (#362)
ac1aee1 Merge pull request #408 from allenai/NixLogz
6da42cf ensure we save checkpoint at end of loop
568a3d8 Merge pull request #406 from allenai/hf-olmo-loading
3c51402 Merge pull request #407 from allenai/shanea/storage-cleaner-avoid-redundant-copy
53217d2 Merge pull request #405 from allenai/shanea/storage-cleaner-fix-upload-path
5eb26aa Merge pull request #404 from allenai/shanea/storage-cleaner-minor-fixes
87ed747 backwards compat fix
1c13e5f Merge pull request #403 from allenai/shanea/storage-cleaner-fix-max-archive-size
685d11b Merge pull request #400 from allenai/shanea/storage-cleaner-wandb
5bdccc3 Merge pull request #402 from allenai/shanea/storage-cleaner-is-run-improvement
75d6738 Merge pull request #401 from allenai/shanea/storage-cleaner-is-file-no-key
0475f3a Make logo a little smaller
1184050 Add logo to README
e2d77c4 Ephemeral checkpoints (#397)
6f2abfb Merge pull request #399 from allenai/shane/storage-cleaner-fix-s3-upload
f8beb5b Merge pull request #398 from allenai/shanea/storage-cleaner-move-run
185d7e2 Move remaining top-level mkd docs into docs folder (#395)
5d03d38 Merge pull request #396 from allenai/shanea/storage-cleaner-delete-temp-files
fe49693 Merge pull request #382 from allenai/shanea/storage-cleaner-unsharding-legacy
1ede949 Merge pull request #381 from allenai/shanea/storage-cleaner-unsharding-2
9cc7154 update some links to new repo (#394)

v0.2.2

11 Dec 05:58

What's new

Commits

364e21e Merge pull request #393 from allenai/hf-olmo-auto-map

v0.2.1

11 Dec 00:11

What's new

Commits

ad3e676 missing readme
9fa23b4 Merge pull request #392 from allenai/hf-bug-fix

v0.2.0

10 Dec 06:43

What's new

Added 🎉

  • GPT-based model.
  • Tokenizer and data pre-processing pipeline.
  • Training script.
  • Triton-based FlashAttention.

Commits

e801af8 add release proc
e643f5e update pyproject
dbc8177 Bump version to v0.2.0 for release
e99dbe5 Merge pull request #391 from allenai/hf-olmo-new
a120ab2 Merge pull request #380 from allenai/shanea/storage-cleaner-download-upload
4e849e4 Merge pull request #390 from allenai/shanea/storage-cleaner-archive-fix-2
1dbc346 Merge pull request #378 from allenai/shanea/storage-cleaner-cached-path
22cefa2 Merge pull request #389 from allenai/shanea/add-r2-scheme
ac01778 fix
6c79c63 add option to only unshard model
d1c185b Merge pull request #387 from allenai/epwalsh/dist-init
e30d29f Merge pull request #364 from allenai/shanea/storage-cleaner
ff883e5 Merge pull request #385 from allenai/epwalsh/max-duration-tokens
e16e606 Merge pull request #383 from allenai/epwalsh/start-new-epoch

v0.1.1

27 Nov 01:04

What's new

Commits

v0.1.0

27 Nov 00:49

What's new

Added 🎉

  • GPT-based model.
  • Tokenizer and data pre-processing pipeline.
  • Training script.
  • Triton-based FlashAttention.

Commits

f1ba78e moving readme to notes
6c94994 Bump version to v0.1.0 for release
f09a500 Add a "constant" LR scheduler (#376)
dcdadc5 Merge pull request #377 from allenai/Muennighoff/split-model-comps
80b081b Merge pull request #374 from allenai/epwalsh/threaded-data-loading
9c8e67e Merge pull request #373 from allenai/chore/paths
1f51fec Merge pull request #375 from allenai/Muennighoff/move-torch-utils
9d5aa11 Merge pull request #370 from allenai/CheckpointLoading
38be6a7 Merge pull request #372 from allenai/epwalsh/optim-state-fix
c205912 Fix how we update grad_norm_exp_avg (#371)
9320f9b Fix unsharding local checkpoints w/ torch 2.1 (#369)
b8a174f Merge pull request #367 from allenai/FacePalm
13548fd Merge pull request #365 from allenai/wrap_and_shard
6c0e419 Add gradient clipping warmup (#363)
0afafd6 Fix stale links in README, scripts cleanup (#359)
42dba3c remove data team's stuff (#357)
4bb6966 consolidate Python configs into pyproject.toml, other clean up (#353)
a952f44 minor fixes to kempner docs (#354)
026793e Merge pull request #347 from allenai/epwalsh/block-groups-load-fix
62fc2fe Add two more FSDP wrapping strategies (#355)
4ccf2bd Merge pull request #346 from allenai/shanea/llama-block
da91f34 Merge pull request #317 from allenai/Llama
fd2425f Adds a YAML validator to automatically find the last checkpoint (#348)
1099942 Upload profiler data to remote save folder (#338)
db0756f Merge pull request #335 from allenai/Kempner
5c64338 Merge pull request #343 from allenai/ActivationCheckpointing
558102e Merge pull request #342 from allenai/S3Client
cd73387 Add option to FSDP wrap by groups of blocks (#340)
c1a4519 Fix dtype casting on CPU (#339)
104d1ce Move remaining checkpointing logic to Checkpointer class (#331)
a465caa Merge pull request #337 from allenai/UnshardSkipKeys
4980bad set mcli time limit to null
f974a1d update mitch ish configs
07404f8 Merge pull request #308 from allenai/fine-grained-metrics
4644ff5 Lazily init s3 client (#333)
809fe9d Load state dicts to CPU (#328)
1bff308 ensure bias is created in fp32 (#327)
d4744d0 Bring back global gradient clipping and improve speed of collecting metrics (#326)
54572d3 Add stop_at config option
e63b389 Fix SDP NaN bug (#323)
fddded5 Features to match OpenLM (#302)
d2e84fe Refactor checkpointing, bring back legacy sharded checkpointing as the default (#316)
fed4cf3 Merge pull request #311 from allenai/pass-thru-model-kwargs
a5cd0e6 Merge pull request #304 from allenai/ppl-suite-v3
536d029 Merge pull request #306 from allenai/keep-instance-info
0b5f68d Merge pull request #314 from allenai/ResetOptimizerState
e8bd122 Merge pull request #315 from allenai/MemoryEnvVar
da1f0b8 Merge pull request #313 from allenai/NanCheck
94133da Fixes pyspy script
602968a New-style checkpointing (again) (#307)
973090f implement bytes range for GS
18e061d Merge pull request #303 from allenai/shanea/fix-leftover-data-partitioning
0a1455b Merge pull request #301 from allenai/shanea/fix-s3-keyerror-failures
e7b92a6 comment
6ebd5d3 Add configs for v1.5 mix
8e2b8be Merge pull request #297 from allenai/PerfTests
62dde55 Make resource_path() more robust
900544e Prepare 7B config for MCLI (#295)
309bf84 Merge pull request #294 from allenai/petew/linear-schedule
91f499b Ignore warnings from urllib3, don't print config when it's huge
012e97f Merge pull request #290 from allenai/torch2.1init
aec449c update mcli config
27dd512 MCLI configs (#286)
a2b369a Merge pull request #279 from allenai/petew/train-metrics
5ad0d8c Merge pull request #282 from allenai/rsqrt
cc787ed Merge pull request #277 from allenai/shanea/add-truncated-normal-init
fabda71 Merge pull request #274 from allenai/petew/layer-norm
70a3f4c Merge pull request #280 from allenai/petew/reduce-dtype
2a7f694 Merge pull request #278 from allenai/update-hf-olmo-config
ef85d5c Merge pull request #265 from allenai/LayerNormAffine-ManualLayerNorm-Profiling
2df922b Merge pull request #276 from allenai/petew/sys-metrics
921c254 Merge pull request #275 from allenai/simplify-eff-benchmark
400a1d2 Minor cleanup of grad clipping (#273)
18f3459 fix updating grad_norm_exp_avg (#272)
54dbd48 Merge pull request #238 from allenai/inference-efficiency-pentathlon
95555f4 Refactor how we clip gradients and collect optimizer metrics (#261)
6cc09fe Merge pull request #271 from allenai/PythonProfiling2-UnwindingChanges
41b0663 Merge pull request #269 from allenai/PythonProfiling
2eedf07 Fix speed issue on LUMI with 7B model (#270)
d2abecd Merge pull request #267 from allenai/v2-pii-tagging
5b4c68e fix isort config
c8a2700 Merge pull request #253 from allenai/SavedTokenizer
26e17c3 Merge pull request #264 from allenai/LayerNormAffine-ManualLayerNorm-TurnedOffForSafety
a49f4ec Make Dropout a no-op when p=0.0 (#259)
a33dbb0 make flake8 happy
6b977d0 handle race conditions when saving to NFS on cirrascale (#255)
b4a1491 Merge pull request #250 from allenai/LayerNormAffine
4205a84 Merge pull request #257 from allenai/FasterGlobalIndices
e46b988 fix saving unsharded checkpoints
5fff93a Merge pull request #251 from allenai/soldni/fix-s2-fos
af0a584 Merge pull request #248 from allenai/TokenizerFromFile
7fbdb1c finish W&B runs quietly
9071816 Training improvements (#239)
642d0fa Add support for remote checkpoints and train data files (#237)
e350fd3 Add option to restart with new base LR (#236)
3ef79e1 Merge pull request #230 from allenai/eval-streamline
51a8a00 load state dict on gpu
3e8163e improve config resolution
7bd0ed2 medium script update
27d3538 add V1 mix small+medium configs (#211)
907e38b wait on all ranks until final ckpt dir exists (#235)
2118db5 Merge pull request #232 from allenai/ablations/soldni-gantry
698f859 Added the shuffling story
5508c04 Use numpy for shuffling instead of torch (#231)
952819b Don't reshuffle eval data each "epoch" (#229)
87f6a79 Merge pull request #223 from allenai/soldni/olmo-mixing
e64cf42 Merge pull request #227 from allenai/hf-olmo-tok
970a77c add more tests for memmap dataset
ba84b0b default to saving data indices
d02d4f1 Merge pull request #221 from allenai/faster-convert
acf372e Merge pull request #220 from allenai/hf-integration
43c29d9 Merge pull request #219 from allenai/iterable-dataset-memory-efficient
d3d00f1 Merge pull request #217 from allenai/soldni/lucy-fix
7c866c9 Merge pull request #216 from allenai/petew-cache-attn
05c6d53 clean up
fd1cfe8 Merge pull request #213 from allenai/llm-inference
ccb3869 Merge pull request #212 from allenai/gopher-fix
a80cdc1 Merge pull request #210 from allenai/soldni/filters_improvements
66c4936 fix c4-medium config
fde42f9 Merge pull request #194 from allenai/default-2x-batch-size
ab0b967 Merge pull request #209 from allenai/olmo-mix-1
b376486 Merge pull request #200 from allenai/c4-gopher-dedupe
b1584f9 Merge pull request #207 from allenai/petew-no-par-block
96f8817 Merge pull request #208 from allenai/soldni/tok_sample_code
a244f3a Merge pull request #203 from allenai/error-handling
86060d4 Merge pull request #199 from allenai/packed-evals
992838b Merge pull request #205 from allenai/hatespeech-nsfw-mixers
186fe1b Merge pull request #204 from allenai/nishant_pi_count_ablation
0b55217 Merge pull request #188 from allenai/ft-tagger-dataset
6a36cdf Merge pull request #161 from allenai/AkshitaB-stack-ablations
4074e42 Merge pull request #201 from allenai/soldni/tok_sample_code
58ad163 Merge pull request #197 from allenai/soldni/local_cache
c642d4f Merge pull request #198 from allenai/nishant_add_pi_counts_filter
eccf18c Merge pull request #162 from allenai/c4-gopher
49f9a0e Merge pull request #191 from allenai/save-indices
e89c61f Merge pull request #195 from allenai/code-eval
d33ea74 Merge pull request #193 from allenai/soldni/neox
9bfcde3 Fix secrets name in LUMI.md (#190)
b6fa4d9 add PPL evaluators to medium config
484b089 Merge pull request #189 from allenai/docs
83b39b5 remove unnused thread lock
ebc07f4 Merge pull request #187 from allenai/soldni/tokfile
ed7c0e8 ensure drop_last=True with train data
4d986ed fix speed monitor
2437cdf Merge pull request #181 from allenai/v0-small
3478cb0 Merge pull request #186 from allenai/soldni/tok_improve
348ed33 Merge pull request #185 from allenai/soldni/falcon
b79c3b7 Speed up preprocessing script (#177)
cb2c9cd Merge pull request #184 from allenai/format
72d4ff2 Merge pull request #183 from allenai/attr-merge
47c4ab9 More checkpointing improvements (#182)
a97d1f6 Merge branch 'main' of https://github.com/allenai/LLM into main
2567261 handle empty logzio token
0507c2d Restore dataset correctly when world size changes (#176)
f8eeb22 Merge pull request #178 from allenai/soldni/preview
9b21211 Merge pull request #179 from allenai/fix-tests
0d487c2 Merge pull request #174 from allenai/v0-small
4737c53 Merge pull request #175 from allenai/ClearGPUsFirst
1bdeae6 Merge pull request #173 from allenai/span-fix
87476f7 Prepare 1B baseline run (#170)
434cf67 Merge pull request #172 from allenai/fix-tests
d2442d6 add more tests
6d29ee4 fix dataloader max steps
7ffe204 Merge pull request #171 from allenai/soldni/decontamination-v2
0a485d2 Merge pull request #168 from allenai/DockerImage
27a3f3a Don't be so noisy during startup
1fba808 Merge pull request #165 from allenai/c4-medium-2x-bz
9da0e4b Merge pull request #167 from allenai/v1-small-config
391091c Merge pull request #166 from allenai/soldni/ablations_v2
9020c91 Merge pull request #164 from allenai/add-no-grad
c25f54b Merge pull request #163 from allenai/soldni/ablations
41a9969 syncronize time limits
1cd0b4b Merge pull request #134 from allenai/dependabot/pip/mypy-gte-1.0-and-lt-1.4
2a4031e Merge pull request #160 from allenai/soldni/filter-speedup
...
