
Releases: BaguaSys/bagua

v0.9.2

10 Mar 10:29
6d79588

Bug Fixes

Python

  • Fix QAdam NaN problem (#654)
  • Fix Aluminum compilation failure

v0.9.1

24 Feb 18:17

Bug Fixes

Python

  • Revert "fix: to_bagua_tensor compatibility with torch 1.6.0 (#355)"

Features

Python, core

  • Improve NCCL lib version check (#525)
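
Version checks like the one above are easy to get wrong when versions are compared as strings, since "2.9" sorts after "2.10" lexicographically. A minimal sketch of a numeric comparison, independent of bagua's actual implementation:

```python
def parse_version(ver: str) -> tuple:
    """Parse a dotted version string like "2.10.3" into (2, 10, 3)."""
    return tuple(int(part) for part in ver.split("."))

def version_at_least(installed: str, required: str) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(required)

# Lexicographic comparison would wrongly rank "2.9" above "2.10";
# tuple comparison of integers gets it right.
```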

v0.9.0

17 Jan 00:50

Bug Fixes

Other

  • Reuse fused parameter tensors in fuse_step (#410)
  • Call step closure in qadam optimizer step (#432)
  • Fix need_reset condition (#454)
  • Do negotiation in async native op (#447)
  • Fix find_unused_parameters (#452)
  • Fix qadam non-deterministic (#459)
  • Add LIBRARY_PATH env in install_master.sh (#465)
  • Fix typo in install_master.sh (#471)

Python

  • Fix NCCL package not found with CUDA 11.5 (#415)
  • Fix process group compatibility with torch 1.6.0 (#413)
  • Fix ci random fail (#445)
  • Fix async algorithm (#479)

Features

Core

  • Initial support for C interface (#325)

Other

  • Support NODE_RANK environment variable (#426)
  • Choose bagua service port dynamically (#431)
  • Use bagua_module_name to identify different modules (#438)
  • Add algorithm registry (#433)
  • Add compatibility for NCCL version under 2.10 (#449)
  • Add broadcast object api (#437)
  • Support qadam in fused optimizer (#477)
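
The algorithm registry entry (#433) refers to a lookup table mapping algorithm names to implementations, so algorithms can be selected by string. A minimal sketch of the registry pattern; the names and class here are illustrative, not bagua's real API:

```python
_ALGORITHM_REGISTRY = {}

def register_algorithm(name):
    """Class decorator that records an algorithm class under a string key."""
    def decorator(cls):
        _ALGORITHM_REGISTRY[name] = cls
        return cls
    return decorator

def create_algorithm(name, **kwargs):
    """Instantiate a registered algorithm by name."""
    if name not in _ALGORITHM_REGISTRY:
        raise KeyError(f"unknown algorithm: {name!r}")
    return _ALGORITHM_REGISTRY[name](**kwargs)

@register_algorithm("gradient_allreduce")
class GradientAllReduceAlgorithm:
    def __init__(self, hierarchical=False):
        self.hierarchical = hierarchical
```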

Python

  • Support PyTorch DDP compatible distributed training API (#312)
  • Support torch-api-compatible all_reduce (#377)
  • Associate PyTorch Process Group with Bagua Process Group using cache (#402)
  • Support find_unused_parameters on BaguaDDP (#409)
  • Add BAGUA_AUTOTUNE_SERVER_WAIT_TIME env (#474)
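
The caching entry (#402) describes memoizing the translation from a PyTorch process group to its Bagua counterpart, so repeated lookups for the same ranks return the same group object. A sketch of the pattern with stand-in classes, not bagua's actual types:

```python
class BaguaGroup:
    """Stand-in for a Bagua process group over a fixed set of ranks."""
    def __init__(self, ranks):
        self.ranks = tuple(ranks)

_GROUP_CACHE = {}

def get_bagua_group(ranks):
    """Return a cached group for these ranks, creating it only once."""
    key = tuple(sorted(ranks))
    if key not in _GROUP_CACHE:
        _GROUP_CACHE[key] = BaguaGroup(key)
    return _GROUP_CACHE[key]
```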

v0.8.2

10 Nov 00:17

Features

Python

  • Support switching between different algorithms (#299)
  • Support separate algorithm declaration and implementation (#246)

Python, core

  • Support process group in with_bagua, support hierarchical communication in bytegrad algorithm (#300)
  • Support mutable bucket tensors (#271)
  • Support all_to_all_single (#361)
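
all_to_all_single exchanges one slice of a buffer with every peer. Its data-movement semantics can be simulated without any communication backend; this sketch models only the exchange (real usage goes through the torch/bagua collectives on device tensors):

```python
def simulate_all_to_all(send_buffers):
    """Simulate an all-to-all exchange.

    send_buffers[src][dst] is what rank src sends to rank dst.
    Returns recv_buffers with recv_buffers[dst][src] = send_buffers[src][dst].
    """
    world_size = len(send_buffers)
    return [
        [send_buffers[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]
```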

Bug Fixes

Other

  • Fix fused optimizer OOM and make it stateless (#207)
  • to_bagua_tensor compatibility with torch 1.6.0 (#355)

Python

  • Use separate process group for async communication thread to avoid potential hangs (#298)
  • Do not fail if checkpoints path exist (#305)
  • Fix is_moe_param (#306)
  • Change to_bagua_tensor API to support PyTorch 1.10 (#338)
  • Fix fused optimizer with multiple param groups (#356)

v0.8.1.post1

16 Oct 06:25

Bug Fixes

  • Process group not yet supported in with_bagua
  • Use separate process group for async communication thread to avoid potential hangs (#298)

v0.8.1

16 Oct 02:11

[0.8.1] - 2021-10-16

Features

  • Support moe (#208)
  • Support checkpointing for moe (#242)
  • Use single bucket for decentralized algorithm to improve performance (#275)
  • Support process group (#228)
  • Add barrier api (#290)
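
A barrier (#290) blocks every caller until all participants arrive. The semantics can be illustrated with threads standing in for ranks; bagua's barrier runs over the communication backend, not threads:

```python
import threading

WORLD_SIZE = 4
barrier = threading.Barrier(WORLD_SIZE)
results = []

def worker(rank):
    # No worker proceeds past wait() until all WORLD_SIZE workers arrive.
    barrier.wait()
    results.append(rank)

threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All ranks have passed the barrier together.
```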

v0.8.0

26 Sep 13:21

[0.8.0] - 2021-09-26

Bug Fixes

Ci

  • Only run publish once on git tag

Core

  • Fix compressed buffer failing to scatter to an odd number of ranks

Other

  • Fix ci pypi versioning
  • Remove init.py and python version, use cargo version
  • Move import bagua_install_library to install library function
  • Merge bagua_install_library and setup.py, remove nccl<=2.6 support
  • Fix alltoall_v parameter (#17)
  • Reduce and allgather python interface
  • Fix decompress incorrect pointer and typo in error msg
  • Fix python gil deadlock during getting data ptr
  • Fix benchmark script requirements
  • Fix alltoall_v parameter types (#27)
  • Always mark bagua padding tensor as ready
  • Make compress/decompress of BaguaTensor method string consistent (#33)
  • Fix scatter and reduce_scatter implementation (#40)
  • Fix subtraction overflow error for decentralized op (#39)
  • Fix QADAM params (#17)
  • Fix assert precision (#18)
  • Replace mutex with atomic bool for async op and add Aluminum submodule update (#67)
  • Fix duplicated dependency downloading during installation (#77)
  • Fix async algorithm aborting and hanging (#78, #81)
  • Fix qadam algorithm call (#20)
  • Fix missing symbols in the zip library (#24)
  • Fix random autotune server hang (#206)
  • Fix Bagua-Net library path mismatch; make --enable_bagua_net argument style consistent with other args (#218)

Python

  • Fix random autotune-service hang
  • Handle conflicts caused by sklearn upgrade (#225)

Features

Ci

  • Only publish pypi for master commits

Other

  • Add async model average algorithm (#110)
  • Add cached dataset wrapper (#148)
  • Support sync batchnorm (#151)
  • Add --enable-bagua-net option in launcher (#183)
  • Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
  • Add requirements.txt, only download dataset on local rank 0 (#2)
  • Add python packaging related files
  • Add __version__ variable
  • Install nccl deps in bagua core and add generated __version__ variable
  • Add version.py placeholder to prevent file not found error
  • Initial support for python op (#2)
  • Add 5 min timeout for buckets' comm op (#5)
  • Replace NCCL with Aluminum (#7)
  • Add synthetic benchmark script (#5)
  • Add elastic training example (#7)
  • Support alltoall_v (vector alltoall) (#14)
  • Add reduce and allgather python interface
  • Support reduce and allgather op with Reduction op enum
  • Support creating BaguaTensor by passing torch tensor directly (#19)
  • Add compatible mode for getting PyTorch tensor info via the Python interpreter
  • Better debug log including tensor info when executing ops
  • Add native low precision decentralized operator (#26)
  • Add scatter, gather, and scatter_reduce communication primitives, plus in-place versions of all of them (#37)
  • Make full precision decentralized op stateless (#36)
  • Add communication_primitives example (#12)
  • Use NCCL 2.10 avg op for all algorithms using averaging (#46, #45)
  • Add opentelemetry to report tensor ready order (#42)
  • Add deterministic flag (#15)
  • Add native async model average algorithm (#41)
  • Add examples for async model average algorithm (#14)
  • Support packet splitting and multi-stream parallel transmission (#5)
  • Support ncclnet v3 and remove the dependency on nccl in the installation environment (#17)
  • Add sync interval param to async examples (#19)
  • Support tokio backend (#21)
  • Support bagua-net (#89)
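
Packet splitting (#5 above) chops a large payload into fixed-size chunks that can then be sent over multiple streams in parallel. The chunking itself is simple; this sketch shows split and reassembly only, with stream scheduling omitted:

```python
def split_packets(payload: bytes, packet_size: int):
    """Split payload into chunks of at most packet_size bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [payload[i:i + packet_size]
            for i in range(0, len(payload), packet_size)]

def reassemble(packets):
    """Inverse of split_packets: concatenate chunks in order."""
    return b"".join(packets)
```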

v0.7.0

16 Aug 11:29

Bug Fixes

  • Autotune api conflict (#131)

Features

  • Add low precision decentralized algorithm (#103)
  • Add all communication primitives such as send recv to communication module (#128)
  • Make full precision decentralized op stateless (#126)
  • Support nccl 2.10 ReduceOp.AVG (#149)
  • Add support for reporting tensor completion order (#146)
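
The low precision decentralized algorithm (#103) relies on compressing tensors before they are exchanged. A toy symmetric int8 quantizer illustrates the core idea; this is pure Python for clarity and not bagua's actual compressor:

```python
def quantize_int8(values):
    """Symmetric quantization: map floats into [-127, 127] ints plus a scale."""
    scale = max((abs(v) for v in values), default=0.0) / 127.0
    if scale == 0.0:
        return [0] * len(values), 0.0
    return [round(v / scale) for v in values], scale

def dequantize_int8(quantized, scale):
    """Recover approximate floats from the int8 codes and the scale."""
    return [q * scale for q in quantized]
```

Each exchanged tensor thus shrinks to one byte per element plus a single scale, at the cost of a small reconstruction error.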

v0.7.0-rc2

22 Jul 05:11
12ea737
Pre-release
chore: requires bagua-core 0.4-0.5 now

v0.7.0-rc1

22 Jul 05:00
0c978e9
Pre-release
feat: make full precision decentralized op stateless (#126)

BREAKING CHANGE: `BaguaBucket.append_decentralized_synchronous_op` now only supports full precision decentralized communication.