
Releases: BaguaSys/bagua

v0.9.2

10 Mar 10:29
6d79588

Bug Fixes

Python

  • Fix QAdam NaN problem (#654)
  • Fix Aluminum compilation failure

v0.9.1

24 Feb 18:17

Bug Fixes

Python

  • Revert "fix: to_bagua_tensor compatibility with torch 1.6.0 (#355)"

Features

Python, core

  • Improve NCCL lib version check (#525)
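
Version checks like the one above are easy to get wrong when versions are compared as strings, since "2.9" sorts after "2.10" lexicographically. A minimal sketch of a numeric comparison, independent of bagua's actual implementation:

```python
def parse_version(ver: str) -> tuple:
    """Parse a dotted version string like "2.10.3" into (2, 10, 3)."""
    return tuple(int(part) for part in ver.split("."))

def version_at_least(installed: str, required: str) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(required)

# Lexicographic comparison would wrongly rank "2.9" above "2.10";
# tuple comparison of integers gets it right.
```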

v0.9.0

17 Jan 00:50

Bug Fixes

Other

  • Reuse fused parameter tensors in fuse_step (#410)
  • Call step closure in qadam optimizer step (#432)
  • Fix need_reset condition (#454)
  • Do negotiation in async native op (#447)
  • Fix find_unused_parameters (#452)
  • Fix qadam non-deterministic (#459)
  • Add LIBRARY_PATH env in install_master.sh (#465)
  • Fix typo in install_master.sh (#471)

Python

  • Fix NCCL package not found with CUDA 11.5 (#415)
  • Fix process group compatibility with torch 1.6.0 (#413)
  • Fix ci random fail (#445)
  • Fix async algorithm (#479)

Features

Core

  • Initial support for C interface (#325)

Other

  • Support NODE_RANK environment variable (#426)
  • Choose bagua service port dynamically (#431)
  • Use bagua_module_name to identify different modules (#438)
  • Add algorithm registry (#433)
  • Add compatibility for NCCL version under 2.10 (#449)
  • Add broadcast object api (#437)
  • Support qadam in fused optimizer (#477)
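
The algorithm registry entry (#433) refers to a lookup table mapping algorithm names to implementations, so algorithms can be selected by string. A minimal sketch of the registry pattern; the names and class here are illustrative, not bagua's real API:

```python
_ALGORITHM_REGISTRY = {}

def register_algorithm(name):
    """Class decorator that records an algorithm class under a string key."""
    def decorator(cls):
        _ALGORITHM_REGISTRY[name] = cls
        return cls
    return decorator

def create_algorithm(name, **kwargs):
    """Instantiate a registered algorithm by name."""
    if name not in _ALGORITHM_REGISTRY:
        raise KeyError(f"unknown algorithm: {name!r}")
    return _ALGORITHM_REGISTRY[name](**kwargs)

@register_algorithm("gradient_allreduce")
class GradientAllReduceAlgorithm:
    def __init__(self, hierarchical=False):
        self.hierarchical = hierarchical
```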

Python

  • Support PyTorch DDP compatible distributed training API (#312)
  • Support torch-api-compatible all_reduce (#377)
  • Associate PyTorch Process Group with Bagua Process Group using cache (#402)
  • Support find_unused_parameters on BaguaDDP (#409)
  • Add BAGUA_AUTOTUNE_SERVER_WAIT_TIME env (#474)
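
The caching entry (#402) describes memoizing the translation from a PyTorch process group to its Bagua counterpart, so repeated lookups for the same ranks return the same group object. A sketch of the pattern with stand-in classes, not bagua's actual types:

```python
class BaguaGroup:
    """Stand-in for a Bagua process group over a fixed set of ranks."""
    def __init__(self, ranks):
        self.ranks = tuple(ranks)

_GROUP_CACHE = {}

def get_bagua_group(ranks):
    """Return a cached group for these ranks, creating it only once."""
    key = tuple(sorted(ranks))
    if key not in _GROUP_CACHE:
        _GROUP_CACHE[key] = BaguaGroup(key)
    return _GROUP_CACHE[key]
```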

v0.8.2

10 Nov 00:17

Features

Python

  • Support switching between different algorithms (#299)
  • Support separate algorithm declaration and implementation (#246)

Python, core

  • Support process group in with_bagua, support hierarchical communication in bytegrad algorithm (#300)
  • Support mutable bucket tensors (#271)
  • Support all_to_all_single (#361)
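
all_to_all_single exchanges one slice of a buffer with every peer. Its data-movement semantics can be simulated without any communication backend; this sketch models only the exchange (real usage goes through the torch/bagua collectives on device tensors):

```python
def simulate_all_to_all(send_buffers):
    """Simulate an all-to-all exchange.

    send_buffers[src][dst] is what rank src sends to rank dst.
    Returns recv_buffers with recv_buffers[dst][src] = send_buffers[src][dst].
    """
    world_size = len(send_buffers)
    return [
        [send_buffers[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]
```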

Bug Fixes

Other

  • Fix fused optimizer OOM and make it stateless (#207)
  • to_bagua_tensor compatibility with torch 1.6.0 (#355)

Python

  • Use separate process group for async communication thread to avoid potential hangs (#298)
  • Do not fail if checkpoints path exist (#305)
  • Fix is_moe_param (#306)
  • Change to_bagua_tensor API to support PyTorch 1.10 (#338)
  • Fix fused optimizer with multiple param groups (#356)

v0.8.1.post1

16 Oct 06:25

Bug Fixes

  • Process group not yet supported in with_bagua
  • Use separate process group for async communication thread to avoid potential hangs (#298)

v0.8.1

16 Oct 02:11

[0.8.1] - 2021-10-16

Features

  • Support moe (#208)
  • Support checkpointing for moe (#242)
  • Use single bucket for decentralized algorithm to improve performance (#275)
  • Support process group (#228)
  • Add barrier api (#290)
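
A barrier (#290) blocks every caller until all participants arrive. The semantics can be illustrated with threads standing in for ranks; bagua's barrier runs over the communication backend, not threads:

```python
import threading

WORLD_SIZE = 4
barrier = threading.Barrier(WORLD_SIZE)
results = []

def worker(rank):
    # No worker proceeds past wait() until all WORLD_SIZE workers arrive.
    barrier.wait()
    results.append(rank)

threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All ranks have passed the barrier together.
```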

v0.8.0

26 Sep 13:21

[0.8.0] - 2021-09-26

Bug Fixes

Ci

  • Only run publish once on git tag

Core

  • Fix compressed buffer failing to scatter to an odd number of ranks

Other

  • Fix ci pypi versioning
  • Remove init.py and python version, use cargo version
  • Move import bagua_install_library to install library function
  • Merge bagua_install_library and setup.py, remove nccl<=2.6 support
  • Fix alltoall_v parameter (#17)
  • Reduce and allgather python interface
  • Fix decompress incorrect pointer and typo in error msg
  • Fix python gil deadlock during getting data ptr
  • Fix benchmark script requirements
  • Fix alltoall_v parameter types (#27)
  • Always mark bagua padding tensor as ready
  • Make compress/decompress of BaguaTensor method string consistent (#33)
  • Fix scatter and reduce_scatter implementation (#40)
  • Fix subtraction overflow error for decentralized op (#39)
  • Fix QADAM params (#17)
  • Fix assert precision (#18)
  • Replace mutex with atomic bool for async op and add Aluminum submodule update (#67)
  • Fix duplicated dependency downloading during installation (#77)
  • Fix async algorithm aborting and hanging (#78, #81)
  • Fix qadam algorithm call (#20)
  • Fix missing symbols in the zip library (#24)
  • Fix random autotune server hang (#206)
  • Fix Bagua-Net library path mismatch; make --enable_bagua_net argument style consistent with other args (#218)

Python

  • Fix random autotune-service hang
  • Handle conflicts caused by sklearn upgrade (#225)

Features

Ci

  • Only publish pypi for master commits

Other

  • Add async model average algorithm (#110)
  • Add cached dataset wrapper (#148)
  • Support sync batchnorm (#151)
  • Add --enable-bagua-net option in launcher (#183)
  • Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
  • Add requirements.txt, only download dataset on local rank 0 (#2)
  • Add python packaging related files
  • Add __version__ variable
  • Install nccl deps in bagua core and add generated __version__ variable
  • Add version.py placeholder to prevent file not found error
  • Initial support for python op (#2)
  • Add 5 min timeout for buckets' comm op (#5)
  • Replace NCCL with Aluminum (#7)
  • Add synthetic benchmark script (#5)
  • Add elastic training example (#7)
  • Support alltoall_v (vector alltoall) (#14)
  • Add reduce and allgather python interface
  • Support reduce and allgather op with Reduction op enum
  • Support creating BaguaTensor by passing torch tensor directly (#19)
  • Add compatible mode for getting PyTorch tensor info via the Python interpreter
  • Better debug log including tensor info when executing ops
  • Add native low precision decentralized operator (#26)
  • Add scatter, gather, and scatter_reduce communication primitives, plus in-place versions of all of them (#37)
  • Make full precision decentralized op stateless (#36)
  • Add communication_primitives example (#12)
  • Use NCCL 2.10 avg op for all algorithms using averaging (#46, #45)
  • Add opentelemetry to report tensor ready order (#42)
  • Add deterministic flag (#15)
  • Add native async model average algorithm (#41)
  • Add examples for async model average algorithm (#14)
  • Support packet splitting and multi-stream parallel transmission (#5)
  • Support ncclnet v3 and remove the dependency on nccl in the installation environment (#17)
  • Add sync interval param to async examples (#19)
  • Support tokio backend (#21)
  • Support bagua-net (#89)
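
Packet splitting (#5 above) chops a large payload into fixed-size chunks that can then be sent over multiple streams in parallel. The chunking itself is simple; this sketch shows split and reassembly only, with stream scheduling omitted:

```python
def split_packets(payload: bytes, packet_size: int):
    """Split payload into chunks of at most packet_size bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [payload[i:i + packet_size]
            for i in range(0, len(payload), packet_size)]

def reassemble(packets):
    """Inverse of split_packets: concatenate chunks in order."""
    return b"".join(packets)
```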

v0.7.0

16 Aug 11:29

Bug Fixes

  • Autotune api conflict (#131)

Features

  • Add low precision decentralized algorithm (#103)
  • Add all communication primitives such as send recv to communication module (#128)
  • Make full precision decentralized op stateless (#126)
  • Support nccl 2.10 ReduceOp.AVG (#149)
  • Add support for reporting tensor completion order (#146)
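
The low precision decentralized algorithm (#103) relies on compressing tensors before they are exchanged. A toy symmetric int8 quantizer illustrates the core idea; this is pure Python for clarity and not bagua's actual compressor:

```python
def quantize_int8(values):
    """Symmetric quantization: map floats into [-127, 127] ints plus a scale."""
    scale = max((abs(v) for v in values), default=0.0) / 127.0
    if scale == 0.0:
        return [0] * len(values), 0.0
    return [round(v / scale) for v in values], scale

def dequantize_int8(quantized, scale):
    """Recover approximate floats from the int8 codes and the scale."""
    return [q * scale for q in quantized]
```

Each exchanged tensor thus shrinks to one byte per element plus a single scale, at the cost of a small reconstruction error.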

v0.7.0-rc2

22 Jul 05:11
12ea737
Pre-release
chore: requires bagua-core 0.4-0.5 now

v0.7.0-rc1

22 Jul 05:00
0c978e9
Pre-release
feat: make full precision decentralized op stateless (#126)

BREAKING CHANGE: `BaguaBucket.append_decentralized_synchronous_op` now only supports full precision decentralized communication.