Releases: aws/aws-ofi-nccl

AWS OFI NCCL v1.9.1

15 Apr 21:45
v1.9.1-aws

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This is a bugfix release which requires Libfabric v1.18.0 or later and supports NCCL 2.21.5-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).

Bug Fixes:

  • Fix release distribution generation to include missing headers introduced in v1.9.0. This fixes issue #382.
  • Restrict libcuda link-time dependency to builds with testing enabled
  • Build fixes to explicitly link against libm and libpthread used by the plugin

The plugin has been tested with the following Libfabric providers using the tests bundled in the source code and the nccl-tests suite:

  • efa

Checksum (sha512) for the release tarball:

77e44dcdb77e6b25cae882d2124b6d9a2a66f2b85321ae827ec7e3fd88bacd214a537a2490a578af44b7457cc655b2e382fc148b6ed8594a68a30d145f3ce70e  aws-ofi-nccl-1.9.1-aws.tar.gz
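A downloaded tarball can be checked against the published digest before building. The following is a minimal sketch in Python using only the standard library; the helper names `sha512_of` and `verify_tarball` are hypothetical and are not part of the plugin or its build system.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_hex):
    """Compare the file's digest with a published one (case and whitespace tolerant)."""
    return sha512_of(path) == expected_hex.strip().lower()
```

To use it, pass the path of the downloaded aws-ofi-nccl-1.9.1-aws.tar.gz and the digest string printed above to `verify_tarball`; a `False` result suggests the download is incomplete or is not the released file.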

AWS OFI NCCL v1.9.0

05 Apr 22:07

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release requires Libfabric v1.18.0 or later and supports NCCL 2.21.5-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).

New Features:

  • Support v8 plugin interface introduced with NCCL 2.20. This enables the use of the user memory registration feature recently introduced in NCCL.
  • Update the tuner component to support v2 ext-tuner interface introduced with NCCL 2.21.
  • Relax ordering constraints for control messages to reduce head-of-line blocking under congestion.

Bug Fixes:

  • Increase the number of communicators to 256K (from 4K), supporting larger all-to-all groups.
  • Improve logging in some corner case error conditions.

The plugin has been tested with the following Libfabric providers using the tests bundled in the source code and the nccl-tests suite:

  • efa

Checksum (sha512) for the release tarball:

7c86650f2f275b97bd08ff66b24ae8fef593269c068ec543259903d0eec80a0fe4153a3f171700e7e3dcb3b809a1d6aba82d5e7dc52ec138eacd7353629d1bc0  aws-ofi-nccl-1.9.0-aws.tar.gz

AWS OFI NCCL v1.8.1

25 Feb 21:40
v1.8.1-aws

This is a bugfix release that requires Libfabric v1.18.0 or later and supports NCCL v2.19.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).
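The version requirements above can be checked in a deployment script before the plugin is installed. The sketch below is one possible way to do that in Python; `parse_version` and `meets_minimum` are hypothetical helpers, and how the installed Libfabric or NCCL version string is obtained is left to the caller (an assumption, since this release note does not describe it).

```python
import re


def parse_version(text):
    """Parse strings such as 'v1.18.0', '1.19' or '2.19.4-1' into a (major, minor, patch) tuple."""
    match = re.match(r"\s*v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0; any suffix such as '-1' is ignored.
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


# Minimums stated in the release note above.
LIBFABRIC_MINIMUM = "v1.18.0"
NCCL_MINIMUM = "v2.4.8"
```

For example, `meets_minimum("1.17.1", LIBFABRIC_MINIMUM)` evaluates to `False`, so a script could stop early with a clear message instead of failing later at build or run time.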

Bug Fixes:

  • Fix an issue with the ID pool's reference counting and allocation
  • Improved error propagation for failed NCCL requests, allowing applications to fail early instead of blocking on requests that can never be completed.

The plugin has been tested with the following Libfabric providers using the tests bundled in the source code and the nccl-tests suite:

  • efa

Checksum (sha512) for the release tarball:

4ee21380176d5a76e4af0233ac44d1d46f92fd34941ecfaa104b7567a16cc84503c0abe59e540d36d79675bb3cc443979ed319f39582e301814d0653ea184508  aws-ofi-nccl-1.8.1-aws.tar.gz

AWS OFI NCCL v1.8.0

19 Feb 17:48
v1.8.0-aws

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release requires Libfabric v1.18.0 or later and supports NCCL v2.19.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).

New Features:

  • A tuner component for the plugin that picks the optimal NCCL algorithm and protocol at a given scale and message size.
  • Improved communicator and memory region identifier management.
  • Migrated from CUDA Runtime API to functional equivalents in CUDA Driver API in preparation for dma-buf support for memory registration. With this change, the plugin uses the same mechanism as NCCL to interact with the CUDA subsystem.
  • No longer forcing a flush operation for network operations when running with H100 GPUs, even when running with older NCCL versions (< v2.19.1).
  • Improvements to internal device-agnostic APIs.
  • Support for NCCL v7 ext-net plugin interface introduced in NCCL v2.19.3.
  • Support for Ubuntu 22.04 LTS distribution.

Bug Fixes:

  • Set the maximum NVLS tree chunk size used to 512KiB to recover from a performance regression introduced in NCCL v2.19.4, using a parameter introduced in NCCL v2.20.3.
  • Prevent possible invocation of CUDA calls in libfabric by requiring a libfabric version of v1.18.0 or newer.
  • Fix debug prints that reported incorrect device IDs during initialization
  • Fixes to MAX_COMM computation.
  • Better handling of NVLS enablement when NCCL is statically linked to applications
  • Fixes to internal API return codes
  • Configuration system fixes for Neuron builds
  • Fixes to plugin environment parsing to be case insensitive
  • Miscellaneous fixes that address memory leaks, NULL dereferences, and compiler warnings.
  • Updates and improvements to the project documentation.

Testing:

This release has been tested extensively with NCCL v2.19.4-1 for functionality and performance. This release has also been lightly tested with NCCL v2.20.3-1 that was released earlier this week. It was tested with Libfabric versions up to Libfabric v1.19.0.

Checksum (sha512) for the release tarball:

7bad7995e99649dc3ae4c46b2b0011225134703050ae83ab837cd46a7ff979079809cbd117e50cf5169428dd397ab099fea6249d12f891bff94b2d5579b0c0d9  aws-ofi-nccl-1.8.0-aws.tar.gz

AWS OFI NCCL v1.7.4

04 Dec 20:44
v1.7.4-aws

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:

New Features:

  • Hard fail if GPUDirect RDMA initialization fails on an EC2 instance that should support GPUDirect RDMA (such as P4d.24xlarge or P5.48xlarge), rather than fall back to host copy buffers at significantly reduced performance. Setting the environment variable OFI_NCCL_DISABLE_GDR_REQUIRED_CHECK=1 will disable this behavior.
  • Change the threshold at which the rdma transport switches from round robin to striping from 8 KiB to 256 KiB, improving the efficiency of large message transfers.
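The GPUDirect RDMA check described above is controlled through the process environment, so the override has to be in place before the job starts. The following Python sketch builds such an environment for a child process; the helper name `env_without_gdr_check` is hypothetical, and only the variable name and value come from this release note.

```python
import os


def env_without_gdr_check(base_env=None):
    """Return a copy of the environment with the GDR-required check disabled.

    The input mapping is not modified. With the check disabled, a failed
    GPUDirect RDMA initialization falls back to host copy buffers, which the
    release note describes as significantly slower.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OFI_NCCL_DISABLE_GDR_REQUIRED_CHECK"] = "1"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the application. Because the fallback path is slower, this override is probably best reserved for debugging rather than production runs.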

Bug Fixes:

  • Fixed debugging output in some initialization failure cases.
  • Request FI_LOCAL_COMM feature from Libfabric, as flush and eager copies are both implemented via local communication.
  • Fix initialization when using the Libfabric TCP provider.
  • Improve documentation on using the plugin with AWS's Elastic Fabric Adapter (EFA).
  • Improve handling of Neuron device detection when the plugin is used with Trainium instances.
  • Fix segfault in error case of freelist memory growth.
  • The test programs that only support 2 ranks now fail with a useful error message if run with another number of ranks.

This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.

AWS OFI NCCL v1.7.3

05 Oct 19:26
v1.7.3-aws

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:

  • Do not disable LL and LL128 protocols on P5 instances.
  • Add support for g5.48xlarge instance types.
  • Fix a block-in-use leak in the freelist implementation.
  • For NCCL 2.18.5 or later, don't disable NVLS support.
  • Fix bug in handling retry error issues from Libfabric in the RDMA transport (P5 instance types).

This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.

AWS OFI NCCL v1.7.2

25 Aug 17:58
v1.7.2-aws
a463b88

This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:

  • Fix compilation against CUDA versions prior to 11.3.
  • Fix allocation of free lists to avoid accidentally registering user data, which can cause corruption on fork() with older Linux kernels.
  • Fix memory leak with registered bounce buffers.
  • Fix improper usage of optlen in call to fi_getopt().
  • Numerous memory cleanup fixes.

This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.

AWS OFI NCCL v1.7.1

29 Jul 00:31
v1.7.1-aws
8a79a34

This release is part of enabling AWS's P5 platform. It is not recommended for other platforms at this time; we will release a general 1.7.x series in the near future.

This release removes the direct dependency on libcudart.so and dynamically loads the shared library at runtime, similar to the behaviors of NCCL and Libfabric.

This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.

AWS OFI NCCL v1.7.0

25 Jul 22:27
v1.7.0-aws
69f2292

This release is part of enabling AWS's P5 instance type. It has no useful features for other platforms.

This release requires Libfabric v1.11.0 or later and supports NCCL v2.17.1-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later). It was tested with Libfabric versions up to Libfabric v1.17.1.

The plugin has been tested with the following Libfabric providers using the unit tests bundled in the source code and the nccl-tests test suite:

  • efa
  • tcp

AWS OFI NCCL v1.7.0rc1-aws

21 Jul 22:54
v1.7.0rc1-aws
d520367
Pre-release

Pre-release of the next 1.7.0 release series, which will (initially) target only the AWS EFA platform.