Releases: openucx/ucx

v1.5.0 RC1

22 Dec 23:42
02078b9
Pre-release

Features:

  • Statistics for UCT tag API
  • New emulation mode enabling full UCX functionality (Atomic, Put, Get)
    over TCP and RDMA-CORE interconnects that don't implement full RDMA semantics.
  • Non-blocking API for all one-sided operations. All blocking communication APIs are
    now marked as deprecated.
  • New client/server connection establishment API
  • Added CUDA support for stream API

Bugfixes:

  • Multiple bugfixes (full list on GitHub)

v1.4.0

30 Oct 16:24
973dfb1

Features:

  • Improved support for installation with latest ROCm
  • Improved support for latest rdma-core
  • Added support for CUDA IPC for intra-node GPU communication
  • Added support for CUDA memory allocation cache for mem-type detection
  • Added support for latest Mellanox devices
  • Added support for Nvidia GPU managed memory
  • Added support for multiple connections between the same pair of workers
  • Added support for large worker addresses in client/server connection
    establishment, and for INADDR_ANY
  • Added support for bitwise atomic operations

Bugfixes:

  • Performance fixes for rendezvous protocol
  • Memory hook fixes
  • Clang support fixes
  • Self tl multi-rail fix
  • Thread safety fixes in IB/RDMA transport
  • Compilation fixes with upstream rdma-core
  • Multiple minor bugfixes (full list on GitHub)
  • Segfault fix for code generated by the armclang compiler
  • UCP memory-domain index fix for zero-copy active messages

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
    intra-node RMA transport. As a workaround, the user can disable CMA support at
    compile time: --disable-cma. Alternatively, the user can remove CMA from the
    UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
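
The run-time workaround for #2919 can be sketched in shell as below. This is a minimal sketch, not part of the release: drop_cma is a hypothetical helper, and the starting transport list is an assumed example. Only the --disable-cma flag and the resulting UCX_TLS value come from the note above.

```shell
#!/bin/sh
# Hypothetical helper: remove the "cma" entry from a comma-separated
# transport list. Splits on commas, drops exact "cma" lines, rejoins.
drop_cma() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -vx 'cma' | paste -s -d, -
}

# Run-time workaround: export a UCX_TLS list that no longer contains cma.
# The input list here is an assumed example, not a UCX default.
UCX_TLS=$(drop_cma "mm,cma,rc,cuda_copy,cuda_ipc,gdr_copy")
export UCX_TLS

# Compile-time alternative from the note above (not executed here):
#   ./configure --disable-cma
```

Any process started from the same shell afterwards inherits the exported UCX_TLS value. Filtering an existing list, rather than hard-coding one, keeps the other transports chosen for a given system intact.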

v1.4.0 RC2

25 Oct 20:07
bba50b8
Pre-release

Features:

  • Improved support for installation with latest ROCm
  • Improved support for latest rdma-core
  • Added support for CUDA IPC for intra-node GPU communication
  • Added support for CUDA memory allocation cache for mem-type detection
  • Added support for latest Mellanox devices
  • Added support for Nvidia GPU managed memory
  • Added support for multiple connections between the same pair of workers
  • Added support for large worker addresses in client/server connection
    establishment, and for INADDR_ANY
  • Added support for bitwise atomic operations

Bugfixes:

  • Performance fixes for rendezvous protocol
  • Memory hook fixes
  • Clang support fixes
  • Self tl multi-rail fix
  • Thread safety fixes in IB/RDMA transport
  • Compilation fixes with upstream rdma-core
  • Multiple minor bugfixes (full list on GitHub)
  • Segfault fix for code generated by the armclang compiler
  • UCP memory-domain index fix for zero-copy active messages

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
    intra-node RMA transport. As a workaround, the user can disable CMA support at
    compile time: --disable-cma. Alternatively, the user can remove CMA from the
    UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.

v1.4.0 RC1

15 Oct 18:42
cfb691d
Pre-release

Features:

  • Improved support for installation with latest ROCm
  • Improved support for latest rdma-core
  • Added support for CUDA IPC for intra-node GPU communication
  • Added support for CUDA memory allocation cache for mem-type detection
  • Added support for latest Mellanox devices
  • Added support for Nvidia GPU managed memory
  • Added support for multiple connections between the same pair of workers
  • Added support for large worker addresses in client/server connection
    establishment, and for INADDR_ANY
  • Added support for bitwise atomic operations

Bugfixes:

  • Performance fixes for rendezvous protocol
  • Memory hook fixes
  • Clang support fixes
  • Self tl multi-rail fix
  • Thread safety fixes in IB/RDMA transport
  • Compilation fixes with upstream rdma-core
  • Multiple minor bugfixes (full list on GitHub)

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
    intra-node RMA transport. As a workaround, the user can disable CMA support at
    compile time: --disable-cma. Alternatively, the user can remove CMA from the
    UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.

v1.3.1

23 Aug 10:25
befe098

Bugfixes:

  • Prevent potential out-of-order sending in shared memory active messages
  • CUDA: Include cudamem.h in source tarball, pass cudaFree memory size
  • Registration cache: fix large range lookup, handle shmat(REMAP)/mmap(FIXED)
  • Limit IB CQE size for specific ARM boards
  • RPM: explicitly set gcc-c++ as requirement

v1.3.0

09 Apr 21:42
f8064fa

Features:

  • Added stream-based communication API to UCP
  • Added support for GPU platforms: Nvidia CUDA and AMD ROCm software stacks
  • Added API for client/server based connection establishment
  • Added support for TCP transport
  • Support for InfiniBand tag-matching offload for DC and accelerated transports
  • Multi-rail support for eager and rendezvous protocols
  • Added support for tag-matching communications with CUDA buffers
  • Added ucp_rkey_ptr() to obtain pointer for shared memory region
  • Avoid progress overhead on unused transports
  • Improved scalability of software tag-matching by using a hash table
  • Added transparent huge-pages allocator
  • Added non-blocking flush and disconnect for UCP
  • Support fixed-address memory allocation via ucp_mem_map()
  • Added ucp_tag_send_nbr() API to avoid send request allocation
  • Support global addressing in all IB transports
  • Add support for external epoll fd and edge-triggered events
  • Added registration cache for knem
  • Initial support for Java bindings

Bugfixes:

  • Multiple bugfixes (full list on GitHub)

Bugfixes since RC1:

  • Fix flow control for DC transport
  • Fix compilation issue with mlx5 on ARM
  • Disable GDR-copy when ODP is used
  • Fixes for gcc8 compilation
  • Fix missing initialization of rndv_send_nbr thresholds
  • Fix mlx5 srq cleanup
  • Fix ep info print when there is no wireup lane
  • Optimize ugni locking

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2047 - UCP: ucp_do_am_bcopy_multi drops data on UCS_ERROR_NO_RESOURCE
  • #2047 - failure in ud/uct_flush_test.am_zcopy_flush_ep_nb/1
  • #1977 - failure in shm/test_ucp_rma.blocking_small/0
  • #1926 - Timeout in mpi_test_suite with HW TM

v1.3.0 RC4

27 Mar 11:44
f8064fa
Pre-release

Changelog:

  • Fixes for gcc8 compilation
  • Fix missing initialization of rndv_send_nbr thresholds
  • Fix mlx5 srq cleanup
  • Fix ep info print when there is no wireup lane
  • Optimize ugni locking

v1.3.0 RC3

13 Mar 10:07
9287190
Pre-release

Changelog:

  • Fix compilation issue with mlx5 on ARM
  • Disable GDR-copy when ODP is used

v1.3.0 RC2

25 Feb 09:23
0b45e29
Pre-release

Bugfixes:

  • Fix flow control for DC transport

1.3.0 - RC1

15 Feb 17:49
822e820
Pre-release

Features:

  • Added stream-based communication API to UCP
  • Added support for GPU platforms: Nvidia CUDA and AMD ROCm software stacks
  • Added API for client/server based connection establishment
  • Added support for TCP transport
  • Support for InfiniBand tag-matching offload for DC and accelerated transports
  • Multi-rail support for eager and rendezvous protocols
  • Added support for tag-matching communications with CUDA buffers
  • Added ucp_rkey_ptr() to obtain pointer for shared memory region
  • Avoid progress overhead on unused transports
  • Improved scalability of software tag-matching by using a hash table
  • Added transparent huge-pages allocator
  • Added non-blocking flush and disconnect for UCP
  • Support fixed-address memory allocation via ucp_mem_map()
  • Added ucp_tag_send_nbr() API to avoid send request allocation
  • Support global addressing in all IB transports
  • Add support for external epoll fd and edge-triggered events
  • Added registration cache for knem
  • Initial support for Java bindings

Bugfixes:

  • Multiple bugfixes (full list on GitHub)

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2047 - UCP: ucp_do_am_bcopy_multi drops data on UCS_ERROR_NO_RESOURCE
  • #2047 - failure in ud/uct_flush_test.am_zcopy_flush_ep_nb/1
  • #1977 - failure in shm/test_ucp_rma.blocking_small/0
  • #1926 - Timeout in mpi_test_suite with HW TM
  • #1920 - transport retry count exceeded in many-to-one tests
  • #1689 - Segmentation fault on memory hooks test in jenkins