ROCm 6.1.0 Release #3061
samjwu announced in Announcements
Replies: 1 comment
When will native Windows support be available?
ROCm 6.1 release highlights
The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:
Full support for Ubuntu 22.04.4.
rocDecode, a new ROCm component that provides high-performance video decode support for
AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting
YUV frames in video memory. With decoded frames in video memory, you can run video
post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus.
To learn more, refer to the rocDecode documentation.
OS and GPU support changes
ROCm 6.1 adds the following operating system support:
Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.
Installation packages
This release includes a new set of packages for every module (all libraries and binaries default to
DT_RPATH). Package names have the suffix rpath; for example, the rpath variant of rocminfo is
rocminfo-rpath.
ROCm components
The following sections highlight select component-specific changes. For additional details, refer to the
Changelog.
AMD System Management Interface (SMI) Tool
New monitor command for GPU metrics.
Use the monitor command to customize, capture, collect, and observe GPU metrics on
target devices.
Integration with E-SMI.
The EPYC™ System Management Interface In-band Library is a Linux C-library that provides in-band
user space software APIs to monitor and control your CPU’s power, energy, performance, and other
system management functionality. This integration enables access to CPU metrics and telemetry
through the AMD SMI API and CLI tools.
Composable Kernel (CK)
New architecture support.
CK now supports the following architectures to enable efficient image denoising on AMD GPUs:
gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035, gfx1036.
FP8 rounding logic is replaced with stochastic rounding.
Stochastic rounding mimics a more realistic data behavior and improves model convergence.
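To make the idea concrete, here is a minimal Python sketch of stochastic rounding. It is an illustration only, not CK's FP8 implementation: the function name, the uniform grid, and the step size are assumptions chosen for the example, and real FP8 formats have non-uniform spacing. The value is rounded up with probability equal to its fractional distance above the lower grid point, so the rounding error averages out to zero.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step, picking the upper neighbour with
    probability equal to how far x sits above the lower neighbour."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1): fractional distance above the lower grid point
    # rng.random() is uniform in [0, 1), so this is True with probability frac.
    round_up = rng.random() < frac
    return (lower + (1 if round_up else 0)) * step


# Hypothetical example: 0.3 lies between the grid points 0.25 and 0.5.
rng = random.Random(0)
samples = [stochastic_round(0.3, 0.25, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(sorted(set(samples)), mean)
```

Round-to-nearest would map 0.3 to 0.25 every time, introducing a systematic bias. With stochastic rounding, individual results are 0.25 or 0.5, but their average stays close to 0.3.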
HIP
The default HIP_LAUNCH_BLOCKING value is 0 (disable), which causes kernels to run as defined in
the queue. When set to 1 (enable), the HIP runtime serializes the kernel queue, which behaves the
same as AMD_SERIALIZE_KERNEL.
hipBLASLt
performance tuning.
hipFFT
implementation is a functional preview.
HIPIFY
--default-preprocessor option. To hipify everything, despite conditional preprocessor directives
(#if, #ifdef, #ifndef, #elif, or #else), don't use the --default-preprocessor or --amap options.
hipSPARSELt
Structured sparsity matrices help speed up deep-learning workloads. We now support B as the
sparse matrix and A as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM). Prior to this
release, we only supported sparse (matrix A) x dense (matrix B) matrix multiplication.
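The dense A x sparse B case can be pictured with a small reference computation. This is a pure-Python sketch that does not use the hipSPARSELt API. The 2:4 pattern (two non-zeros kept in every group of four along the reduction dimension) is an assumption made for illustration, as are the helper names and the example matrices.

```python
def prune_2_of_4_along_k(b):
    """Return a copy of b (K x N, K divisible by 4) in which every group of
    4 consecutive rows of each column keeps only its 2 largest-magnitude
    entries. The 2:4 pattern is an assumption of this sketch."""
    k, n = len(b), len(b[0])
    assert k % 4 == 0, "K must be a multiple of 4 for this sketch"
    out = [row[:] for row in b]
    for col in range(n):
        for start in range(0, k, 4):
            rows = range(start, start + 4)
            # The two smallest-magnitude entries of the group are zeroed.
            drop = sorted(rows, key=lambda r: abs(b[r][col]))[:2]
            for r in drop:
                out[r][col] = 0.0
    return out


def matmul(a, b):
    """Plain dense reference product of a (M x K) and b (K x N)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


a = [[1.0, 1.0, 2.0, 2.0]]                                # dense A, 1 x 4
b = [[0.5, -4.0], [3.0, 0.1], [-2.0, 0.2], [0.1, 1.0]]    # B before pruning, 4 x 2
b_sparse = prune_2_of_4_along_k(b)                        # structured-sparse B
c = matmul(a, b_sparse)
print(b_sparse, c)
```

A library that exploits structured sparsity can skip the zeroed entries of B. The reference product here only shows which operand carries the sparsity.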
hipTensor
You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
Complex F32/F64 datatypes.
MIGraphX
Improved performance for transformer-based models.
We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.
New Torch-MIGraphX driver.
This driver calls MIGraphX directly from PyTorch. It provides an mgx_module object that you can
invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally.
Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.
MIGraphX can load an ONNX model in FP8E4M3FNUZ using C++ or Python APIs, or migraphx-driver.
You can quantize a floating point model to FP8 format by using the --fp8 flag with migraphx-driver.
To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8
support in various backend kernel libraries.
MIOpen
Inference support is now provided for Find 2.0 fusion plans. Additionally, we've enhanced the Number of
samples, Height, Width, and Channels (NHWC) convolution kernels for heuristics. NHWC stores data
with the batch (N) dimension outermost, followed by height and width, with channels innermost.
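As a quick illustration of the NHWC layout described above, the following Python sketch computes flat buffer offsets for NHWC and, for contrast, NCHW. The function names and the example tensor shape are assumptions for this example; this is not MIOpen code.

```python
def nhwc_offset(n, h, w, c, height, width, channels):
    """Flat index of element (n, h, w, c) in a contiguous NHWC buffer."""
    return ((n * height + h) * width + w) * channels + c


def nchw_offset(n, c, h, w, channels, height, width):
    """Flat index of element (n, c, h, w) in a contiguous NCHW buffer."""
    return ((n * channels + c) * height + h) * width + w


# Hypothetical shape: height=2, width=3, channels=4.
H, W, C = 2, 3, 4
# Stride between channel 0 and channel 1 of the same pixel in each layout.
nhwc_channel_stride = nhwc_offset(0, 0, 0, 1, H, W, C) - nhwc_offset(0, 0, 0, 0, H, W, C)
nchw_channel_stride = nchw_offset(0, 1, 0, 0, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W)
print(nhwc_channel_stride, nchw_channel_stride)
```

In NHWC, all channels of one pixel sit next to each other in memory (stride 1), whereas in NCHW they are a full height x width plane apart.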
OpenMP
Implicit Zero-copy is triggered automatically in XNACK-enabled MI300A systems.
Implicit Zero-copy behavior in non unified_shared_memory programs is triggered automatically in
XNACK-enabled MI300A systems (for example, when using the HSA_XNACK=1 environment
variable). OpenMP supports the 'requires unified_shared_memory' directive to support programs
that don't want to copy data explicitly between the CPU and GPU. However, this requires that you add
these directives to every translation unit of the program.
New MI300 FP atomics. Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).
RCCL
NCCL 2.18.6 compatibility.
RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU
per node.
Doubled simultaneous communication channels.
We improved MI300X performance by increasing the maximum number of simultaneous
communication channels from 32 to 64.
rocALUTION
Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
and GPUs. For more information, refer to the
API documentation.
rocDecode
rocDecode is ROCm's newest component, providing high-performance video decode support for AMD
GPUs. To learn more, refer to the documentation.
ROCm Compiler
Combined projects. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in
the llvm-project/amd subdirectory of AMD's fork of the LLVM project. Previously, these projects
were maintained in separate repositories. Note that the projects themselves will continue to be
packaged separately.
Split the 'rocm-llvm' package. This package has been split into a required and an optional package:
rocm-llvm (required): A package containing the essential binaries needed for compilation.
rocm-llvm-dev (optional): A package containing binaries for compiler and application developers.
ROCm Data Center Tool (RDC)
RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.
ROCm Performance Primitives (RPP)
Audio processing support was added for the HOST backend, and 3D Voxel kernel support was added
for the HOST and HIP backends.
ROCm Validation Suite
Added BF16 and FP8 datatypes based on General Matrix Multiply (GEMM) operations in the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing based on the newly supported datatypes.
rocSOLVER
Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.
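For readers unfamiliar with the Jacobi approach, the sketch below shows the textbook cyclic Jacobi eigenvalue method for a real symmetric matrix in plain Python. It does not use the rocSOLVER API and makes no claim about how rocSOLVER implements its routine. The function name, sweep limit, and tolerance are assumptions for this example.

```python
import math


def jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors); eigenvectors[k] pairs with eigenvalues[k]."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(sweeps):
        # Stop once the sum of squared off-diagonal entries is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                sign = 1.0 if diff >= 0 else -1.0
                theta = 0.5 * math.atan2(sign * 2.0 * a[p][q], abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]  # columns of V
    return eigenvalues, eigenvectors


values, vectors = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
print(values, vectors)
```

For the 2 x 2 example, the eigenvalues are 1 and 3 up to floating-point error. Each rotation zeroes one off-diagonal pair, and repeated sweeps drive the matrix toward diagonal form.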
ROCTracer
ROCTracer was improved to match versioning changes in the HIP Runtime, and it supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.
Upcoming changes
ROCm SMI will be deprecated in a future release. We advise migrating to AMD SMI now to
prevent future workflow disruptions.
hipCC supports, by default, the following compiler invocation flags:
-mllvm -amdgpu-early-inline-all=true
-mllvm -amdgpu-function-calls=false
In a future ROCm release, hipCC will no longer support these flags. It will, instead, use the Clang
defaults:
-mllvm -amdgpu-early-inline-all=false
-mllvm -amdgpu-function-calls=true
To evaluate the impact of this change, include --hipcc-func-supp in your hipCC invocation.
For information on these flags, and the differences between hipCC and Clang, refer to
ROCm Compiler Interfaces.
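One way to compare builds with and without the evaluation flag is to script the two invocations. The Python sketch below only assembles the command lines and does not run the compiler. The helper name and the source and output file names are hypothetical; the only flag used is --hipcc-func-supp, taken from the text above.

```python
import shlex


def hipcc_command(source, output, preview_future_defaults=False):
    """Build (but do not run) a hipcc argument list.

    When preview_future_defaults is True, --hipcc-func-supp is added, which
    the notes above suggest for evaluating the upcoming default change."""
    cmd = ["hipcc"]
    if preview_future_defaults:
        cmd.append("--hipcc-func-supp")
    cmd += [source, "-o", output]
    return cmd


# Hypothetical file names, for illustration only.
baseline = hipcc_command("saxpy.hip", "saxpy_baseline")
preview = hipcc_command("saxpy.hip", "saxpy_preview", preview_future_defaults=True)
print(shlex.join(baseline))
print(shlex.join(preview))
```

Building the same source both ways and comparing performance and code size gives an early indication of how the change would affect an application.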
Future ROCm releases will not provide clang-ocl. For more information, refer to the clang-ocl
README.
The following operating systems will be supported in a future ROCm release. They are currently
only available in beta.
As of ROCm 6.2, we’ve planned for end-of-support for:
This discussion is based on the release ROCm 6.1.0 Release.