16 Apr 22:03

samjwu

4970c5d

ROCm 6.1.0 Release

ROCm 6.1 release highlights

The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:

Full support for Ubuntu 22.04.4.
rocDecode, a new ROCm component that provides high-performance video decode support for
AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting
YUV frames in video memory. With decoded frames in video memory, you can run video
post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus.

To learn more, refer to the rocDecode
documentation.

OS and GPU support changes

ROCm 6.1 adds the following operating system support:

MI300A: Ubuntu 22.04.4 and RHEL 9.3
MI300X: Ubuntu 22.04.4

Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.

To view the complete list of supported GPUs and operating systems, refer to the system requirements
page for
[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
and
[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).

Installation packages

This release includes a new set of packages for every module (all libraries and binaries default to
DT_RPATH). Package names have the suffix rpath; for example, the rpath variant of rocminfo is
rocminfo-rpath.

The new `rpath` packages will conflict with the default packages; they are meant to be used only in
environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
do **not** recommend installing both sets of packages.

ROCm components

The following sections highlight select component-specific changes. For additional details, refer to the
Changelog.

AMD System Management Interface (SMI) Tool

New monitor command for GPU metrics.
Use the monitor command to customize, capture, collect, and observe GPU metrics on
target devices.
Integration with E-SMI.
The EPYC™ System Management Interface In-band Library is a Linux C-library that provides in-band
user space software APIs to monitor and control your CPU’s power, energy, performance, and other
system management functionality. This integration enables access to CPU metrics and telemetry
through the AMD SMI API and CLI tools.

Composable Kernel (CK)

New architecture support.
CK now supports the following architectures to enable efficient image denoising on AMD GPUs:
gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035, and gfx1036.
FP8 rounding logic is replaced with stochastic rounding.
Stochastic rounding mimics a more realistic data behavior and improves model convergence.
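
To make the idea of stochastic rounding concrete, here is a minimal Python sketch. It is not CK's FP8 kernel code: the function name `stochastic_round` and the uniform grid `step` are assumptions made for illustration, and real FP8 formats have non-uniform spacing. The point it shows is that the rounding direction is chosen at random, with probability proportional to how close the value is to each neighbor, so the rounding error averages out.

```python
import random

def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, picking the upper neighbour with
    probability equal to x's fractional distance from the lower neighbour."""
    scaled = x / step
    lower = int(scaled // 1)          # floor; also correct for negative values
    frac = scaled - lower             # fractional part, in [0, 1)
    return (lower + (1 if rng.random() < frac else 0)) * step

if __name__ == "__main__":
    rng = random.Random(0)            # seeded so the run is repeatable
    samples = [stochastic_round(0.3, rng=rng) for _ in range(10_000)]
    # The mean of many stochastically rounded values approaches 0.3,
    # whereas round-to-nearest would return 0.0 every time.
    print(sum(samples) / len(samples))
```

Because small values are not always rounded to zero, small gradient contributions can still accumulate over many steps, which is the usual argument for why this helps convergence at low precision.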

HIP

New environment variable to enable kernel run serialization.
The default HIP_LAUNCH_BLOCKING value is 0 (disabled), which causes kernels to run as defined in
the queue. When set to 1 (enabled), the HIP runtime serializes the kernel queue, which behaves the
same as AMD_SERIALIZE_KERNEL.
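
As a rough sketch of how you might enable this for a single run without changing your shell environment, the Python snippet below builds a child-process environment with HIP_LAUNCH_BLOCKING set to 1. The helper name `serialized_env` is hypothetical, and the child command is a stand-in that only echoes the variable; substitute your own HIP application.

```python
import os
import subprocess
import sys

def serialized_env(base_env=None):
    """Return a copy of the environment with kernel serialization enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_LAUNCH_BLOCKING"] = "1"   # 0 (the default) leaves serialization off
    return env

if __name__ == "__main__":
    # Stand-in child process; replace the command with your own HIP binary.
    cmd = [sys.executable, "-c",
           "import os; print(os.environ['HIP_LAUNCH_BLOCKING'])"]
    out = subprocess.run(cmd, env=serialized_env(), capture_output=True,
                         text=True, check=True)
    print(out.stdout.strip())
```

Setting the variable per process keeps serialization, which is typically useful for debugging rather than production runs, from affecting other jobs started from the same shell.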

hipBLASLt

New GemmTuning extension parameter.
GemmTuning allows you to set a split-k value for each solution, which makes performance tuning
more practical.
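
For readers unfamiliar with split-k, the following Python sketch illustrates the general technique only; it does not use or mirror the hipBLASLt API, and the function name `matmul_split_k` is hypothetical. The shared K dimension of the product is cut into `split_k` chunks, each chunk yields a partial product, and the partial products are summed.

```python
def matmul_split_k(a, b, split_k):
    """Compute a @ b by splitting the shared K dimension into `split_k`
    chunks, forming one partial product per chunk, then summing them."""
    m, k, n = len(a), len(b), len(b[0])
    chunk = -(-k // split_k)                      # ceiling division
    c = [[0.0] * n for _ in range(m)]
    for start in range(0, k, chunk):              # each chunk is independent work
        stop = min(start + chunk, k)
        for i in range(m):
            for j in range(n):
                c[i][j] += sum(a[i][p] * b[p][j] for p in range(start, stop))
    return c

if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0, 4.0]]
    b = [[1.0], [1.0], [1.0], [1.0]]
    print(matmul_split_k(a, b, split_k=2))
```

On a GPU the chunks can be processed concurrently, which is why the best split-k value depends on the problem shape and is worth tuning per solution.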

hipFFT

New multi-GPU support for single-process transforms.
Multiple GPUs can be used to perform a transform in a single process. Note that this initial
implementation is a functional preview.

HIPIFY

Skipped code blocks: Code blocks that are skipped by the preprocessor are no longer hipified under the
--default-preprocessor option. To hipify everything, despite conditional preprocessor directives
(#if, #ifdef, #ifndef, #elif, or #else), don't use the --default-preprocessor or --amap options.

hipSPARSELt

Structured sparsity matrix support extensions.
Structured sparsity matrices help speed up deep-learning workloads. We now support B as the
sparse matrix and A as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM). Prior to this
release, we only supported sparse (matrix A) x dense (matrix B) matrix multiplication.

hipTensor

4D tensor permutation and contraction support.
You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
Complex F32/F64 datatypes.

MIGraphX

Improved performance for transformer-based models.
We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.
New Torch-MIGraphX driver.
This driver calls MIGraphX directly from PyTorch. It provides an mgx_module object that you can
invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally.
Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.
- FP8 support. We now offer functional support for inference in the FP8E4M3FNUZ datatype. You
  can load an ONNX model in FP8E4M3FNUZ using C++ or Python APIs, or migraphx-driver.
  You can quantize a floating point model to FP8 format by using the --fp8 flag with migraphx-driver.
  To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8
  support in various backend kernel libraries.
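
To give a feel for what an FP8E4M3FNUZ value encodes, here is a small Python decoder. It is a sketch based on the commonly documented layout of this format (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 8, a single NaN encoding at 0x80, no infinities and no negative zero); those layout details are an assumption brought in from outside these release notes, and the function name `decode_fp8_e4m3fnuz` is hypothetical. It is unrelated to the MIGraphX API.

```python
def decode_fp8_e4m3fnuz(byte):
    """Decode one FP8 E4M3FNUZ byte (1 sign, 4 exponent, 3 mantissa bits)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    if byte == 0x80:                 # the single NaN encoding; there is no -0
        return float("nan")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:                # subnormal: no implicit leading 1
        return sign * (mantissa / 8.0) * 2.0 ** (1 - 8)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 8)

if __name__ == "__main__":
    for b in (0x00, 0x01, 0x40, 0x7F, 0xFF):
        print(hex(b), decode_fp8_e4m3fnuz(b))
```

The narrow range this reveals is why quantizing a floating-point model to FP8 involves scaling rather than a plain cast.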

MIOpen

Improved performance for inference and convolutions.
Inference support is now provided for Find 2.0 fusion plans. Additionally, we've enhanced the Number of
samples, Height, Width, and Channels (NHWC) convolution kernels for heuristics. Within each sample, NHWC
stores data with the height and width dimensions first, followed by channels, so the channel index varies fastest.
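
The layout difference is easiest to see as index arithmetic. The sketch below computes the flat memory offset of one element under NHWC and, for comparison, under NCHW. The function names are hypothetical and the code is independent of MIOpen.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Flat offset of element (n, h, w, c) in an NHWC tensor."""
    return ((n * H + h) * W + w) * C + c

def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in an NCHW tensor."""
    return ((n * C + c) * H + h) * W + w

if __name__ == "__main__":
    H, W, C = 2, 2, 3
    # In NHWC, neighbouring channels of the same pixel are adjacent in memory.
    print(nhwc_offset(0, 0, 0, 0, H, W, C), nhwc_offset(0, 0, 0, 1, H, W, C))
    # In NCHW, neighbouring channels are a full H*W plane apart.
    print(nchw_offset(0, 0, 0, 0, C, H, W), nchw_offset(0, 1, 0, 0, C, H, W))
```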

OpenMP

Implicit Zero-copy is triggered automatically in XNACK-enabled MI300A systems.
Implicit Zero-copy behavior in non unified_shared_memory programs is triggered automatically in
XNACK-enabled MI300A systems (for example, when using the HSA_XNACK=1 environment
variable). OpenMP supports the 'requires unified_shared_memory' directive to support programs
that don’t want to copy data explicitly between the CPU and GPU. However, this requires that you add
these directives to every translation unit of the program.
New MI300 FP atomics.
Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).

RCCL

NCCL 2.18.6 compatibility.
RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU
per node.
Doubled simultaneous communication channels.
We improved MI300X performance by increasing the maximum number of simultaneous
communication channels from 32 to 64.

rocALUTION

New multiple node and GPU support.
Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
and GPUs. For more information, refer to the
API documentation.

rocDecode

New ROCm component.
rocDecode is ROCm's newest component, providing high-performance video decode support for AMD
GPUs. To learn more, refer to the
documentation.

ROCm Compiler

Combined projects. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in
the llvm-project/amd subdirectory of AMD's fork of the LLVM project. Previously, these projects
were maintained in separate repositories. Note that the projects themselves will continue to be
packaged separately.
Split the 'rocm-llvm' package. This package has been split into a required and an optional package:
- rocm-llvm (required): A package containing the essential binaries needed for compilation.
- rocm-llvm-dev (optional): A package containing binaries for compiler and application developers.

ROCm Data Center Tool (RDC)

C++ upgrades.
RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.

ROCm Performance Primitives (RPP)

New backend support.
Added audio processing support for the HOST backend and 3D Voxel kernel support
for the HOST and HIP backends.

ROCm Validation Suite

New datatype support.
Added BF16 and FP8 datatypes based on General Matrix Multiply (GEMM) operations in the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing based on the newly supported datatypes.

rocSOLVER

New EigenSolver routine.
Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.

ROCTracer

New versioning and callback enhancements.
ROCTracer was improved to match versioning changes in the HIP runtime, and it supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.

Upcoming changes

ROCm SMI will be deprecated in a future release. We advise migrating to AMD SMI now to
prevent future workflow disruptions.
hipCC supports, by default, the following compiler invocation flags:
- -mllvm -amdgpu-early-inline-all=true
- -mllvm -amdgpu-function-calls=false
...


31 Jan 23:29

samjwu

rocm-6.0.2

43cd749

ROCm 6.0.2 Release

ROCm 6.0.2 is a point release with minor bug fixes to improve stability of MI300 GPU applications. This included fixes in the rocSPARSE library. Several new driver features are introduced for system qualification on our partner server offerings.

hipFFT

Changes

Removed the Git submodule for shared files between rocFFT and hipFFT; instead, just copy the files
over (this should help simplify downstream builds and packaging)


15 Dec 21:47

saadrahim

rocm-6.0.0

1828271

ROCm 6.0.0 Release

Release notes for AMD ROCm™ 6.0

ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
MI300 series. Future releases will further enable and optimize this new platform. Key features include:

Improved performance in areas like lower precision math and attention layers.
New hipSPARSELt library accelerates AI workloads via AMD's sparse matrix core technique.
Upstream support is now available for popular AI frameworks like TensorFlow, JAX, and PyTorch.
New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
tutorials on the AMD ROCm Docs site.
Consolidated developer resources and training on the new
AMD ROCm Developer Hub.

The following sections provide a release overview for ROCm 6.0. For additional details, you can refer to
the Changelog. We list known
issues on GitHub.

OS and GPU support changes

ROCm 6.0 enables the use of MI300A and MI300X Accelerators with limited operating system
support. Future releases will add more operating systems to match our general offering.

Operating Systems	MI300A	MI300X
Ubuntu 22.04.3	Supported	Supported
RHEL 8.9	Supported
SLES15 SP5	Supported

For older generations of supported Instinct products we've added the following operating systems:

RHEL 9.3
RHEL 8.9

Note: For ROCm 6.2 and beyond, we've planned for end-of-support (EoS) for the following operating
systems:

Ubuntu 20.04.5
SLES 15 SP4
RHEL/CentOS 7.9

New ROCm meta package

We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
libraries. For example, the following command will install the full ROCm package: apt-get install rocm
(Ubuntu), or yum install rocm (RHEL).

Filesystem Hierarchy Standard

ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
the backward compatibility support for old file locations.

Compiler location change

The installation path of LLVM has been changed from /opt/rocm-<rel>/llvm to
/opt/rocm-<rel>/lib/llvm. For backward compatibility, a symbolic link is provided to the old
location and will be removed in a future release.
The installation path of the device library bitcode has changed from /opt/rocm-<rel>/amdgcn to
/opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn. For backward compatibility, a symbolic link
is provided and will be removed in a future release.
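
If your build scripts hard-code the old locations, a small helper can centralize the new ones. The Python sketch below only assembles the paths named above; the function names are hypothetical, and the release string and clang version passed in are placeholder values that you must supply for your own installation.

```python
from pathlib import PurePosixPath

def llvm_dir(rel):
    """New LLVM install location for ROCm release `rel`."""
    return PurePosixPath(f"/opt/rocm-{rel}") / "lib" / "llvm"

def device_lib_dir(rel, clang_ver):
    """New device library bitcode location; `clang_ver` fills the <ver> slot."""
    return llvm_dir(rel) / "lib" / "clang" / clang_ver / "lib" / "amdgcn"

if __name__ == "__main__":
    # "6.0.0" and "17" are placeholder values, not taken from the release notes.
    print(llvm_dir("6.0.0"))
    print(device_lib_dir("6.0.0", "17"))
```

Resolving paths in one place makes it easier to drop reliance on the compatibility symbolic links before they are removed.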

Documentation

CMake support has been added for documentation in the
ROCm repository.

AMD Instinct™ MI50 end-of-support notice

AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enter
maintenance mode in ROCm 6.0.

As outlined in 5.6.0, ROCm 5.7 was the
final release for gfx906 GPUs in a fully supported state.

Henceforth, no new features and performance optimizations will be supported for the gfx906 GPUs.
Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
2024 (end of maintenance [EOM] will be aligned with the closest ROCm release).
Bug fixes will be made up to the next ROCm point release.
Bug fixes will not be backported to older ROCm releases for gfx906.
Distribution and operating system updates will continue per the ROCm release cadence for gfx906
GPUs until EOM.

ROCm projects

The following sections contain project-specific release notes for ROCm 6.0. For additional details, you
can refer to the Changelog.

AMD SMI

Integrated the E-SMI (EPYC-SMI) library.
You can now query CPU-related information directly through AMD SMI. Metrics include power,
energy, performance, and other system details.
Added support for gfx942 metrics.
You can now query MI300 device metrics to get real-time information. Metrics include power,
temperature, energy, and performance.

HIP

New features to improve resource interoperability.
- For external resource interoperability, we've added new structs and enums.
- We've added new members to HIP struct hipDeviceProp_t for surfaces, textures, and device
  identifiers.
Changes impacting backward compatibility.
There are several changes impacting backward compatibility: we changed some struct members and
some enum values, and removed some deprecated flags. For additional information, please refer to
the Changelog.

hipCUB

Additional CUB API support.
The hipCUB backend is updated to CUB and Thrust 2.1.

HIPIFY

Enhanced CUDA2HIP document generation.
API versions are now listed in the CUDA2HIP documentation. To see if the application binary
interface (ABI) has changed, refer to the
C column
in our API documentation.
Hipified rocSPARSE.
We've implemented support for the direct hipification of additional cuSPARSE APIs into rocSPARSE
APIs under the --roc option. This covers a major milestone in the roadmap towards complete
cuSPARSE-to-rocSPARSE hipification.

hipRAND

Official release.
hipRAND is now a standalone project--it's no longer available as a submodule for rocRAND.

hipTensor

Added architecture support.
We've added contraction support for gfx942 architectures, and f32 and f64 data
types.
Upgraded testing infrastructure.
hipTensor will now support dynamic parameter configuration with input YAML config.

MIGraphX

Added TorchMIGraphX.
We introduced a Dynamo backend for Torch, which allows PyTorch to use MIGraphX directly
without first requiring a model to be converted to the ONNX model format. With a single line of
code, PyTorch users can utilize the performance and quantization benefits provided by MIGraphX.
Boosted overall performance with rocMLIR.
We've integrated the rocMLIR library for ROCm-supported RDNA and CDNA GPUs. This
technology provides MLIR-based convolution and GEMM kernel generation.
Added INT8 support across the MIGraphX portfolio.
We now support the INT8 data type. MIGraphX can perform the quantization or ingest
prequantized models. INT8 support extends to the MIGraphX execution provider for ONNX Runtime.

ROCgdb

Added support for additional GPU architectures.
- Navi 3 series: gfx1100, gfx1101, and gfx1102.
- MI300 series: gfx942.

rocm-smi-lib

Improved accessibility to GPU partition nodes.
You can now view, set, and reset the compute and memory partitions. You'll also get notifications of
a GPU busy state, which helps you avoid partition set or reset failure.
Upgraded GPU metrics version 1.4.
The upgraded GPU metrics binary has an improved metric version format with a content version
appended to it. You can read each metric within the binary without the full rsmi_gpu_metric_t data
structure.
Updated GPU index sorting.
We made GPU index sorting consistent with other ROCm software tools by optimizing it to use
Bus:Device.Function (BDF) instead of the card number.
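
To show why BDF ordering can differ from card-number ordering, here is a minimal Python sketch. It does not call rocm-smi-lib; the helper names and the sample device list are made up for illustration, and the optional PCI domain prefix is an assumption about the input format.

```python
def parse_bdf(bdf):
    """Parse 'DDDD:BB:DD.F' or 'BB:DD.F' into a sortable tuple of ints."""
    parts = bdf.split(":")
    if len(parts) == 2:               # no PCI domain given; assume domain 0
        parts = ["0000"] + parts
    domain, bus, dev_fn = parts
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))

def sort_by_bdf(gpus):
    """Return the GPU records ordered by their PCI Bus:Device.Function."""
    return sorted(gpus, key=lambda g: parse_bdf(g["bdf"]))

if __name__ == "__main__":
    # Hypothetical devices: card numbers do not follow bus order here.
    gpus = [
        {"card": 0, "bdf": "0000:c1:00.0"},
        {"card": 1, "bdf": "0000:03:00.0"},
        {"card": 2, "bdf": "0000:0a:00.0"},
    ]
    print([g["card"] for g in sort_by_bdf(gpus)])
```

Scripts that assumed index 0 was the lowest card number may therefore address a different physical GPU after this change.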

ROCm Compiler

Added kernel argument optimization on gfx942.
With the new feature, you can preload kernel arguments into Scalar General-Purpose Registers
(SGPRs) rather than pass them in memory. This feature is enabled with a compiler option, which also
controls the number of arguments to pass in SGPRs. For more information, see:
https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments
Improved register allocation at -O0.
We've improved the register allocator used at -O0 to avoid compiler crashes (when the signature is
'ran out of registers during register allocation').
Improved generation of debug information.
We've improved compile time when generating debug information for certain corner cases. We've
also improved the compiler to eliminate compiler crashes when generating debug information.

ROCmValidationSuite

Added GPU and operating system support.
We added support for MI300X GPU in GPU Stress Test (GST).

Roc Profiler

Added option to specify desired Roc Profiler version.
You can now use rocProfV1 or rocProfV2 by specifying your desired version, as the legacy rocProf
(rocprofv1) provides the option to use the latest version (rocprofv2).
Automated the ISA dumping process by Advance Thread Tracer.
Advance Thread Tracer (ATT) no longer depends on user-supplied Instruction Set Architecture (ISA)
and compilation process (using hipcc --save-temps) to dump ISA from the running kernels.
Added ATT support for parallel kernels.
The automatic ISA dumping process also helps ATT successfully parse multiple kernels running in
parallel, and provide cycle-accurate occupancy information for multiple kernels at the same time.

ROCr

Support for SDMA link aggregation.
If multiple XGMI links are available when making SDMA copies between GPUs, the copy is
distributed over multiple links to increase peak bandwi...


13 Oct 23:16

saadrahim

rocm-5.7.1

365b317

ROCm 5.7.1 Release

ROCm 5.7.1 is a point release with the following changes:

rocBLAS

A new functionality rocblas-gemm-tune and an environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH are added to rocBLAS in the ROCm 5.7.1 release.

rocblas-gemm-tune is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, profile logging can be used, by setting the environment variable ROCBLAS_LAYER=4.

For more information on rocBLAS logging, see Logging in rocBLAS.

An example input file: Expected output (note selected GEMM idx may differ): Where the far right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code of directly using kernels via their indices.

If the output is stored in a file, the results can be used to override default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH=<path>, where <path> points to the stored file.

For more details, refer to the rocBLAS Programmer's Guide.

HIP 5.7.1 (for ROCm 5.7.1)

ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.

Fixed

The hipPointerGetAttributes API returns the correct HIP memory type as hipMemoryTypeManaged for managed memory.

hipSOLVER 1.8.2

hipSOLVER 1.8.2 for ROCm 5.7.1

Fixed

Fixed conflicts between the hipsolver-dev and -asan packages by excluding
hipsolver_module.f90 from the latter


16 Sep 00:16

saadrahim

rocm-5.7.0

23aa1ee

ROCm 5.7.0 Release

ROCm 5.7.0 includes many new features. Please see the complete release notes. New features include a new library (hipTensor) and optimizations for rocRAND and MIVisionX. Address sanitizer for host and device code (GPU) is now available as a beta. Note that ROCm 5.7.0 is EOS for MI50. ROCm 5.7 is the last major release in the ROCm 5 series. This release is Linux-only.

Important: The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series. Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime API that are not backward compatible, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.


29 Aug 23:29

saadrahim

rocm-5.6.1

f3d3929

ROCm 5.6.1 Release

Release Highlights

ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime. This is a Linux only release.

HIP 5.6.1

Fixed Defects

hipMemcpy device-to-device (intra device) is now asynchronous with respect to the host
Enabled xnack+ check in HIP catch2 tests hang when executing tests
Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
Using hipGraphAddMemFreeNode no longer results in a crash


27 Jul 20:05

saadrahim

win-5.5

b4d3dde

HIP SDK 5.5 for Windows

AMD is pleased to announce the availability of the HIP SDK for Windows as part of the ROCm platform. The HIP SDK OS and GPU support page lists the versions of Windows and GPUs validated by AMD. HIP SDK features on Windows are described in detail in our What is ROCm? page and differ from the Linux feature set. Visit the Quick Start page to get started. Known issues are tracked on GitHub.


29 Jun 01:18

saadrahim

rocm-5.6.0

f9aeee3

ROCm 5.6.0 Release

Release Highlights

ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A few examples include:

New documentation portal at https://rocm.docs.amd.com with highlights on our accompanying blog
Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.

Please see the complete release notes and our release blog


24 May 20:30

zhang2amd

rocm-5.5.1

7719c17

ROCm 5.5.1 release

Release v5.5.1

02 May 03:53

zhang2amd

rocm-5.5.0

41b6d1e

ROCm 5.5.0 release

What's New in This Release

HIP Enhancements

The ROCm 5.5.0 release consists of the following HIP enhancements:

Enhanced Stack Size Limit

In this release, the stack size limit is increased from 16k to 131056 bytes (or 128K - 16).
Applications that need to update the stack size can use the hipDeviceSetLimit API.
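
The arithmetic behind the new limit can be checked with a short Python sketch. It assumes that "16k" and "128K" mean 16 KiB and 128 KiB; the constant and function names are hypothetical, and the snippet does not call the HIP runtime.

```python
KIB = 1024
OLD_STACK_LIMIT = 16 * KIB          # assumes "16k" means 16 KiB (16384 bytes)
NEW_STACK_LIMIT = 128 * KIB - 16    # 131056 bytes, as stated above

def fits_stack_limit(requested_bytes, limit=NEW_STACK_LIMIT):
    """Return True if a requested per-thread stack size is within the limit."""
    return 0 <= requested_bytes <= limit

if __name__ == "__main__":
    print(NEW_STACK_LIMIT)
    print(fits_stack_limit(64 * KIB), fits_stack_limit(128 * KIB))
```

A check like this before requesting a larger stack avoids asking for exactly 128 KiB, which is 16 bytes over the limit.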

`hipcc` Changes

The following hipcc changes are implemented in this release:

hipcc will not implicitly link to libpthread and librt, as they are no longer a link-time dependency for HIP programs. Applications that depend on these libraries must explicitly link to them.
-use-staticlib and -use-sharedlib options are deprecated.

Future Changes

Separation of hipcc binaries (Perl scripts) from HIP to hipcc project. Users will access separate hipcc package for installing hipcc binaries in future ROCm releases.
In a future ROCm release, the following samples will be removed from the hip-tests project.
- hipBusBandwidth at https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/shipBusBandwidth
- hipCommander at https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/hipCommander
Note that the samples will continue to be available in previous release branches.

New HIP APIs in This Release

Note

This is a pre-official version (beta) release of the new APIs and may contain unresolved issues.

Memory Management HIP APIs

The new memory management HIP API is as follows:

Sets information on the specified pointer [BETA].

hipError_t hipPointerSetAttribute(const void* value, hipPointer_attribute attribute, hipDeviceptr_t ptr);

Module Management HIP APIs

The new module management HIP APIs are as follows:

Launches kernel f with launch parameters and shared memory on stream with arguments passed to kernelParams, where thread blocks can cooperate and synchronize as they execute.

hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams);

Launches kernels on multiple devices where thread blocks can cooperate and synchronize as they execute.

hipError_t hipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags);

HIP Graph Management APIs

The new HIP Graph Management APIs are as follows:

Creates a memory allocation node and adds it to a graph [BETA]

hipError_t hipGraphAddMemAllocNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipMemAllocNodeParams* pNodeParams);

Return parameters for memory allocation node [BETA]

hipError_t hipGraphMemAllocNodeGetParams(hipGraphNode_t node, hipMemAllocNodeParams* pNodeParams);

Creates a memory free node and adds it to a graph [BETA]

hipError_t hipGraphAddMemFreeNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dev_ptr);

Returns parameters for memory free node [BETA].

hipError_t hipGraphMemFreeNodeGetParams(hipGraphNode_t node, void* dev_ptr);

Write a DOT file describing graph structure [BETA].

hipError_t hipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags);

Copies attributes from source node to destination node [BETA].

hipError_t hipGraphKernelNodeCopyAttributes(hipGraphNode_t hSrc, hipGraphNode_t hDst);

Enables or disables the specified node in the given graphExec [BETA]

hipError_t hipGraphNodeSetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int isEnabled);

Query whether a node in the given graphExec is enabled [BETA]

hipError_t hipGraphNodeGetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int* isEnabled);

OpenMP Enhancements

This release consists of the following OpenMP enhancements:

Additional support for OMPT functions get_device_time and get_record_type.
Add support for min/max fast fp atomics on AMD GPUs.
Fix the use of the abs function in C device regions.

Deprecations and Warnings

HIP Deprecation

The hipcc and hipconfig Perl scripts are deprecated. In a future release, compiled binaries will be available as hipcc.bin and hipconfig.bin as replacements for the Perl scripts.

Note

There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to hipcc.bin and hipconfig.bin. The hipcc/hipconfig soft link will be assimilated to point from hipcc/hipconfig to the respective compiled binaries as the default option.

Linux Filesystem Hierarchy Standard for ROCm

ROCm packages have adopted the Linux foundation filesystem hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new filesystem hierarchy, ROCm ensures backward compatibility with its 5.1 version or older filesystem hierarchy. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.

New Filesystem Hierarchy

The following is the new filesystem hierarchy:

/opt/rocm-<ver>
    | --bin
      | --All externally exposed Binaries
    | --libexec
        | --<component>
            | -- Component specific private non-ISA executables (architecture independent)
    | --include
        | -- <component>
            | --<header files>
    | --lib
        | --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
            (public libraries linked with application)
        | --<component> (component specific private library, executable data)
        | --<cmake>
            | --components
                | --<component>.config.cmake
    | --share
        | --html/<component>/*.html
        | --info/<component>/*.[pdf, md, txt]
        | --man
        | --doc
            | --<component>
                | --<licenses>
        | --<component>
            | --<misc files> (arch independent non-executable)
            | --samples

Note

ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major release.

For more information, refer to https://refspecs.linuxfoundation.org/fhs.shtml.

Backward Compatibility with Older Filesystems

ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.

Note

ROCm will continue supporting backward compatibility until the next major release.

Wrapper header files

Wrapper header files are placed in the old location (/opt/rocm-xxx/<component>/include) with a warning message to include files from the new location (/opt/rocm-xxx/include) as shown in the example below:

// Code snippet from hip_runtime.h
#pragma message("This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip")
#include "hip/hip_runtime.h"

The wrapper header files’ backward compatibility deprecation is as follows:

#pragma message announcing deprecation -- ROCm v5.2 release
#pragma message changed to #warning -- Future release
#warning changed to #error -- Future release
Backward compatibility wrappers removed -- Future release

Library files

Library files are available in the /opt/rocm-xxx/lib folder. For backward compatibility, the old library location (/opt/rocm-xxx/<component>/lib) has a soft link to the library at the new location.

Example:

$ ls -l /opt/rocm/hip/lib/
total 4
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
lrwxrwxrwx 1 root root   24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so

CMake Config files

All CMake configuration files are available in the /opt/rocm-xxx/lib/cmake/<component> folder.
For backward compatibility, the old CMake locations (/opt/rocm-xxx/<component>/lib/cmake) consist of a soft link to the new CMake config.

Example:

$ ls -l /opt/rocm/hip/lib/cmake/hip/
total 0
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake

ROCm Support For Code Object V3 Deprecated

Support for Code Object v3 is deprecated and will be removed in a future release.

Comgr V3.0 Changes

The following APIs and macros have been marked as deprecated. They are expected to be removed in a future ROCm release, coinciding with the release of Comgr v3.0.

API Changes

amd_comgr_action_info_set_options()
amd_comgr_action_info_get_options()

Actions and Data Types

AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
AMD_COMGR_ACTION_COMPILE_SOURCE_TO_FATBIN

For replacements, see the AMD_COMGR_ACTION_INFO_GET/SET_OPTION_LIST APIs, and the AMD_COMGR_ACTION_COMPILE_SOURCE_(WITH_DEVICE_LIBS)_TO_BC macros.

Deprecated Environment Variables

The following environment variables are removed in this ROCm release:

GPU_MAX_COMMAND_QUEUES
GPU_MAX_WORKGROUP_SIZE_2D_X
GPU_MAX_WORKGROUP_SIZE_2D_Y
GPU_MAX_WORKGROUP_SIZE_3D_X
GPU_MAX_WORKGROUP_SIZE_3D_Y
`GPU_MAX...

Contributors

Maetveis


Releases: ROCm/ROCm
