Releases: ROCm/ROCm

ROCm 6.1.0 Release

16 Apr 22:03
4970c5d

ROCm 6.1 release highlights

The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:

  • Full support for Ubuntu 22.04.4.

  • rocDecode, a new ROCm component that provides high-performance video decode support for
    AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting
    YUV frames in video memory. With decoded frames in video memory, you can run video
    post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus.

    To learn more, refer to the rocDecode
    documentation.

OS and GPU support changes

ROCm 6.1 adds the following operating system support:

  • MI300A: Ubuntu 22.04.4 and RHEL 9.3
  • MI300X: Ubuntu 22.04.4

Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.

To view the complete list of supported GPUs and operating systems, refer to the system requirements
page for
[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
and
[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).

Installation packages

This release includes a new set of packages for every module (all libraries and binaries default to
DT_RPATH). Package names have the suffix rpath; for example, the rpath variant of rocminfo is
rocminfo-rpath.

The new `rpath` packages will conflict with the default packages; they are meant to be used only in
environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
do **not** recommend installing both sets of packages.

ROCm components

The following sections highlight select component-specific changes. For additional details, refer to the
Changelog.

AMD System Management Interface (SMI) Tool

  • New monitor command for GPU metrics.
    Use the monitor command to customize, capture, collect, and observe GPU metrics on
    target devices.

  • Integration with E-SMI.
    The EPYC™ System Management Interface In-band Library is a Linux C-library that provides in-band
    user space software APIs to monitor and control your CPU’s power, energy, performance, and other
    system management functionality. This integration enables access to CPU metrics and telemetry
    through the AMD SMI API and CLI tools.

Composable Kernel (CK)

  • New architecture support.
    CK now supports the following architectures to enable efficient image denoising on AMD GPUs:
    gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035, and
    gfx1036.

  • FP8 rounding logic is replaced with stochastic rounding.
    Stochastic rounding mimics a more realistic data behavior and improves model convergence.
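
As a rough illustration of the idea behind stochastic rounding (this is not CK's implementation), the sketch below rounds a value to a coarse grid and picks the upper neighbor with a probability equal to the fractional distance from the lower neighbor. The function name, the grid step, and the use of Python's `random` module are assumptions made for this example.

```python
import math
import random

def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step, rounding up with a probability
    equal to how far x sits above the lower grid point."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower          # fractional part, in [0, 1)
    if rng.random() < frac:
        lower += 1                 # round up with probability frac
    return lower * step

rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
# Every sample is 0.0 or 1.0, yet their average stays close to 0.3.
# Round-to-nearest would return 0.0 every time and lose that information.
```

Because the rounding error averages out over many operations, small contributions are preserved in expectation, which is the usual argument for why this helps low-precision training converge.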

HIP

  • New environment variable to enable kernel run serialization.
    The default HIP_LAUNCH_BLOCKING value is 0 (disabled), which causes kernels to run as defined in
    the queue. When set to 1 (enabled), the HIP runtime serializes the kernel queue, which behaves the
    same as AMD_SERIALIZE_KERNEL.
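
One way to apply the variable from a launcher script is to build a modified environment and pass it to the child process. The helper below is a minimal sketch. The function name and the idea of launching through `subprocess` are assumptions, and only the variable name and its 0/1 values come from the release note.

```python
import os

def hip_launch_env(blocking: bool, base=None) -> dict:
    """Return a copy of the environment with HIP_LAUNCH_BLOCKING set."""
    env = dict(os.environ if base is None else base)
    env["HIP_LAUNCH_BLOCKING"] = "1" if blocking else "0"
    return env

env = hip_launch_env(True, base={"PATH": "/usr/bin"})
# Pass `env` to subprocess.run([...], env=env) when starting the
# (hypothetical) HIP application whose kernels you want serialized.
```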

hipBLASLt

  • New GemmTuning extension parameter.
    GemmTuning allows you to set a split-k value for each solution, which is better suited to
    performance tuning.
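
For readers unfamiliar with split-k, the sketch below shows the general technique in plain Python: the reduction (K) dimension is cut into chunks, each chunk produces a partial product, and the partial results are added at the end. It does not use the hipBLASLt API, and the function name and list-of-lists matrices are assumptions for illustration only.

```python
def matmul_split_k(a, b, split_k):
    """Multiply a (M x K) by b (K x N), reducing over K in split_k chunks."""
    m, k, n = len(a), len(b), len(b[0])
    chunk = -(-k // split_k)  # ceiling division
    partials = []
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        # Partial product over one slice of the K dimension.
        part = [[sum(a[i][p] * b[p][j] for p in range(start, stop))
                 for j in range(n)] for i in range(m)]
        partials.append(part)
    # Final reduction: add the partial results together.
    return [[sum(part[i][j] for part in partials) for j in range(n)]
            for i in range(m)]

a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
result = matmul_split_k(a, b, split_k=2)  # [[4, 6], [12, 14]]
```

On a GPU the chunks can run in parallel, so the best split-k value depends on the problem shape, which is why a per-solution setting is useful for tuning.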

hipFFT

  • New multi-GPU support for single-process transforms.
    Multiple GPUs can be used to perform a transform in a single process. Note that this initial
    implementation is a functional preview.

HIPIFY

  • Skipped code blocks: Code blocks that are skipped by the preprocessor are no longer hipified under the
    --default-preprocessor option. To hipify everything, despite conditional preprocessor directives
    (#if, #ifdef, #ifndef, #elif, or #else), don't use the --default-preprocessor or --amap options.

hipSPARSELt

  • Structured sparsity matrix support extensions.
    Structured sparsity matrices help speed up deep-learning workloads. We now support B as the
    sparse matrix and A as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM). Prior to this
    release, we only supported sparse (matrix A) x dense (matrix B) matrix multiplication.
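
The release note does not say which sparsity pattern is used. As an assumption, the sketch below uses the common 2:4 pattern (at most two non-zero values in every group of four) to show what "structured" means and why such a matrix can be stored compactly. The function names are hypothetical and nothing here calls hipSPARSELt.

```python
def prune_2_of_4(row):
    """Keep the two largest-magnitude values in every group of four."""
    assert len(row) % 4 == 0
    pruned = []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0 for i, v in enumerate(group))
    return pruned

def compress(row):
    """Store only the non-zero values plus their positions inside each group."""
    values, indices = [], []
    for g in range(0, len(row), 4):
        for i in range(4):
            if row[g + i] != 0:
                values.append(row[g + i])
                indices.append(i)
    return values, indices

dense = [0.1, -2.0, 0.3, 4.0, 5.0, 0.0, -0.5, 0.2]
sparse = prune_2_of_4(dense)        # [0, -2.0, 0, 4.0, 5.0, 0, -0.5, 0]
values, indices = compress(sparse)  # half the values, plus small position indices
```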

hipTensor

  • 4D tensor permutation and contraction support.
    You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
    Complex F32/F64 datatypes.

MIGraphX

  • Improved performance for transformer-based models.
    We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.

  • New Torch-MIGraphX driver.
    This driver calls MIGraphX directly from PyTorch. It provides an mgx_module object that you can
    invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally.
    Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.

  • FP8 support.
    We now offer functional support for inference in the FP8E4M3FNUZ datatype. You
    can load an ONNX model in FP8E4M3FNUZ using C++ or Python APIs, or migraphx-driver.
    You can quantize a floating point model to FP8 format by using the --fp8 flag with migraphx-driver.
    To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8
    support in various backend kernel libraries.

MIOpen

  • Improved performance for inference and convolutions.
    Inference support is now provided for Find 2.0 fusion plans. Additionally, we've enhanced the Number of
    samples, Height, Width, and Channels (NHWC) convolution kernels for heuristics. NHWC stores data
    in a format where the height and width dimensions come first, followed by channels.
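
To make the layout concrete, the following sketch computes flat memory offsets for the same element in NHWC and in the channel-first NCHW layout. The helper names and the tiny 2x2x3 shape are assumptions for illustration, and the code is independent of MIOpen.

```python
def offset_nhwc(n, h, w, c, H, W, C):
    """Flat offset of element (n, h, w, c) when channels vary fastest."""
    return ((n * H + h) * W + w) * C + c

def offset_nchw(n, c, h, w, C, H, W):
    """Flat offset of the same element when width varies fastest."""
    return ((n * C + c) * H + h) * W + w

H, W, C = 2, 2, 3
# In NHWC, the three channel values of one pixel are adjacent in memory.
pixel = [offset_nhwc(0, 1, 0, c, H, W, C) for c in range(C)]   # [6, 7, 8]
# In NCHW, the same three values are a full H*W plane apart.
planes = [offset_nchw(0, c, 1, 0, C, H, W) for c in range(C)]  # [2, 6, 10]
```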

OpenMP

  • Implicit Zero-copy is triggered automatically in XNACK-enabled MI300A systems.
    Implicit Zero-copy behavior in non unified_shared_memory programs is triggered automatically in
    XNACK-enabled MI300A systems (for example, when using the HSA_XNACK=1 environment
    variable). OpenMP supports the 'requires unified_shared_memory' directive to support programs
    that don’t want to copy data explicitly between the CPU and GPU. However, this requires that you add
    these directives to every translation unit of the program.

  • New MI300 FP atomics.
    Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).

RCCL

  • NCCL 2.18.6 compatibility.
    RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU
    per node.

  • Doubled simultaneous communication channels.
    We improved MI300X performance by increasing the maximum number of simultaneous
    communication channels from 32 to 64.

rocALUTION

  • New multiple node and GPU support.
    Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
    and GPUs. For more information, refer to the
    API documentation.

rocDecode

  • New ROCm component.
    rocDecode is ROCm's newest component, providing high-performance video decode support for AMD
    GPUs. To learn more, refer to the
    documentation.

ROCm Compiler

  • Combined projects. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in
    the llvm-project/amd subdirectory of AMD's fork of the LLVM project. Previously, these projects
    were maintained in separate repositories. Note that the projects themselves will continue to be
    packaged separately.

  • Split the 'rocm-llvm' package. This package has been split into a required and an optional package:

    • rocm-llvm (required): A package containing the essential binaries needed for compilation.

    • rocm-llvm-dev (optional): A package containing binaries for compiler and application developers.

ROCm Data Center Tool (RDC)

  • C++ upgrades.
    RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.

ROCm Performance Primitives (RPP)

  • New backend support.
    Audio processing support was added for the HOST backend, and 3D Voxel kernel support was added
    for the HOST and HIP backends.

ROCm Validation Suite

  • New datatype support.
    Added BF16 and FP8 datatypes based on General Matrix Multiply (GEMM) operations in the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing based on the newly supported datatypes.

rocSOLVER

  • New EigenSolver routine.
    Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.
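
For background, the classic cyclic Jacobi method for a symmetric matrix can be sketched in a few lines of pure Python. This is a textbook version written for illustration, not rocSOLVER's routine or API. The function name, sweep limit, and tolerance are assumptions.

```python
import math

def jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix via Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]                      # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                              # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                d, t = a[q][q] - a[p][p], 2.0 * a[p][q]
                if d < 0:
                    d, t = -d, -t
                theta = 0.5 * math.atan2(t, d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                 # A <- A * J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                 # A <- J^T * A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                 # V <- V * J accumulates eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

# Eigenvalues of [[2, 1], [1, 2]] are approximately 1 and 3.
values, vectors = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
```

Column i of the returned `vectors` matrix is the eigenvector paired with `values[i]`.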

ROCTracer

  • New versioning and callback enhancements.
    ROCTracer was improved to match versioning changes in the HIP runtime, and it supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.

Upcoming changes

  • ROCm SMI will be deprecated in a future release. We advise migrating to AMD SMI now to
    prevent future workflow disruptions.

  • hipCC supports, by default, the following compiler invocation flags:

    • -mllvm -amdgpu-early-inline-all=true
    • -mllvm -amdgpu-function-calls=false

    ...

ROCm 6.0.2 Release

31 Jan 23:29

ROCm 6.0.2 is a point release with minor bug fixes to improve the stability of MI300 GPU applications. This includes fixes in the rocSPARSE library. Several new driver features are introduced for system qualification on our partner server offerings.

hipFFT

Changes

  • Removed the Git submodule for shared files between rocFFT and hipFFT; instead, just copy the files
    over (this should help simplify downstream builds and packaging)

ROCm 6.0.0 Release

15 Dec 21:47
1828271

Release notes for AMD ROCm™ 6.0

ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
MI300 series. Future releases will further enable and optimize this new platform. Key features include:

  • Improved performance in areas like lower precision math and attention layers.
  • New hipSPARSELt library accelerates AI workloads via AMD's sparse matrix core technique.
  • Upstream support is now available for popular AI frameworks like TensorFlow, JAX, and PyTorch.
  • New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
  • Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
    tutorials on the AMD ROCm Docs site.
  • Consolidated developer resources and training on the new
    AMD ROCm Developer Hub.

The following sections provide a release overview for ROCm 6.0. For additional details, you can refer to
the Changelog. We list known
issues on GitHub.

OS and GPU support changes

ROCm 6.0 enables the use of MI300A and MI300X Accelerators with limited operating system
support. Future releases will add additional operating systems to match our general offering.

Operating Systems   MI300A      MI300X
Ubuntu 22.04.5      Supported   Supported
RHEL 8.9            Supported
SLES15 SP5          Supported

For older generations of supported Instinct products, we've added the following operating systems:

  • RHEL 9.3
  • RHEL 8.9

Note: For ROCm 6.2 and beyond, we've planned for end-of-support (EoS) for the following operating
systems:

  • Ubuntu 20.04.5
  • SLES 15 SP4
  • RHEL/CentOS 7.9

New ROCm meta package

We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
libraries. For example, the following command will install the full ROCm package: apt-get install rocm
(Ubuntu), or yum install rocm (RHEL).

Filesystem Hierarchy Standard

ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
the backward compatibility support for old file locations.

Compiler location change

  • The installation path of LLVM has been changed from /opt/rocm-<rel>/llvm to
    /opt/rocm-<rel>/lib/llvm. For backward compatibility, a symbolic link is provided to the old
    location and will be removed in a future release.
  • The installation path of the device library bitcode has changed from /opt/rocm-<rel>/amdgcn to
    /opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn. For backward compatibility, a symbolic link
    is provided and will be removed in a future release.
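
Scripts that hard-code the old LLVM location can be migrated ahead of the symbolic link's removal. The helper below is a small sketch that rewrites a path from the old prefix to the new one. The function name and the example release string `6.0.0` are assumptions, and only the two prefixes come from the note above.

```python
def new_llvm_path(old_path: str) -> str:
    """Map /opt/rocm-<rel>/llvm/... to /opt/rocm-<rel>/lib/llvm/..."""
    parts = old_path.split("/")
    # Expected shape: ['', 'opt', 'rocm-<rel>', 'llvm', ...]
    is_old = (len(parts) >= 4 and parts[1] == "opt"
              and parts[2].startswith("rocm-") and parts[3] == "llvm")
    if is_old:
        return "/".join(parts[:3] + ["lib", "llvm"] + parts[4:])
    return old_path  # not under the old LLVM prefix; leave unchanged

migrated = new_llvm_path("/opt/rocm-6.0.0/llvm/bin/clang")
# migrated == "/opt/rocm-6.0.0/lib/llvm/bin/clang"
```

Paths that already use the new layout are returned unchanged, so the helper is safe to apply more than once.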

Documentation

CMake support has been added for documentation in the
ROCm repository.

AMD Instinct™ MI50 end-of-support notice

AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enter
maintenance mode in ROCm 6.0.

As outlined in 5.6.0, ROCm 5.7 was the
final release for gfx906 GPUs in a fully supported state.

  • Henceforth, no new features and performance optimizations will be supported for the gfx906 GPUs.
  • Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
    2024 (end of maintenance [EOM] will be aligned with the closest ROCm release).
  • Bug fixes will be made up to the next ROCm point release.
  • Bug fixes will not be backported to older ROCm releases for gfx906.
  • Distribution and operating system updates will continue per the ROCm release cadence for gfx906
    GPUs until EOM.

ROCm projects

The following sections contain project-specific release notes for ROCm 6.0. For additional details, you
can refer to the Changelog.

AMD SMI

  • Integrated the E-SMI (EPYC-SMI) library.
    You can now query CPU-related information directly through AMD SMI. Metrics include power,
    energy, performance, and other system details.

  • Added support for gfx942 metrics.
    You can now query MI300 device metrics to get real-time information. Metrics include power,
    temperature, energy, and performance.

HIP

  • New features to improve resource interoperability.

    • For external resource interoperability, we've added new structs and enums.
    • We've added new members to HIP struct hipDeviceProp_t for surfaces, textures, and device
      identifiers.
  • Changes impacting backward compatibility.
    There are several changes impacting backward compatibility: we changed some struct members and
    some enum values, and removed some deprecated flags. For additional information, please refer to
    the Changelog.

hipCUB

  • Additional CUB API support.
    The hipCUB backend is updated to CUB and Thrust 2.1.

HIPIFY

  • Enhanced CUDA2HIP document generation.
    API versions are now listed in the CUDA2HIP documentation. To see if the application binary
    interface (ABI) has changed, refer to the
    C column
    in our API documentation.

  • Hipified rocSPARSE.
    We've implemented support for the direct hipification of additional cuSPARSE APIs into rocSPARSE
    APIs under the --roc option. This covers a major milestone in the roadmap towards complete
    cuSPARSE-to-rocSPARSE hipification.

hipRAND

  • Official release.
    hipRAND is now a standalone project--it's no longer available as a submodule for rocRAND.

hipTensor

  • Added architecture support.
    We've added contraction support for gfx942 architectures, and f32 and f64 data
    types.

  • Upgraded testing infrastructure.
    hipTensor will now support dynamic parameter configuration with input YAML config.

MIGraphX

  • Added TorchMIGraphX.
    We introduced a Dynamo backend for Torch, which allows PyTorch to use MIGraphX directly
    without first requiring a model to be converted to the ONNX model format. With a single line of
    code, PyTorch users can utilize the performance and quantization benefits provided by MIGraphX.

  • Boosted overall performance with rocMLIR.
    We've integrated the rocMLIR library for ROCm-supported RDNA and CDNA GPUs. This
    technology provides MLIR-based convolution and GEMM kernel generation.

  • Added INT8 support across the MIGraphX portfolio.
    We now support the INT8 data type. MIGraphX can perform the quantization or ingest
    prequantized models. INT8 support extends to the MIGraphX execution provider for ONNX Runtime.

ROCgdb

  • Added support for additional GPU architectures.
    • Navi 3 series: gfx1100, gfx1101, and gfx1102.
    • MI300 series: gfx942.

rocm-smi-lib

  • Improved accessibility to GPU partition nodes.
    You can now view, set, and reset the compute and memory partitions. You'll also get notifications of
    a GPU busy state, which helps you avoid partition set or reset failure.

  • Upgraded GPU metrics to version 1.4.
    The upgraded GPU metrics binary has an improved metric version format with a content version
    appended to it. You can read each metric within the binary without the full rsmi_gpu_metric_t data
    structure.

  • Updated GPU index sorting.
    We made GPU index sorting consistent with other ROCm software tools by optimizing it to use
    Bus:Device.Function (BDF) instead of the card number.
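
The effect of BDF-based ordering can be sketched as follows. The card numbers and PCI addresses are made-up sample data, the leading domain field is an assumption based on the usual Linux `domain:bus:device.function` notation, and the code does not call rocm-smi-lib.

```python
def bdf_key(bdf: str):
    """Parse 'domain:bus:device.function' (hex) into a sortable tuple."""
    domain, bus, rest = bdf.split(":")
    device, function = rest.split(".")
    return tuple(int(x, 16) for x in (domain, bus, device, function))

# Hypothetical devices: card number -> PCI address.
cards = {0: "0000:c1:00.0", 1: "0000:03:00.0", 2: "0000:83:00.0"}
ordered = sorted(cards, key=lambda card: bdf_key(cards[card]))
# ordered == [1, 2, 0]: sorted by bus 0x03, 0x83, 0xc1 rather than by card number.
```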

ROCm Compiler

  • Added kernel argument optimization on gfx942.
    With the new feature, you can preload kernel arguments into Scalar General-Purpose Registers
    (SGPRs) rather than pass them in memory. This feature is enabled with a compiler option, which also
    controls the number of arguments to pass in SGPRs. For more information, see:
    https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments

  • Improved register allocation at -O0.
    We've improved the register allocator used at -O0 to avoid compiler crashes (when the signature is
    'ran out of registers during register allocation').

  • Improved generation of debug information.
    We've improved compile time when generating debug information for certain corner cases. We've
    also improved the compiler to eliminate compiler crashes when generating debug information.

ROCmValidationSuite

  • Added GPU and operating system support.
    We added support for MI300X GPU in GPU Stress Test (GST).

Roc Profiler

  • Added option to specify desired Roc Profiler version.
    You can now use rocProfV1 or rocProfV2 by specifying your desired version, as the legacy rocProf
    (rocprofv1) provides the option to use the latest version (rocprofv2).

  • Automated the ISA dumping process by Advance Thread Tracer.
    Advance Thread Tracer (ATT) no longer depends on user-supplied Instruction Set Architecture (ISA)
    and compilation process (using hipcc --save-temps) to dump ISA from the running kernels.

  • Added ATT support for parallel kernels.
    The automatic ISA dumping process also helps ATT successfully parse multiple kernels running in
    parallel, and provide cycle-accurate occupancy information for multiple kernels at the same time.

ROCr

  • Support for SDMA link aggregation.
    If multiple XGMI links are available when making SDMA copies between GPUs, the copy is
    distributed over multiple links to increase peak bandwi...

ROCm 5.7.1 Release

13 Oct 23:16
365b317

ROCm 5.7.1 is a point release with the following changes:

rocBLAS

A new feature, rocblas-gemm-tune, and a new environment variable, ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, are added to rocBLAS in the ROCm 5.7.1 release.

rocblas-gemm-tune is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, you can use profile logging by setting the environment variable ROCBLAS_LAYER=4.

For more information on rocBLAS logging, see Logging in rocBLAS.

In the tool's output (the selected GEMM index may differ), the far-right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code that uses kernels directly via their indices.

If the output is stored in a file, the results can be used to override default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH to the path of the stored file.

For more details, refer to the rocBLAS Programmer's Guide.

HIP 5.7.1 (for ROCm 5.7.1)

ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.

Fixed

  • The hipPointerGetAttributes API returns the correct HIP memory type as hipMemoryTypeManaged for managed memory.

hipSOLVER 1.8.2

hipSOLVER 1.8.2 for ROCm 5.7.1

Fixed

  • Fixed conflicts between the hipsolver-dev and -asan packages by excluding
    hipsolver_module.f90 from the latter

ROCm 5.7.0 Release

16 Sep 00:16
23aa1ee
Compare
Choose a tag to compare

ROCm 5.7.0 includes many new features. Please see the complete release notes. New features include a new library (hipTensor) and optimizations for rocRAND and MIVisionX. Address sanitizer for host and device code (GPU) is now available as a beta. Note that ROCm 5.7.0 is EOS for MI50. ROCm 5.7 is the last major release in the ROCm 5 series. This release is Linux-only.

Important: The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series. Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime API that are not backward compatible, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.

ROCm 5.6.1 Release

29 Aug 23:29

Release Highlights

ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime. This is a Linux only release.

HIP 5.6.1

Fixed Defects

  • hipMemcpy device-to-device (intra device) is now asynchronous with respect to the host
  • Enabled xnack+ check in HIP catch2 tests hang when executing tests
  • Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
  • Using hipGraphAddMemFreeNode no longer results in a crash

HIP SDK 5.5 for Windows

27 Jul 20:05
b4d3dde

AMD is pleased to announce the availability of the HIP SDK for Windows as part of the ROCm platform. The HIP SDK OS and GPU support page lists the versions of Windows and GPUs validated by AMD. HIP SDK features on Windows are described in detail on our What is ROCm? page and differ from the Linux feature set. Visit the Quick Start page to get started. Known issues are tracked on GitHub.

ROCm 5.6.0 Release

29 Jun 01:18
f9aeee3

Release Highlights

ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A few examples include:

  • New documentation portal at https://rocm.docs.amd.com with highlights on our accompanying blog
  • Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
  • OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
  • Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
  • New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.

Please see the complete release notes and our release blog.

ROCm 5.5.1 release

24 May 20:30
7719c17

Release v5.5.1

ROCm 5.5.0 release

02 May 03:53

What's New in This Release

HIP Enhancements

The ROCm 5.5.0 release consists of the following HIP enhancements:

Enhanced Stack Size Limit

In this release, the stack size limit is increased from 16k to 131056 bytes (or 128K - 16).
Applications that need to change the stack size can use the hipDeviceSetLimit API.
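
As a quick check of the figures above, the following lines reproduce the arithmetic. Reading "16k" and "128K" as multiples of 1024 bytes is an assumption, although it is the only reading consistent with the stated 131056.

```python
KIB = 1024
old_limit = 16 * KIB          # assuming "16k" means 16 KiB = 16384 bytes
new_limit = 128 * KIB - 16    # "128K - 16" from the release note
growth = new_limit - old_limit
print(old_limit, new_limit, growth)  # 16384 131056 114672
```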

hipcc Changes

The following hipcc changes are implemented in this release:

  • hipcc will not implicitly link to libpthread and librt, as they are no longer a link-time dependency for HIP programs. Applications that depend on these libraries must explicitly link to them.
  • -use-staticlib and -use-sharedlib options are deprecated.

Future Changes

New HIP APIs in This Release

Note

This is a pre-official version (beta) release of the new APIs and may contain unresolved issues.

Memory Management HIP APIs

The new memory management HIP API is as follows:

  • Sets information on the specified pointer [BETA].

    hipError_t hipPointerSetAttribute(const void* value, hipPointer_attribute attribute, hipDeviceptr_t ptr);

Module Management HIP APIs

The new module management HIP APIs are as follows:

  • Launches kernel f with launch parameters and shared memory on stream with arguments passed to kernelParams, where thread blocks can cooperate and synchronize as they execute.

    hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams);
    
  • Launches kernels on multiple devices where thread blocks can cooperate and synchronize as they execute.

    hipError_t hipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags);
    
HIP Graph Management APIs

The new HIP Graph Management APIs are as follows:

  • Creates a memory allocation node and adds it to a graph [BETA]

    hipError_t hipGraphAddMemAllocNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipMemAllocNodeParams* pNodeParams);

  • Returns parameters for memory allocation node [BETA].

    hipError_t hipGraphMemAllocNodeGetParams(hipGraphNode_t node, hipMemAllocNodeParams* pNodeParams);

  • Creates a memory free node and adds it to a graph [BETA].

    hipError_t hipGraphAddMemFreeNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dev_ptr);

  • Returns parameters for memory free node [BETA].

    hipError_t hipGraphMemFreeNodeGetParams(hipGraphNode_t node, void* dev_ptr);

  • Writes a DOT file describing graph structure [BETA].

    hipError_t hipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags);

  • Copies attributes from source node to destination node [BETA].

    hipError_t hipGraphKernelNodeCopyAttributes(hipGraphNode_t hSrc, hipGraphNode_t hDst);

  • Enables or disables the specified node in the given graphExec [BETA].

    hipError_t hipGraphNodeSetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int isEnabled);

  • Queries whether a node in the given graphExec is enabled [BETA].

    hipError_t hipGraphNodeGetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int* isEnabled);

OpenMP Enhancements

This release consists of the following OpenMP enhancements:

  • Additional support for OMPT functions get_device_time and get_record_type.
  • Add support for min/max fast fp atomics on AMD GPUs.
  • Fix the use of the abs function in C device regions.

Deprecations and Warnings

HIP Deprecation

The hipcc and hipconfig Perl scripts are deprecated. In a future release, compiled binaries will be available as hipcc.bin and hipconfig.bin as replacements for the Perl scripts.

Note

There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterparts. No user action is required. Once these are available, users can optionally switch to hipcc.bin and hipconfig.bin. The hipcc/hipconfig soft links will be updated to point to the respective compiled binaries as the default option.

Linux Filesystem Hierarchy Standard for ROCm

ROCm packages have adopted the Linux foundation filesystem hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new filesystem hierarchy, ROCm ensures backward compatibility with its 5.1 version or older filesystem hierarchy. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.

New Filesystem Hierarchy

The following is the new filesystem hierarchy:

/opt/rocm-<ver>
    | --bin
      | --All externally exposed Binaries
    | --libexec
        | --<component>
            | -- Component specific private non-ISA executables (architecture independent)
    | --include
        | -- <component>
            | --<header files>
    | --lib
        | --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
            (public libraries linked with application)
        | --<component> (component specific private library, executable data)
        | --<cmake>
            | --components
                | --<component>.config.cmake
    | --share
        | --html/<component>/*.html
        | --info/<component>/*.[pdf, md, txt]
        | --man
        | --doc
            | --<component>
                | --<licenses>
        | --<component>
            | --<misc files> (arch independent non-executable)
            | --samples

Note

ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major release.

For more information, refer to https://refspecs.linuxfoundation.org/fhs.shtml.

Backward Compatibility with Older Filesystems

ROCm has moved header files and libraries to their new locations, as indicated in the above structure, and included symbolic links and wrapper header files in the old locations for backward compatibility.

Note

ROCm will continue supporting backward compatibility until the next major release.

Wrapper header files

Wrapper header files are placed in the old location (/opt/rocm-xxx/<component>/include) with a warning message to include files from the new location (/opt/rocm-xxx/include) as shown in the example below:

// Code snippet from hip_runtime.h
#pragma message "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip."
#include "hip/hip_runtime.h"

The wrapper header files’ backward compatibility deprecation is as follows:

  • #pragma message announcing deprecation -- ROCm v5.2 release
  • #pragma message changed to #warning -- Future release
  • #warning changed to #error -- Future release
  • Backward compatibility wrappers removed -- Future release

Library files

Library files are available in the /opt/rocm-xxx/lib folder. For backward compatibility, the old library location (/opt/rocm-xxx/<component>/lib) has a soft link to the library at the new location.

Example:

$ ls -l /opt/rocm/hip/lib/
total 4
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
lrwxrwxrwx 1 root root   24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so

CMake Config files

All CMake configuration files are available in the /opt/rocm-xxx/lib/cmake/<component> folder.
For backward compatibility, the old CMake locations (/opt/rocm-xxx/<component>/lib/cmake) consist of a soft link to the new CMake config.

Example:

$ ls -l /opt/rocm/hip/lib/cmake/hip/
total 0
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake

ROCm Support For Code Object V3 Deprecated

Support for Code Object v3 is deprecated and will be removed in a future release.

Comgr V3.0 Changes

The following APIs and macros have been marked as deprecated. They are expected to be removed in a future ROCm release that coincides with the release of Comgr v3.0.

API Changes
  • amd_comgr_action_info_set_options()
  • amd_comgr_action_info_get_options()
Actions and Data Types
  • AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
  • AMD_COMGR_ACTION_COMPILE_SOURCE_TO_FATBIN

For replacements, see the AMD_COMGR_ACTION_INFO_GET/SET_OPTION_LIST APIs, and the AMD_COMGR_ACTION_COMPILE_SOURCE_(WITH_DEVICE_LIBS)_TO_BC macros.

Deprecated Environment Variables

The following environment variables are removed in this ROCm release:

  • GPU_MAX_COMMAND_QUEUES
  • GPU_MAX_WORKGROUP_SIZE_2D_X
  • GPU_MAX_WORKGROUP_SIZE_2D_Y
  • GPU_MAX_WORKGROUP_SIZE_3D_X
  • GPU_MAX_WORKGROUP_SIZE_3D_Y
  • `GPU_MAX...