Releases: LLNL/RAJA

v2024.02.2

08 May 17:57
593f756

This release contains a bugfix and new execution policies that improve performance for GPU kernels with reductions.

Please download the RAJA-v2024.02.2.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • RAJA::loop_exec and associated policies (loop_reduce, etc.) have been removed. These were deprecated in an earlier release and type aliased to RAJA::seq_exec, etc., which behave the same as the loop_exec policies did previously. When you update to this version of RAJA, please change uses of loop_exec to seq_exec in your code.
    • New GPU execution policies for CUDA and HIP were added that improve performance for GPU kernels with reductions. Please see the RAJA User Guide for more information. Short summary:
      • Option added to change max grid size in policies that use the occupancy calculator.
      • Policies added to run with max occupancy, with a fraction of the max occupancy, or with a "concretizer", which allows a user to determine how to run based on what the occupancy calculator determines about a kernel.
      • Additional options to tune kernels containing reductions, such as
        • an option to initialize data on host for reductions that use atomic operations
        • an option to avoid device scope memory fences
    • Changed the SYCL thread index ordering in RAJA::launch to follow the SYCL "row-major" convention. Please see the RAJA User Guide for more information.
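The loop_exec to seq_exec migration is a one-line policy swap. Below is an illustrative, self-contained sketch of the aliasing mechanism described above (not RAJA's actual implementation; the forall here is a stand-in for RAJA::forall):

```cpp
#include <vector>

// Illustrative sketch, not RAJA's implementation: a deprecated policy kept
// as a type alias compiles identically to the policy it aliases, so code
// that switches from loop_exec to seq_exec behaves exactly the same.
struct seq_exec {};
using loop_exec = seq_exec;  // what earlier releases provided; now removed

// Stand-in for a sequential forall dispatched on a policy tag.
template <typename Policy, typename Body>
void forall(int begin, int end, Body&& body) {
  for (int i = begin; i < end; ++i) body(i);
}

std::vector<int> squares(int n) {
  std::vector<int> out(n, 0);
  // Changing loop_exec to seq_exec here does not change behavior.
  forall<seq_exec>(0, n, [&](int i) { out[i] = i * i; });
  return out;
}
```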
  • Build changes/improvements:

    • NONE.
  • Bug fixes/improvements:

    • Fixed issue in bump-style allocator used internally in RAJA::launch.

v2024.02.1

03 Apr 16:47
3ada095

This release contains submodule updates and minor RAJA improvements.

Please download the RAJA-v2024.02.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • NONE.
  • Build changes/improvements:

    • Update BLT submodule to v0.6.2 release.
    • Update camp submodule to v2024.02.1 release.
  • Bug fixes/improvements:

    • Various changes to quiet compiler warnings in SYCL builds related to usage of deprecated features.

v2024.02.0

14 Feb 20:43
82d1b92

This release contains several RAJA improvements and submodule updates.

Please download the RAJA-v2024.02.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • BREAKING CHANGE (ALMOST): The loop_exec and associated policies such as loop_atomic, loop_reduce, etc. were deprecated in the v2023.06.0 release (please see the release notes for that version for details). Users should replace these with seq_exec and associated policies for sequential CPU execution. The code behavior will be identical to what you observed with loop_exec, etc. However, due to a request from some users with special circumstances, the loop_* policies still exist in this release as type aliases to their seq_* analogues. The loop_* policies will be removed in a future release.
    • BREAKING CHANGE: RAJA TBB back-end support has been removed. It was not feature complete and the TBB API has changed so that the code no longer compiles with newer Intel compilers. Since we know of no project that depends on it, we have removed it.
    • An IndexLayout concept was added, which allows for accessing elements of a RAJA View via a collection of indices and using a different indexing strategy along different dimensions of a multi-dimensional View. Please see the RAJA User Guide for more information.
    • Add support for SYCL reductions using the new RAJA reduction API.
    • Add support for new reduction API for all back-ends in RAJA::launch.
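The per-dimension indexing idea behind IndexLayout can be sketched in a self-contained way. The types and semantics below are assumed for illustration only and are not RAJA's actual API; see the RAJA User Guide for the real interface:

```cpp
#include <vector>

// Illustrative sketch (not RAJA's API): each dimension maps a logical index
// to a storage index with its own strategy.
struct DirectIndex {            // identity mapping
  int map(int i) const { return i; }
};
struct ListIndex {              // indirection through a list of indices
  std::vector<int> list;
  int map(int i) const { return list[i]; }
};

// A 2D "view" over flat storage where each dimension has its own index type.
template <typename Dim0, typename Dim1>
struct IndexView2D {
  std::vector<double>& data;
  int extent1;                  // row stride (number of columns)
  Dim0 d0;
  Dim1 d1;
  double& operator()(int i, int j) {
    return data[d0.map(i) * extent1 + d1.map(j)];
  }
};

double pick(std::vector<double>& data) {
  // Rows indexed directly; columns indirected through the list {2, 0}.
  IndexView2D<DirectIndex, ListIndex> v{data, 3, {}, {{2, 0}}};
  return v(1, 0);               // row 1, first listed column (column 2)
}
```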
  • Build changes/improvements:

    • Update BLT submodule to v0.6.1 and incorporate its new macros for managing TPL targets in CMake.
    • Update camp submodule to v2024.02.0, which contains changes to support ROCm 6.x compilers.
    • Update desul submodule to afbd448.
    • Replaced internal use of HIP and CUDA platform macros with their newer versions to support the latest compilers.
  • Bug fixes/improvements:

    • Change internal memory allocation for HIP to use coarse-grained pinned memory, which improves performance because it can be cached on a device.
    • Fix compilation error resulting from incorrect namespacing of OpenMP execution policy.
    • Several fixes to internal implementation of Reducers and Operators.

v2023.06.1

15 Aug 15:38
9b5f61e

This release contains various smallish RAJA improvements.

Please download the RAJA-v2023.06.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • Added a compile-time block size optimization for the new reduction interface.
    • Changed default stream usage in WorkGroup constructs to use the stream associated with the default (camp) resource; previously, RAJA used stream zero. Specifically, this change affects where memory is zeroed via memset in the device memory pool and where device function pointers for WorkGroup are obtained.
  • Build changes/improvements:

    • A RAJA_ENABLE_OPENMP_TASK CMake option was added to enable/disable algorithm implementations based on the OpenMP task construct. Currently, this only applies to RAJA's OpenMP sort implementation. The default is 'Off'. The option allows users to choose a task-based implementation if they wish.
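A configure line opting in to the option might look like the following sketch (the build directory layout is a placeholder, and ENABLE_OPENMP is the usual BLT-style OpenMP switch):

```shell
# Opt in to the task-based OpenMP sort implementation (default is Off).
cmake -DENABLE_OPENMP=On -DRAJA_ENABLE_OPENMP_TASK=On ../raja
```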
  • Bug fixes/improvements:

    • Fix compilation of GPU occupancy calculator and use common types for HIP and CUDA backends in the occupancy calculator, kernel policies, and kernel launch helper routines.
    • Fix direct cudaMalloc/hipMalloc calls and memory leaks.

v2023.06.0

06 Jul 18:15
e330b25

This release contains new features to improve GPU kernel performance and some bug fixes. It contains one breaking change described below and an execution policy deprecation also described below. The policy deprecation is not a breaking change in this release, but will result in a breaking change in the next release.

Please download the RAJA-v2023.06.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • In this release, the loop_exec execution policy is deprecated and will be removed in the next release. RAJA has had two sequential execution policies for some time: seq_exec and loop_exec. With seq_exec, RAJA would attach #pragma novector, or similar depending on the compiler, to force strictly sequential execution of a loop; e.g., by preventing the compiler from vectorizing it, even when doing so would be correct. With loop_exec, the compiler was allowed to apply any optimizations, including SIMD, that its heuristics determined were appropriate. In this release, seq_exec behaves as loop_exec has historically, and loop_exec and its associated policies, such as loop_atomic and loop_reduce, are type aliases to the analogous seq_exec policies. This prevents user code from breaking with this release. However, users should prepare to switch loop_exec policies to their seq_exec variants in the future.
    • GPU global (thread and block) indexing has been refactored to abstract indexing in a given dimension. As a result, users can now specify a block size or a grid size at compile time, or get those values at run time. You can also ignore blocks and index only with threads, and vice versa. Kernel and launch policies are now shared. Such policies are multi-part: they contain global indexing information, a way to map global indices (such as direct or strided loops), and a synchronization requirement. The synchronization requirement allows one to request that all threads complete even if some have no work, so that a block can be synchronized. Aliases have been added for all pre-existing policies, and some are deprecated in favor of more consistently named policies. One BREAKING CHANGE is that thread loop policies are no longer safe to block synchronize; that feature still exists but can only be accessed with a custom policy. The RAJA User Guide describes the new policy mechanics.
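The compile-time vs. run-time block size distinction above can be sketched with plain C++. This is illustrative only; RAJA's actual indexing abstractions differ:

```cpp
// Illustrative sketch (not RAJA's implementation): the global index in one
// GPU dimension is block_index * block_size + thread_index. Abstracting the
// block size behind a common interface lets it be fixed at compile time
// (enabling compiler optimization) or supplied at run time.
template <int N>
struct StaticBlockSize {
  static constexpr bool is_static = true;
  constexpr int get() const { return N; }
};

struct DynamicBlockSize {
  static constexpr bool is_static = false;
  int n;
  int get() const { return n; }
};

// The same indexing code works with either block size representation.
template <typename BlockSize>
int global_index(int block, int thread, BlockSize bs) {
  return block * bs.get() + thread;
}
```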
  • Build changes/improvements:

    • Update BLT submodule to v0.5.3
    • Update camp submodule to v2023.06.0
  • Bug fixes/improvements:

    • Fixes a Windows build issue due to macro definition logic in a RAJA header file. Specifically, the macro constant RAJA_COMPILER_MSVC was not defined properly when building on a Windows platform with a compiler other than MSVC.
    • Kernels using the RAJA OpenMP target back-end were not properly seg faulting when expected to do so. This has been fixed.
    • Various improvements, compilation and execution, in RAJA SIMD support.
    • Various improvements and additions to RAJA tests to cover more end-user cases.

v2022.10.5

28 Feb 23:11
3774f51

This release fixes an issue that was found after the v2022.10.4 release.

  • Fixes the CUDA and HIP separable compilation option, which was broken before the v2022.10.0 release. For the curious reader, the issue was that resources were constructed, and called CUDA/HIP API routines, before either runtime was initialized.

Please download the RAJA-v2022.10.5.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

v2022.10.4

15 Dec 00:38
c2a6b17

This release fixes a few issues that were found after the v2022.10.3 patch release and updates a few other things.

Please download the RAJA-v2022.10.4.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • Fixes device alignment bug in workgroups which led to missing symbol errors with the AMD clang compiler.

v2022.10.3

07 Dec 19:54
a83a448

This release fixes a few issues that were found after the v2022.10.2 patch release and updates a few other things.

Please download the RAJA-v2022.10.3.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • Update camp submodule to v2022.10.1

  • Update BLT submodule to commit 8c229991 (includes fixes for crayftn + hip)

  • Properly export 'roctx' target when CMake variable RAJA_ENABLE_ROCTX is on.

  • Fix CMake logic for exporting desul targets when desul atomics are enabled.

  • Fix the way we use CMake to find the rocPRIM module to follow CMake best practices.

  • Add missing template parameter pack argument in RAJA::statement::For execution policy construct used in RAJA::kernel implementation for OpenMP target back-end.

  • Change to use compile-time GPU thread block size in RAJA::forall implementation. This improves performance of GPU kernels, especially those using the RAJA HIP back-end.

  • Added RAJA plugin support, including CHAI support, for RAJA::launch.

  • Replaced 'DEVICE' macro with alias to 'device_mem_pool_t' to prevent name conflicts with other libraries.

  • Updated User Guide documentation about CMake variable used to pass compiler flags for OpenMP target back-end. This changed with CMake minimum required version bump in v2022.10.0.

  • Adjust ordering of BLT and camp target inclusion in RAJA CMake usage to fix an issue with projects using external camp vs. RAJA submodule.

v2022.10.2

07 Nov 21:37
54a0aaa

This release fixes a few issues that were found after the v2022.10.1 patch release and updates a few things. Sorry for the churn, folks.

Please download the RAJA-v2022.10.2.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • Update desul submodule to commit e4b65e00.

  • The CUDA compute architecture must now be set using the 'CMAKE_CUDA_ARCHITECTURES' CMake variable; for example, by passing '-DCMAKE_CUDA_ARCHITECTURES=70' to CMake for the 'sm_70' architecture. Using '-DCUDA_ARCH=sm_*' will no longer do the right thing. Please see the RAJA User Guide for more information.
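For instance, configuring for an sm_70 device now looks like the following sketch (the build directory layout is a placeholder, and ENABLE_CUDA is the usual BLT-style CUDA switch):

```shell
# Old, no longer supported:
#   cmake -DCUDA_ARCH=sm_70 ...
# New:
cmake -DENABLE_CUDA=On -DCMAKE_CUDA_ARCHITECTURES=70 ../raja
```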

  • A linking bug was fixed related to the usage of the new RAJA::KernelName capability.

  • A compilation bug was fixed in the new reduction interface support for OpenMP target offload.

  • An issue was fixed in AVX compiler checking logic for RAJA vectorization intrinsics capabilities.

v2022.10.1

31 Oct 20:33
2176ef1

This release updates the RAJA release number in CMake, which was inadvertently
missed in the v2022.10.0 release.

Please download the RAJA-v2022.10.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.