Releases: LLNL/RAJA

v2024.02.1

03 Apr 16:47
3ada095

This release contains submodule updates and minor RAJA improvements.

Please download the RAJA-v2024.02.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • NONE.
  • Build changes/improvements:

    • Update BLT submodule to v0.6.2 release.
    • Update camp submodule to v2024.02.1 release.
  • Bug fixes/improvements:

    • Various changes to quiet compiler warnings in SYCL builds related to deprecated usage.

v2024.02.0

14 Feb 20:43
82d1b92

This release contains several RAJA improvements and submodule updates.

Please download the RAJA-v2024.02.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • BREAKING CHANGE (ALMOST): The loop_exec and associated policies, such as loop_atomic, loop_reduce, etc., were deprecated in the v2023.06.0 release (please see the release notes for that version for details). Users should replace them with seq_exec and its associated policies for sequential CPU execution; the code behavior will be identical to what you observed with loop_exec, etc. However, at the request of some users with special circumstances, the loop_* policies still exist in this release as type aliases to their seq_* analogues. They will be removed in a future release. A minimal migration sketch appears after this list.
    • BREAKING CHANGE: RAJA TBB back-end support has been removed. It was not feature complete and the TBB API has changed so that the code no longer compiles with newer Intel compilers. Since we know of no project that depends on it, we have removed it.
    • An IndexLayout concept was added, which allows elements of a RAJA View to be accessed via a collection of indices, using a different indexing strategy along each dimension of a multi-dimensional View. Please see the RAJA User Guide for more information.
    • Add support for SYCL reductions using the new RAJA reduction API.
    • Add support for new reduction API for all back-ends in RAJA::launch.
  • Build changes/improvements:

    • Update BLT submodule to v0.6.1 and incorporate its new macros for managing TPL targets in CMake.
    • Update camp submodule to v2024.02.0, which contains changes to support ROCm 6.x compilers.
    • Update desul submodule to afbd448.
    • Replace internal use of HIP and CUDA platform macros with their newer versions to support the latest compilers.
  • Bug fixes/improvements:

    • Change internal memory allocation for HIP to use coarse-grained pinned memory, which improves performance because it can be cached on a device.
    • Fix compilation error resulting from incorrect namespacing of OpenMP execution policy.
    • Several fixes to internal implementation of Reducers and Operators.
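
A minimal migration sketch for the deprecated loop_* policies noted above. The axpy kernel and its names are hypothetical; the policy swap is the only change the deprecation requires:

```cpp
#include "RAJA/RAJA.hpp"

// Before: RAJA::forall<RAJA::loop_exec>(...), with loop_atomic, loop_reduce, etc.
// After:  the seq_* policies, which behave identically in this release.
void axpy(double* y, const double* x, double a, int N)
{
  RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, N),
                               [=](int i) { y[i] += a * x[i]; });
}
```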

v2023.06.1

15 Aug 15:38
9b5f61e

This release contains various smallish RAJA improvements.

Please download the RAJA-v2023.06.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • Add compile-time block size optimization for the new reduction interface.
    • Changed default stream usage in WorkGroup constructs to use the stream associated with the default (camp) resource. Previously, RAJA used stream zero. Specifically, this change affects where memory in the device memory pool is zeroed with memset and where device function pointers for WorkGroup are obtained.
  • Build changes/improvements:

    • RAJA_ENABLE_OPENMP_TASK CMake option added to enable/disable algorithm implementations based on the OpenMP task construct. Currently, this only applies to RAJA's OpenMP sort implementation. The default is 'Off'. The option allows users to choose a task-based implementation if they wish.
  • Bug fixes/improvements:

    • Fix compilation of the GPU occupancy calculator and use common types for the HIP and CUDA back-ends in the occupancy calculator, kernel policies, and kernel launch helper routines.
    • Fix direct cudaMalloc/hipMalloc calls and memory leaks.

v2023.06.0

06 Jul 18:15
e330b25

This release contains new features to improve GPU kernel performance and some bug fixes. It contains one breaking change and an execution policy deprecation, both described below. The policy deprecation is not a breaking change in this release, but will result in one in the next release.

Please download the RAJA-v2023.06.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • In this release, the loop_exec execution policy is deprecated and will be removed in the next release. RAJA has had two sequential execution policies for some time: seq_exec and loop_exec. When the seq_exec policy was used, RAJA attached #pragma novector, or similar depending on the compiler, to force strictly sequential execution of a loop; e.g., by preventing a compiler from vectorizing it, even when it would be correct to do so. When the loop_exec policy was specified, the compiler was allowed to apply any optimizations, including SIMD, that its heuristics determined were appropriate. In this release, seq_exec behaves as loop_exec has historically, and loop_exec and its associated policies, such as loop_atomic, loop_reduce, etc., are type aliases to the analogous seq_exec policies. This prevents breaking user code with this release. However, users should prepare to switch the loop_exec policy variants to their seq_exec counterparts in the future.
    • GPU global (thread and block) indexing has been refactored to abstract indexing in a given dimension. As a result, users can now specify a block size or a grid size at compile time, or get those values at run time. You can also ignore blocks and index only with threads, and vice versa. Kernel and launch policies are now shared. These policies are multi-part: they contain global indexing information, a mapping of global indices to loop iterations (e.g., direct or strided), and a synchronization requirement. The synchronization requirement allows one to request that all threads complete even if some have no work, so that a block can be synchronized. Aliases have been added for all of the preexisting policies, and some are deprecated in favor of more consistently named policies. One BREAKING CHANGE is that thread-loop policies are no longer safe to block-synchronize; that capability still exists, but can only be accessed with a custom policy. The RAJA User Guide describes the new policy mechanics, and a hedged sketch follows this list.
  • Build changes/improvements:

    • Update BLT submodule to v0.5.3
    • Update camp submodule to v2023.06.0
  • Bug fixes/improvements:

    • Fixes a Windows build issue due to macro definition logic in a RAJA header file. Specifically, the macro constant RAJA_COMPILER_MSVC was not defined properly when building on a Windows platform with a compiler other than MSVC.
    • Kernels using the RAJA OpenMP target back-end were not properly seg faulting when expected to do so. This has been fixed.
    • Various compilation and execution improvements in RAJA SIMD support.
    • Various improvements and additions to RAJA tests to cover more end-user cases.
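
To make the refactored indexing concrete, here is a hedged RAJA::launch sketch. The scale kernel, the block size, and the specific policy aliases used here (seq_launch_t, cuda_launch_t, cuda_global_thread_x) are illustrative assumptions; consult the RAJA User Guide for the exact policy names in this release:

```cpp
#include "RAJA/RAJA.hpp"

// Host/device launch policies; the 'false' parameter requests a synchronous launch.
using launch_pol = RAJA::LaunchPolicy<RAJA::seq_launch_t
#if defined(RAJA_ENABLE_CUDA)
                                      , RAJA::cuda_launch_t<false>
#endif
                                      >;

// Index the iteration space with global threads; blocks are handled implicitly.
using loop_pol = RAJA::LoopPolicy<RAJA::seq_exec
#if defined(RAJA_ENABLE_CUDA)
                                  , RAJA::cuda_global_thread_x
#endif
                                  >;

void scale(double* x, double a, int N, bool on_device)
{
  constexpr int block_sz = 256;  // block size chosen only for illustration
  RAJA::launch<launch_pol>(
      on_device ? RAJA::ExecPlace::DEVICE : RAJA::ExecPlace::HOST,
      RAJA::LaunchParams(RAJA::Teams((N + block_sz - 1) / block_sz),
                         RAJA::Threads(block_sz)),
      [=] RAJA_HOST_DEVICE (RAJA::LaunchContext ctx) {
        RAJA::loop<loop_pol>(ctx, RAJA::RangeSegment(0, N),
                             [&](int i) { x[i] *= a; });
      });
}
```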

v2022.10.5

28 Feb 23:11
3774f51

This release fixes an issue that was found after the v2022.10.4 release.

  • Fixes the CUDA and HIP separable compilation option, which was broken prior to the v2022.10.0 release. For the curious reader, the issue was that resources were constructed, and called CUDA/HIP API routines, before either runtime was initialized.

Please download the RAJA-v2022.10.5.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

v2022.10.4

15 Dec 00:38
c2a6b17

This release fixes a few issues that were found after the v2022.10.3 patch release and updates a few other things.

Please download the RAJA-v2022.10.4.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • Fixes a device alignment bug in WorkGroup constructs, which led to missing symbol errors with the AMD clang compiler.

v2022.10.3

07 Dec 19:54
a83a448

This release fixes a few issues that were found after the v2022.10.2 patch release and updates a few other things.

Please download the RAJA-v2022.10.3.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • Update camp submodule to v2022.10.1

  • Update BLT submodule to commit 8c229991 (includes fixes for crayftn + hip)

  • Properly export 'roctx' target when CMake variable RAJA_ENABLE_ROCTX is on.

  • Fix CMake logic for exporting desul targets when desul atomics are enabled.

  • Fix the way we use CMake to find the rocPRIM module to follow CMake best practices.

  • Add missing template parameter pack argument in RAJA::statement::For execution policy construct used in RAJA::kernel implementation for OpenMP target back-end.

  • Change to use a compile-time GPU thread block size in the RAJA::forall implementation. This improves the performance of GPU kernels, especially those using the RAJA HIP back-end (see the sketch after this list).

  • Added RAJA plugin support, including CHAI support, for RAJA::launch.

  • Replaced 'DEVICE' macro with alias to 'device_mem_pool_t' to prevent name conflicts with other libraries.

  • Updated User Guide documentation about the CMake variable used to pass compiler flags for the OpenMP target back-end. This changed with the CMake minimum required version bump in v2022.10.0.

  • Adjust ordering of BLT and camp target inclusion in RAJA CMake usage to fix an issue with projects using external camp vs. RAJA submodule.
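
For the compile-time block size item above, the block size is the template argument on the GPU execution policy. A minimal sketch, assuming a HIP build; the scale kernel and the choice of 256 are illustrative:

```cpp
#include "RAJA/RAJA.hpp"

#if defined(RAJA_ENABLE_HIP)
// The block size (256) is fixed at compile time via the policy template
// parameter, letting the back-end generate a better launch configuration
// than a value supplied at run time.
void scale(double* x, double a, int N)
{
  RAJA::forall<RAJA::hip_exec<256>>(RAJA::RangeSegment(0, N),
                                    [=] RAJA_DEVICE (int i) { x[i] *= a; });
}
#endif
```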

v2022.10.2

07 Nov 21:37
54a0aaa

This release fixes a few issues that were found after the v2022.10.1 patch release and updates a few things. Sorry for the churn, folks.

Please download the RAJA-v2022.10.2.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • Update desul submodule to commit e4b65e00.

  • The CUDA compute architecture must now be set using the 'CMAKE_CUDA_ARCHITECTURES' CMake variable; for example, pass '-DCMAKE_CUDA_ARCHITECTURES=70' to CMake for the 'sm_70' architecture. Using '-DCUDA_ARCH=sm_*' will no longer do the right thing. Please see the RAJA User Guide for more information.

  • A linking bug was fixed related to the usage of the new RAJA::KernelName capability.

  • A compilation bug was fixed in the new reduction interface support for OpenMP target offload.

  • An issue was fixed in AVX compiler checking logic for RAJA vectorization intrinsics capabilities.

v2022.10.1

31 Oct 20:33
2176ef1

This release updates the RAJA release number in CMake, which was inadvertently missed in the v2022.10.0 release.

Please download the RAJA-v2022.10.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

v2022.10.0

28 Oct 19:01
5f3282c

This release contains new features, bug fixes, and build improvements. Please see the RAJA user guide for more information about items in this release.

Please download the RAJA-v2022.10.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • Introduced new RAJA::forall and reduction interfaces that extend the execution behavior of reduction operations with RAJA::forall. The main difference from the pre-existing RAJA reduction interface is that reduction variables and operations are passed into the RAJA::forall method and lambda expression, instead of using the lambda capture mechanism for reduction objects. This offers flexibility and potential performance advantages, since the new interface can integrate directly with programming model back-end reduction machinery, for OpenMP and SYCL, for example. The interface also enables user-chosen kernel names to be passed to RAJA::forall for performance analysis annotations that are easier to understand. Example codes are included, and the RAJA User Guide describes the new interface and compares it with the pre-existing one. A hedged sketch appears at the end of this release's notes.
    • Added support for run time execution policy selection for RAJA::forall kernels. Users can specify any number of execution policies in their code and then select which to use at run time. There is no discussion of this in the RAJA User Guide yet; however, there are example codes, such as RAJA/examples/dynamic-forall.cpp.
    • The RAJA::launch framework has been moved out of the experimental namespace into the RAJA:: namespace, which introduces an API change.
    • Add support for all RAJA segment types in the RAJA::launch framework.
    • Add SYCL back-end support for RAJA::launch and dynamic shared memory for all back-ends in RAJA::launch. These changes introduce API changes.
    • Add additional policies to WorkGroup construct that allow for different methods of dispatching work.
    • Add special case implementations to CUDA atomicInc and atomicDec functions to use special hardware support when available. This can result in a significant performance boost.
    • Rework HIP atomic implementations to support more native data types.
    • Added the RAJA_UNROLL_COUNT macro, which enables users to unroll loops by a fixed count.
    • Major User Guide rework:
      • New RAJA tutorial sections, including new exercise source files to work through. This material was used in a recent RADIUSS/AWS RAJA tutorial.
      • Cleaned up and expanded RAJA feature sections to be more like a reference guide with links to associated tutorial sections for implementation examples.
      • Improved presentation of build configuration sections.
  • Build changes / improvements:

    • Submodule updates:
      • BLT updated to v0.5.2 release.
      • camp updated to v2022.10.0 release.
    • The minimum CMake version required has changed. For a HIP build, CMake 3.23 or newer is required. For all other builds CMake 3.20 or newer is required.
    • OpenMP back-end support is now off by default to match behavior of all other RAJA parallel back-end support. To enable OpenMP, users must now run CMake with the -DENABLE_OPENMP=On option.
    • Support OpenMP back-end enablement in a HIP build configuration.
    • RAJA_ENABLE_VECTORIZATION CMake option added to enable/disable the new SIMD/SIMT vectorization support. The default is 'On'. The option allows users to disable the feature if they wish.
    • Improvements to build target export mechanics coordinated with camp, BLT, and Spack projects.
    • Improve HIP builds to better support evolving ROCm software stack.
    • Add CMake variable RAJA_ALLOW_INCONSISTENT_OPTIONS and CMake messages to allow users more control when using CMake dependent options. When CMake is run, the code now checks for cases where RAJA_ENABLE_X=On but ENABLE_X=Off. Previously, this was confusing because feature X would not be enabled despite the value of the RAJA-specific option.
    • Build system refactoring to make CMake configurations more robust; added test to check for installed CMake config.
    • Added basic support to compile with C++20 standard.
    • Add missing compilation macro guards for HIP and CUDA policies in vectorization support when not using a GPU device.
  • Bug fixes / improvements:

    • Expanded test coverage to catch more cases that users have run into.
    • Various fixes in SIMD/SIMT support for different compilers and versions that users have hit recently, along with changes to internal implementations to improve run time performance of those features.
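
To make the new reduction interface concrete, here is a hedged sketch of a sum reduction with a kernel name annotation. The dot function is hypothetical, and a sequential policy is used for brevity; any back-end policy works the same way:

```cpp
#include "RAJA/RAJA.hpp"

// The reduction target and operator are passed to forall, and the lambda
// receives a reference to the reduction value, rather than capturing a
// RAJA::ReduceSum object as in the pre-existing interface.
double dot(const double* a, const double* b, int N)
{
  double sum = 0.0;
  RAJA::forall<RAJA::seq_exec>(
      RAJA::RangeSegment(0, N),
      RAJA::expt::Reduce<RAJA::operators::plus>(&sum),
      RAJA::expt::KernelName("dot-product"),  // optional annotation name
      [=](int i, double& s) { s += a[i] * b[i]; });
  return sum;
}
```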