Releases: LLNL/RAJA

v0.4.0

11 Oct 15:53

Please download the RAJA-0.4.0.tar.gz file above. The others will not work due to the way RAJA uses git submodules.

The v0.4.0 release contains minor fixes for issues in the previous v0.3.1 release. It also improves documentation, reduction performance, and portability across a growing set of compilers and environments (e.g., Windows), and refactors namespaces to avoid cyclic dependencies and leverage argument-dependent lookup. In addition, the RAJA backend for Intel TBB is now off by default, whereas previously it was on by default.

A few major changes are included in this version:

  • Changes to the way RAJA is configured and built. We are now using the BLT build system, which is a Git submodule of RAJA. In addition to requiring the '--recursive' option to be passed to 'git clone', this introduces the following major change: RAJA_ENABLE_XXX options passed to CMake are now just ENABLE_XXX.

  • A new API and implementation for nested-loop RAJA constructs has been added. It is still a work in progress, but users are welcome to try it out and provide feedback. Eventually, RAJA::nested::forall will replace RAJA::forallN.

v0.3.1

21 Sep 16:30

This release contains some new RAJA features, plus a number of internal changes: more tests with improved coverage, conversion of nearly all unit tests to Google Test, and compilation portability improvements (e.g., Intel, nvcc, msvc). Also, the extension of all RAJA source files has been changed from *.cxx to *.cpp for consistency with the header file extension conversion in the last release. The source file extension change should not require users to change anything.

New features included in this release:

  • Execution policy modifications and additions: seq_exec is now strictly sequential (no SIMD, etc.), simd_exec now forces SIMD vectorization, and loop_exec (a new policy) lets the compiler optimize however it can, including SIMD. In other words, loop_exec behaves as simd_exec did previously, and 'no vector' pragmas have been added to all sequential implementations. NOTE: SIMD changes are still being evaluated with different compilers on different platforms. More information will be provided as we learn more.
  • Added support for atomic operations (min, max, inc, dec, and, or, xor, exchange, and CAS) for all programming model backends. These appear in the RAJA::atomic namespace.
  • Support added for Intel Threading Building Blocks backend (considered experimental at this point)
  • Added macros that will be used to mark features for future deprecation (please watch for this as we will be deprecating some features in the next release)
  • Added support for C++17 if CMake knows about it
  • Removed the limit on the number of ordered OpenMP reductions that can be used in a kernel
  • Removed a compile-time error from memutils and added a portable aligned allocator
  • Improved ListSegment implementation
  • RAJA::Index_type is now ptrdiff_t instead of int
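The last item above matters for large index spaces. The following is a minimal sketch, not RAJA code: `Index_type` here is a local alias standing in for `RAJA::Index_type`, and `box_size` and `fits_in_int` are hypothetical helpers. It assumes a platform where `std::ptrdiff_t` is 64 bits wide.

```cpp
#include <cstddef>
#include <limits>

// Local stand-in for RAJA::Index_type as described in the release notes
// (an assumption for illustration; this block does not include RAJA).
using Index_type = std::ptrdiff_t;

// Hypothetical helper: number of zones in an nx-by-ny-by-nz box,
// computed entirely in Index_type so the product is not truncated to int.
inline Index_type box_size(Index_type nx, Index_type ny, Index_type nz) {
  return nx * ny * nz;
}

// Hypothetical helper: true if n could also have been stored in an int.
inline bool fits_in_int(Index_type n) {
  return n <= static_cast<Index_type>(std::numeric_limits<int>::max());
}
```

On a 64-bit platform, `box_size(2000, 2000, 1000)` yields four billion zones, which `fits_in_int` rejects; with an `int` index type that product would overflow.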

Notable bug fixes included in this release:

  • Fixed strided_numeric_iterator to apply stride sign in comparison
  • Bug in RangeStrideSegment when using CUDA is fixed
  • Fixed reducer logic for openmp_ordered policy
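The release notes do not show the strided_numeric_iterator fix itself. As one plausible reading of "apply stride sign in comparison", the sketch below uses hypothetical helpers (`comes_before`, `trip_count`) that are not part of RAJA to show why the sign matters when the stride is negative.

```cpp
#include <cstddef>

// Hypothetical helper, not RAJA's iterator: returns true when index `a`
// comes before index `b` in traversal order for a non-zero `stride`.
inline bool comes_before(std::ptrdiff_t a, std::ptrdiff_t b,
                         std::ptrdiff_t stride) {
  // Multiplying both sides by the stride's sign flips the comparison
  // for negative strides, so "before" follows the direction of travel.
  const std::ptrdiff_t sign = (stride < 0) ? -1 : 1;
  return a * sign < b * sign;
}

// Hypothetical helper: counts iterations from `begin` up to, but not
// including, `end`, stepping by `stride`.
inline std::ptrdiff_t trip_count(std::ptrdiff_t begin, std::ptrdiff_t end,
                                 std::ptrdiff_t stride) {
  std::ptrdiff_t n = 0;
  for (std::ptrdiff_t i = begin; comes_before(i, end, stride); i += stride) {
    ++n;
  }
  return n;
}
```

A plain `a < b` test would report zero iterations for a descending range such as 10 down to 0 with stride -2; with the sign applied, the loop visits 10, 8, 6, 4, and 2.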

v0.3.0

13 Jul 15:02

This release contains breaking changes and is not backward compatible with prior versions. The largest change is a re-organization of header files, and the switch to .hpp as a file extension for all headers.

New features included in this release:

  • Re-organization of header files
  • Renaming of file extensions
  • OpenMP 4.5 support
  • CHAI support

v0.2.5

28 Mar 17:31

This release includes some small fixes, as well as an initial re-organization of the RAJA header files as we move towards a more flexible usage model.

v0.2.4

22 Feb 17:01

This version includes the following changes:

  • Initial support of clang-cuda
  • New, faster OpenMP reductions

N.B. The default OpenMP reductions are no longer performed in an ordered fashion, so results may not be reproducible. The old reductions are still available with the policy RAJA::omp_reduce_ordered.
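The reproducibility caveat follows from floating-point addition not being associative: combining the same values in a different order can change the result. The sketch below illustrates this in plain C++ without OpenMP or RAJA; `sum_forward` and `sum_reversed` are hypothetical names, and the values assume IEEE 754 doubles compiled without value-changing optimizations such as -ffast-math.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: adds the values front to back.
inline double sum_forward(const std::vector<double>& v) {
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// Hypothetical helper: adds the same values back to front.
inline double sum_reversed(const std::vector<double>& v) {
  return std::accumulate(v.rbegin(), v.rend(), 0.0);
}

// Sample data: the two large values cancel exactly, but only if they
// are combined before the small value is absorbed by rounding.
inline std::vector<double> sample_values() {
  return {1e30, -1e30, 1.0};
}
```

With `sample_values()`, the forward sum is 1.0 while the reversed sum is 0.0, because adding 1.0 to -1e30 rounds back to -1e30. An unordered parallel reduction can combine partial sums in either order from run to run, which is why an ordered policy is needed when bitwise-reproducible results matter.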

v0.2.3

15 Dec 21:58

Hotfix to update the URLs used for fetching clang during Travis builds.

v0.2.2

14 Dec 22:25

Bugfix release that addresses an error when launching forall CUDA kernels with a 0-length range.
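The release note does not say how the error was fixed. For context, CUDA rejects a kernel launch whose grid has zero blocks, and a zero-length range produces exactly that under the usual ceiling-division grid sizing. The sketch below shows one common guard in plain C++; `blocks_for` and `should_launch` are hypothetical helpers, not RAJA functions.

```cpp
#include <cstddef>

// Hypothetical helper: number of thread blocks needed to cover `len`
// iterations with `block_size` threads per block (block_size must be > 0).
inline std::size_t blocks_for(std::size_t len, std::size_t block_size) {
  return (len + block_size - 1) / block_size;  // ceiling division; 0 when len == 0
}

// Hypothetical guard: only attempt a launch when there is work to do,
// since a grid of zero blocks is an invalid launch configuration.
inline bool should_launch(std::size_t len, std::size_t block_size) {
  return blocks_for(len, block_size) > 0;
}
```

A caller would check `should_launch(len, block_size)` and return early for an empty range instead of issuing the kernel launch.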

v0.2.1

07 Dec 23:48

This release contains fixes for compiler warnings and removes the usage of the custom FindCUDA CMake package.

v0.2.0

02 Dec 21:50

Includes internal changes for performance and code maintenance.

v0.1.0

22 Jun 17:34
Merge pull request #67 from LLNL/hotfix/readme-version-number

Hotfix/readme version number