Releases: eth-cscs/DLA-Future

DLA-Future 0.5.0

31 May 12:40
v0.5.0
88f2543

Changes

  • Introduced an option (*) for forcing contiguous GPU communication buffers. (#1096)
  • Introduced an option (*) for enabling GPU aware MPI communication. (#1102)
  • Removed special handling of Intel MKL, as it could lead to broken installations. (#1149)
    • Spack installations: spack will set the correct variables.
    • Manual installations: the user is responsible for setting the variables correctly (see BUILD.md).

(*) These options are available as spack variants.

Performance improvements

  • Algorithms no longer communicate when using single-rank communicators. (#1097)
  • Fixed slow performance of the local version of bt_band_to_tridiagonal. (#1144)

Bug fixes

  • Implemented a workaround for hipMemcpyDefault 2D memcpys, due to bugs in HIP. (#1106)
  • Miniapps now initialize HIP before MPI, because on older Cray MPICH versions initializing HIP after MPI leads to HIP not seeing any devices. (#1090)

DLA-Future 0.4.1

10 May 11:11
v0.4.1
ee57bfa

Bug fixes

  • Update project version and export it in CMake. (#1121)

DLA-Future 0.4.0

16 Jan 11:42
ea4521d

Changes:

  • Modified CommunicatorGrid to avoid blocking calls to MPI_Comm_dup. It now returns communicator pipelines. (#993)
  • Added support for Intel oneMKL and the intel-oneapi-mkl spack package. (#1073) (*)

Performance improvements:

  • Reduced the size of the matrix-matrix multiplications in the tridiagonal eigensolver to cover only the non-deflated part of the eigenvectors. (#951 #967 #996 #997 #998)
  • Introduced stackless threads where appropriate. (#1037)

Bug fixes:

  • Use drop_operation_state to avoid stack overflows. (#1004)

Notes:

(*) At the time of the release, the spack spec blaspp~openmp ^intel-oneapi-mkl threads=openmp doesn't build. If you rely on multithreaded BLAS, we suggest using blaspp+openmp ^intel-oneapi-mkl threads=openmp until spack/spack#42087 gets merged.

DLA-Future 0.3.1

22 Nov 19:32

Bugfix:

  • Fixed compilation with gcc 9.3
  • Fixed compilation with CUDA 11.2
  • Improved eigensolver tests

DLA-Future 0.3.0

13 Nov 17:13
8ac9547

Changes:

  • added C and ScaLAPACK API (generalized eigensolver) (#992)
  • removed pika-algorithm dependency (#945)

Performance improvements:

  • Fixed Cholesky priorities (#999)

DLA-Future 0.2.1

20 Sep 12:15
94d1061

Bugfix:

  • Fixed a problem in reduction_to_band that could produce results filled with NaNs in certain corner cases (e.g. an input matrix with all off-band elements set to 0).

DLA-Future 0.2.0

30 Aug 14:41
e52fde5

Changes:

  • renamed algorithms using snake case (#942)
  • added C and ScaLAPACK API (cholesky and eigensolver) (#886)
  • Matrix API:
    • initial support for matrices with different tile/block-size (#909)
    • initial support for matrix subpipelines (#898)
    • initial support for submatrices (#934)
    • initial support for matrix redistribution (#933)

Bugfixes:

  • fixed a problem in tridiagonal_eigensolver which produced wrong results for some classes of matrices (#960)

Performance improvements:

  • introduced busy barriers in reduction_to_band (#864)
  • new band_to_tridiagonal algorithm implementation (#938, #946)
  • improved the rank1 problem solution in tridiagonal_eigensolver (#904, #936)

DLA-Future 0.1.0

07 Jun 08:19
0d88dfb

The first release of DLA-Future.