OpenBLAS 0.3.22 version

@martin-frbg martin-frbg released this 26 Mar 21:45
· 1231 commits to release-0.3.0 since this release
e46971b

This release has since been found to contain an inadvertent regression in LU factorization (GETRF/GETF2).
A new release will be made as soon as the fixes currently under testing are confirmed to be sufficient.

general:

  • Updated the included LAPACK to Reference-LAPACK release 3.11.0
    plus post-release corrections and improvements
  • Added initial support for processing with the EMSCRIPTEN javascript
    converter (yielding a single-threaded build only)
  • Added a threshold for multithreading in SYMM, SYMV and SYR2K
  • Increased the threshold for multithreading in SYRK
  • OpenBLAS no longer decreases the global OMP_NUM_THREADS when it
    exceeds the maximum thread count the library was compiled for.
  • fixed ?GETF2 potentially returning NaN with tiny matrix elements
  • fixed openblas_set_num_threads to work in USE_OPENMP builds
  • fixed cpu core counting in USE_OPENMP builds returning the number
    of OMP "places" rather than cores
  • fixed interpretation of USE_PERL=0 in build scripts
  • fixed linking of the library with libm in CMAKE builds
  • fixed startup delays resulting from a wrong default setting of
    NO_WARMUP in CMAKE builds
  • fixed inconsistent defaults for overriding of LAPACK SPMV, SPR,
    SYMV, SYR functions in gmake and CMAKE builds
  • fixed stride calculation in the optimized small-matrix path of
    complex SYR
  • fixed compilation of ReLAPACK with CMAKE
  • fixed pkgconfig file contents for INTERFACE64 builds
  • fixed building of Reference-LAPACK with recent gfortran
  • fixed building with only a subset of precision types on Windows
  • added new environment variable OPENBLAS_DEFAULT_NUM_THREADS
  • added a GEMV-based implementation of GEMMT
  • added support for building under QNX
  • updated support for (cross-)building for ALPHA targets
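
The ?GETF2 item above mentions NaN results with tiny matrix elements. The note does not describe the actual patch, so the following is only a minimal sketch of one well-known guard against this kind of failure: an unblocked LU factorization with partial pivoting that scales by the reciprocal of the pivot only when the pivot is at least the smallest normal double, and divides element by element otherwise. The function name `getf2_sketch` and the list-of-lists layout are assumptions made for illustration and are not OpenBLAS code.

```python
import sys

# Smallest positive normal double; 1.0 / SAFE_MIN is still finite.
SAFE_MIN = sys.float_info.min


def getf2_sketch(a):
    """Unblocked LU with partial pivoting on a list-of-lists matrix, in place.

    Returns (pivots, info). info > 0 means U[info-1][info-1] is exactly zero.
    Hypothetical helper for illustration only.
    """
    m, n = len(a), len(a[0])
    pivots = []
    info = 0
    for j in range(min(m, n)):
        # Choose the row with the largest magnitude in column j.
        p = max(range(j, m), key=lambda i: abs(a[i][j]))
        pivots.append(p)
        if a[p][j] == 0.0:
            if info == 0:
                info = j + 1
            continue
        if p != j:
            a[j], a[p] = a[p], a[j]
        pivot = a[j][j]
        if abs(pivot) >= SAFE_MIN:
            # Normal-range pivot: scaling by the reciprocal is safe.
            r = 1.0 / pivot
            for i in range(j + 1, m):
                a[i][j] *= r
        else:
            # Subnormal pivot: 1.0 / pivot could overflow to inf,
            # so divide each element directly instead.
            for i in range(j + 1, m):
                a[i][j] /= pivot
        # Rank-1 update of the trailing submatrix.
        for i in range(j + 1, m):
            lij = a[i][j]
            for k in range(j + 1, n):
                a[i][k] -= lij * a[j][k]
    return pivots, info
```

With a subnormal first column such as `[[1e-310, 1.0], [5e-311, 2.0]]`, the reciprocal of the pivot would overflow, whereas the direct division keeps the multiplier close to 0.5 and all entries finite.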

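The GEMMT item in the list above refers to the BLAS extension that computes C := alpha*A*B + beta*C while updating only the upper or lower triangle of C. As a rough illustration of what a GEMV-based formulation can look like, the sketch below builds each column of the requested triangle with one matrix-vector product. It covers only the non-transposed case, and the names `gemv` and `gemmt_sketch` are hypothetical helpers written for this example, not the library's routines.

```python
def gemv(alpha, a_rows, x, beta, y):
    """Return alpha * A @ x + beta * y, with A given as a list of rows."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a_rows, y)]


def gemmt_sketch(uplo, alpha, a, b, beta, c):
    """Update only the `uplo` triangle of square C with alpha*A@B + beta*C.

    uplo is "U" or "L". C is modified in place and returned.
    Entries outside the chosen triangle are left untouched.
    """
    n = len(c)
    for j in range(n):
        # Rows of C touched in column j: 0..j for upper, j..n-1 for lower.
        rows = range(0, j + 1) if uplo == "U" else range(j, n)
        b_col = [b_row[j] for b_row in b]
        c_col = [c[i][j] for i in rows]
        # One matrix-vector product per column of the triangle.
        new_col = gemv(alpha, [a[i] for i in rows], b_col, beta, c_col)
        for i, value in zip(rows, new_col):
            c[i][j] = value
    return c
```

Calling `gemmt_sketch("U", 1, a, b, 0, c)` overwrites the diagonal and the entries above it, and leaves the strictly lower part of `c` as it was.
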
x86_64:

  • added autodetection of Intel Raptor Lake cpu models
  • added SSCAL microkernels for Haswell and newer targets
  • improved the performance of the Haswell DSCAL microkernel
  • added CSCAL and ZSCAL microkernels for SkylakeX targets
  • fixed detection of gfortran and Cray CCE compilers
  • fixed detection of recent versions of the Intel Fortran compiler
  • fixed compilation with LLVM to no longer run out of AVX512 registers
  • fixed cpu type option setting with recent NVIDIA HPC compiler versions
  • fixed compilation for/on AMD Ryzen 4 cpus
  • fixed compilation of AVX2-capable targets with Apple Clang
  • fixed runtime selection of COOPERLAKE in DYNAMIC_ARCH builds
  • worked around gcc/llvm using risky FMA operations in CSCAL/ZSCAL
  • worked around miscompilations of GEMV, SYMV and ZDOT kernels
    by gcc12's tree-vectorizer on OSX and Windows

ARM:

  • fixed cross-compilation to ARMV5 and ARMV6 targets with CMAKE

ARMV8:

  • fixed cross-compilation to CortexA53 with CMAKE
  • fixed compilation with CMAKE and "Arm Compiler for Linux 22.1"
  • added cpu autodetection for Cortex X3 and A715
  • fixed conditional compilation of SVE-capable targets in DYNAMIC_ARCH
  • sped up SVE kernels by removing unnecessary prefetches
  • improved the GEMM performance of Neoverse V1
  • added SVE kernels for SDOT and DDOT
  • added an SBGEMM kernel for Neoverse N2
  • improved cpu-specific compiler option selection for Neoverse cpus
  • added support for setting CONSISTENT_FPCSR

MIPS64:

  • improved MSA capability detection and handling
  • added a MIPS64_GENERIC build target
  • fixed corner cases in DNRM2

LOONGARCH64:

  • fixed handling of the INTERFACE64 option

RISCV:

  • fixed handling of the INTERFACE64 option

md5sums:
354e552c15d1ce93fc95cf1e3b181ddc OpenBLAS-0.3.22.tar.gz
c4de94c48a6ddb8ac3036763269aaf27 OpenBLAS-0.3.22.zip
4a5ee2693546ffd03d3a60829f3c6054 OpenBLAS-0.3.22-x64.zip
e1008c13d26caea6f0398ea7d8ce2f8f OpenBLAS-0.3.22-x86.zip