Releases: OpenMathLib/OpenBLAS

OpenBLAS 0.3.27 version

04 Apr 20:33
ce3f668

general:

  • added initial (generic) support for the CSKY architecture
  • capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
    underutilized or idle threads
  • sped up multithreaded POTRF on all platforms
  • added extension openblas_set_num_threads_local() that returns the previous thread count
  • re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
    for too small workloads
  • improved the fallback code used when the precompiled number of threads is exceeded,
    and made it callable multiple times during the lifetime of an instance
  • added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC
  • fixed a potential buffer overflow in the interface to the GEMMT kernels
  • fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
  • fixed unwanted case sensitivity of the character parameters in ?TRTRS
  • sped up the OpenMP thread management code
  • fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
  • fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
  • added a testsuite for the BLAS extensions
  • modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
    spurious errors
  • added support for building the benchmark collection with CMAKE
  • added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
    with OpenMP enabled that use clang with gfortran
  • fixed building on systems with uClibc
  • added support for calling ?NRM2 with a negative increment value on all architectures
  • added support for the LLVM18 version of the flang-new compiler
  • fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
  • Integrated fixes from the Reference-LAPACK project:
    • Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
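
The ?NRM2 entry above concerns negative increment values. As a rough sketch of the indexing convention used by the Reference BLAS, a negative increment visits the same n elements as the positive one, only starting from the far end, so the norm is unchanged. The helper name nrm2_strided is hypothetical and this is plain Python for illustration, not OpenBLAS code; a zero increment is deliberately left out of the sketch.

```python
import math

def nrm2_strided(n, x, incx):
    """Euclidean norm of n elements of x that are abs(incx) apart.

    Hypothetical helper: with a negative increment the traversal starts
    at index (n - 1) * abs(incx) and walks backwards through the same
    elements that a positive increment would visit.
    """
    if n < 1:
        return 0.0
    if incx == 0:
        raise ValueError("zero increment is not covered by this sketch")
    start = 0 if incx > 0 else (n - 1) * (-incx)
    values = [x[start + i * incx] for i in range(n)]
    return math.sqrt(sum(v * v for v in values))
```

For example, nrm2_strided(2, [3.0, 99.0, 4.0], 2) and nrm2_strided(2, [3.0, 99.0, 4.0], -2) both read the elements 3.0 and 4.0 and skip 99.0.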

x86:

  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed GEMM3M functions failing in CMAKE builds

x86-64:

  • removed all instances of sched_yield() on Linux and BSD
  • fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26)
  • fixed GEMM3M functions failing in CMAKE builds
  • fixed handling of NaN and Inf arguments in ZSCAL
  • added compiler checks for AVX512BF16 compatibility
  • fixed LLVM compiler options for Sapphire Rapids
  • fixed cpu handling fallbacks for Sapphire Rapids with
    disabled AVX2 in DYNAMIC_ARCH mode
  • fixed extensions SCSUM and DZSUM
  • improved GEMM performance for ZEN targets

arm:

  • fixed handling of NaN and Inf arguments in ZSCAL
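
The notes do not spell out which inputs were mishandled. One way a complex scaling kernel can lose NaN or Inf values, shown here purely as an assumed illustration in plain Python (both function names are hypothetical), is a shortcut that skips some of the four real products when part of alpha is zero:

```python
import math

def zscal_full(alpha, x):
    """Scale (re, im) pairs by alpha = (ar, ai) using all four products."""
    ar, ai = alpha
    return [(ar * xr - ai * xi, ar * xi + ai * xr) for xr, xi in x]

def zscal_shortcut(alpha, x):
    """Same scaling, but skipping the products with ai when ai is zero."""
    ar, ai = alpha
    if ai == 0.0:
        return [(ar * xr, ar * xi) for xr, xi in x]
    return zscal_full(alpha, x)
```

Under IEEE 754 arithmetic 0.0 * inf is NaN, so for alpha = (2.0, 0.0) and x = [(1.0, inf)] the full version yields a NaN real part while the shortcut yields 2.0. Which result a given BLAS is expected to return is not stated in these notes.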

arm64:

  • added initial support for the Cortex-A76 cpu
  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed default compiler options for gcc (-march and -mtune)
  • added support for ArmCompilerForLinux
  • added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
  • fixed mishandling of the INTERFACE64 option in CMAKE builds
  • corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
  • added SVE-enabled kernels for CSUM/ZSUM
  • worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M
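
The SCSUM correction above is easier to follow with the two operations side by side. This sketch assumes that the ?SUM extension adds the real and imaginary parts without taking absolute values, while SCASUM adds their absolute values; the Python functions below only mimic that behaviour and are not the OpenBLAS kernels.

```python
def scasum(x):
    """Sum of |re| + |im| over all complex elements."""
    return sum(abs(z.real) + abs(z.imag) for z in x)

def scsum(x):
    """Plain sum of re + im over all complex elements (no absolute values)."""
    return sum(z.real + z.imag for z in x)
```

For x = [1-2j, -3+4j] the first function gives 10.0 and the second gives 0.0, so a kernel that duplicates SCASUM behaviour is only wrong when some components are negative.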

power:

  • improved performance of SGEMM on POWER8/9/10
  • improved performance of DGEMM on POWER10
  • added support for OpenMP builds with xlc/xlf on AIX
  • improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
  • fixed cpu core counting on AIX
  • added support for building a shared library on AIX

riscv64:

  • added support for the X280 cpu
  • added support for semi-generic RISCV models with vector length 128 or 256
  • added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
  • fixed handling of NaN and Inf arguments in ZSCAL
  • improved cpu model autodetection
  • fixed corner cases in ?AXPBY for C910V
  • fixed handling of zero increments in ?AXPY kernels for C910V

loongarch64:

  • added optimized kernels for ?AMIN and ?AMAX
  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed handling of corner cases in ?AXPBY
  • fixed computation of SAMIN and DAMIN in LSX mode
  • fixed computation of ?ROT
  • added optimized SSYMV and DSYMV kernels for LSX and LASX mode
  • added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
  • added optimized CGEMV and ZGEMV kernels

mips:

  • fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22)
  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed mishandling of the INTERFACE64 option in CMAKE builds

zarch:

  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed calculation of ?SUM on Z13

md5sums:
ef71c66ffeb1ab0f306a37de07d2667f OpenBLAS-0.3.27.tar.gz
4b85246b10d61f362fe8b9b45cd145f0 OpenBLAS-0.3.27.zip
317c6c4f93f233d8be8ea0ad6fd7979e OpenBLAS-0.3.27-x64-64.zip
2b8d25e6a01ad4830ecca4e521172b02 OpenBLAS-0.3.27-x64.zip
c59038e5ea36ee431f5cb7f5de8bf9d9 OpenBLAS-0.3.27-x86.zip
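
A downloaded archive can be checked against the published md5sums with a few lines of Python. This is a minimal sketch using the standard hashlib module; the function names and the chunk size are arbitrary choices, and the file path is whatever location the archive was saved to.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_md5):
    """Compare a file's digest with a checksum copied from the release notes."""
    return md5_of_file(path) == published_md5.strip().lower()
```

For instance, matches_published("OpenBLAS-0.3.27.tar.gz", "ef71c66ffeb1ab0f306a37de07d2667f") should return True for an intact copy of that archive.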

OpenBLAS 0.3.26 version

02 Jan 21:27
6c77e5e

general:

  • improved the version of openblas.pc that is created by the CMAKE build
  • fixed CMAKE-specific build problems on older versions of MacOS
  • worked around linking problems on old versions of MacOS
  • corrected installation location of the lapacke_mangling header in CMAKE builds
  • added type declarations for complex variables to the MSVC-specific parts of the LAPACK header
  • significantly sped up ?GESV for small problem sizes by introducing a lower bound for multithreading
  • imported additions and corrections from the Reference-LAPACK project:
    • added new LAPACK functions for truncated QR with pivoting (Reference-LAPACK PRs 891&941)
    • handled miscalculation of minimum work array size in corner cases (Reference-LAPACK PR 942)
    • fixed use of uninitialized variables in ?GEDMD and improved inline documentation (PR 959)
    • fixed use of uninitialized variables (and consequential failures) in ?BBCSD (PR 967)
    • added tests for the recently introduced Dynamic Mode Decomposition functions (PR 736)
    • fixed several memory leaks in the LAPACK testsuite (PR 953)
    • fixed counting of testsuite results by the Python script (PR 954)
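
The ?GESV entry above relies on a lower bound for multithreading: below some problem size the cost of waking threads outweighs the work. A minimal sketch of such a heuristic follows. The function name, the cutoff of 64 and the figure of 32 rows per thread are all invented for illustration and are not the values OpenBLAS uses.

```python
def threads_for_problem(n, max_threads, min_n_for_threads=64, rows_per_thread=32):
    """Pick a thread count for a problem of size n.

    Hypothetical heuristic: stay single-threaded below a size cutoff,
    otherwise use one thread per block of rows, capped at max_threads.
    """
    if n < min_n_for_threads or max_threads <= 1:
        return 1
    return min(max_threads, max(1, n // rows_per_thread))
```

With these made-up defaults, a size of 10 stays on one thread, a size of 100 gets three threads, and a size of 10000 is capped at the available maximum.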

x86-64:

  • fixed computation of CASUM on SkylakeX and newer targets in the special
    case that AVX512 is not supported by the compiler or operating environment
  • fixed potential undefined behaviour in the CASUM/ZASUM kernels for AVX512 targets
  • worked around a problem in the pre-AVX kernel for GEMV
  • sped up the thread management code on MS Windows

arm64:

  • fixed building of the LAPACK testsuite with Xcode 15 on Apple M1 and newer
  • sped up the thread management code on MS Windows
  • sped up SGEMM and DGEMM on Neoverse V1
  • sped up ?DOT on SVE-capable targets
  • reduced the number of targets in DYNAMIC_ARCH builds by eliminating functionally equivalent ones
  • included support for Apple M1 and newer targets in DYNAMIC_ARCH builds

power:

  • improved the SGEMM kernel for POWER10
  • fixed compilation with (very) old versions of gcc
  • fixed detection of old 32bit PPC targets in CMAKE-based builds
  • added autodetection of the POWERPC 7400 subtype
  • fixed CMAKE-based compilation for PPCG4 and PPC970 targets

loongarch64:

  • added and improved optimized kernels for almost all BLAS functions

md5sums:
bd496a1c81769ed19a161c1f8f904ccd OpenBLAS-0.3.26.tar.gz
f2524d2eaa55e9c2bad4d203401d4c7f OpenBLAS-0.3.26.zip
739d5666e46b046425b932fb83ce5571 OpenBLAS-0.3.26-x86.zip
3b573471bbc7639b896d1aab356b7e57 OpenBLAS-0.3.26-x64.zip
7522e53dfb4c8c3207c191e66de59430 OpenBLAS-0.3.26-x64-64.zip
(note that you need to edit the paths in the openblas.pc and OpenBLASConfig.cmake files of the Windows binary packages to reflect
your installation location, if you plan to have OpenBLAS findable via pkgconfig or cmake on your Windows system)

OpenBLAS 0.3.25 version

12 Nov 21:58
5e1a429

general:

  • improved the error message shown on exceeding the maximum thread count
  • improved the code to add supplementary thread buffers in case of overflow
  • fixed a potential division by zero in ?ROTG
  • improved the ?MATCOPY functions to accept zero-sized rows or columns
  • corrected empty prototypes in function declarations
  • cleaned up unused declarations in the f2c-converted versions of the LAPACK sources
  • fixed compilation with the Cray CCE Compiler suite
  • improved link line rewriting to avoid mixed libgomp/libomp builds with clang&gfortran
  • worked around OPENMP builds with LLVM14's libomp hanging on FreeBSD
  • improved the Makefiles to require less option duplication on "make install"
  • imported the following changes from the upcoming release 3.12 of Reference-LAPACK:
    • deprecated utility functions ?GELQS and ?GEQRS (LAPACK PR 900)
    • applied rounding up to workspace calculations done in floating point (LAPACK PR 904)
    • avoided overflow in STGEX2/DTGEX2 (LAPACK PR 907)
    • fixed accumulation in ?LASSQ (LAPACK PR 909)
    • fixed handling of NaN values in ?GECON (LAPACK PR 926)
    • avoided overflow in CBDSQR/ZBDSQR (LAPACK PR 927)
    • fixed poor vector orthogonalizations in ?ORBDB5/?UNBDB5 (LAPACK PR 928 & 930)

x86-64:

  • fixed compile-time autodetection of AMD Ryzen3 and Ryzen4 cpus
  • fixed capability-based fallback selection for unknown cpus in DYNAMIC_ARCH
  • added AVX512 optimizations for ?ASUM on Sapphire Rapids and Cooper Lake

ARM64:

  • fixed building on Apple with homebrew gcc
  • fixed building with XCODE 15
  • fixed building on A64FX and Cortex A710/X1/X2
  • increased the default buffer size for recent ARM server cpus

POWER:

  • fixed building with the IBM xlf 16.1.1 compiler
  • fixed building with IBM XL C
  • added support for DYNAMIC_ARCH builds with clang
  • fixed union declaration in the BFLOAT16 test case
  • enabled optimizations for the AIX assembler on POWER10

LOONGARCH64:

  • added an optimized SGEMV kernel
  • added an optimized DTRSM kernel

md5sums:
db39b32181b10ec2d1572e81e3dc869c OpenBLAS-0.3.25.zip
48384e324cd1cdcfbdb0d2e16ca55327 OpenBLAS-0.3.25.tar.gz
cc93916bd780a13429b65eb9c05527f2 OpenBLAS-0.3.25-x64.zip
58bb5dfc626d3af86aab7fab409c192d OpenBLAS-0.3.25-x64-64.zip
07a19abeac6c67595ec447315244ccd3 OpenBLAS-0.3.25-x86.zip

OpenBLAS 0.3.24 version

03 Sep 21:21
9f815cf

general:

  • declared the arguments of cblas_xerbla as const (in accordance with the reference implementation
    and others, the previous discrepancy appears to have dated back to GotoBLAS)
  • fixed the implementation of ?GEMMT that was added in 0.3.23
  • made cpu-specific SWITCH_RATIO parameters for GEMM available to DYNAMIC_ARCH builds
  • fixed application of SYMBOLSUFFIX in CMAKE builds
  • fixed missing SSYCONVF function in the shared library
  • fixed parallel build logic used with gmake
  • added support for compilation with LLVM17, in particular its new Fortran compiler
  • added support for CMAKE builds using the NVIDIA HPC compiler
  • fixed INTERFACE64 builds with CMAKE and the f95 Fortran compiler
  • fixed cross-build detection and management in c_check
  • disabled building of the tests with CMAKE when ONLY_CBLAS is defined
  • fixed several issues with the handling of runtime limits on the number of OPENMP threads
  • corrected the error code returned by SGEADD/DGEADD when LDA is too small
  • corrected the error code returned by IMATCOPY when LDB is too small
  • updated ?NRM2 to support negative increment values (as introduced in release 3.10.0
    of the Reference BLAS)
  • updated ?ROTG to use the safe scaling algorithm introduced in release 3.10.0 of the Reference BLAS
  • fixed OpenMP builds with CLANG for the case where libomp is not in a standard location
  • fixed a potential overwrite of unrelated memory during thread initialisation on startup
  • fixed a potential integer overflow in the multithreading threshold for ?SYMM/?SYRK
  • fixed build of the LAPACKE interfaces for the LAPACK 3.11.0 ?TRSYL functions added in 0.3.22
  • fixed installation of .cmake files in concurrent 32 and 64bit builds with CMAKE
  • applied additions and corrections from the development branch of Reference-LAPACK:
    • fixed actual arguments passed to a number of LAPACK functions (from Reference-LAPACK PR 885)
    • fixed workspace query results in LAPACK ?SYTRF/?TREVC3 (from Reference-LAPACK PR 883)
    • fixed derivation of the UPLO parameter in LAPACKE_?larfb (from Reference-LAPACK PR 878)
    • fixed a crash in LAPACK ?GELSDD on NRHS=0 (from Reference-LAPACK PR 876)
    • added new LAPACK utility functions CRSCL and ZRSCL (from Reference-LAPACK PR 839)
    • corrected the order of eigenvalues for 2x2 matrices in ?STEMR (Reference-LAPACK PR 867)
    • removed spurious reference to OpenMP variables outside OpenMP contexts (Reference-LAPACK PR 860)
    • updated file comments on use of LAMBDA variable in LAPACK (Reference-LAPACK PR 852)
    • fixed documentation of LAPACK SLASD0/DLASD0 (Reference-LAPACK PR 855)
    • fixed confusing use of "minor" in LAPACK documentation (Reference-LAPACK PR 849)
    • added new LAPACK functions ?GEDMD for dynamic mode decomposition (Reference-LAPACK PR 736)
    • fixed potential stack overflows in the EIG part of the LAPACK testsuite (Reference-LAPACK PR 854)
    • applied small improvements to the variants of Cholesky and QR functions (Reference-LAPACK PR 847)
    • removed unused variables from LAPACK ?BDSQR (Reference-LAPACK PR 832)
    • fixed a potential crash on allocation failure in LAPACKE SGEESX/DGEESX (Reference-LAPACK PR 836)
    • added a quick return from SLARUV/DLARUV for N < 1 (Reference-LAPACK PR 837)
    • updated function descriptions in LAPACK ?GEGS/?GEGV (Reference-LAPACK PR 831)
    • improved algorithm description in ?GELSY (Reference-LAPACK PR 833)
    • fixed scaling in LAPACK STGSNA/DTGSNA (Reference-LAPACK PR 830)
    • fixed crash in LAPACKE_?geqrt with row-major data (Reference-LAPACK PR 768)
    • added LAPACKE interfaces for C/ZUNHR_COL and S/DORHR_COL (Reference-LAPACK PR 827)
    • added error exit tests for SYSV/SYTD2/GEHD2 to the testsuite (Reference-LAPACK PR 795)
    • fixed typos in LAPACK source and comments (Reference-LAPACK PRs 809,811,812,814,820)
    • adopted the refactored ?GEBAL implementation (Reference-LAPACK PR 808)
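
The ?ROTG entry above refers to a safe scaling algorithm. The idea can be sketched as follows: divide both inputs by a scale factor before squaring, so that the intermediate sum neither overflows nor underflows. This is a simplified Python illustration modelled on the Reference BLAS approach; it omits the reconstruction value z that the real routine also returns, and the names rotg_scaled, SAFMIN and SAFMAX are choices made for this sketch.

```python
import math
import sys

SAFMIN = sys.float_info.min   # smallest positive normal double
SAFMAX = 1.0 / SAFMIN

def rotg_scaled(a, b):
    """Return (r, c, s) such that c*a + s*b = r and c*b - s*a = 0."""
    anorm, bnorm = abs(a), abs(b)
    if bnorm == 0.0:
        return a, 1.0, 0.0
    if anorm == 0.0:
        return b, 0.0, 1.0
    # Clamp the scale so that a/scale and b/scale stay in a safe range.
    scale = min(SAFMAX, max(SAFMIN, anorm, bnorm))
    sigma = math.copysign(1.0, a if anorm > bnorm else b)
    sa, sb = a / scale, b / scale
    r = sigma * (scale * math.sqrt(sa * sa + sb * sb))
    return r, a / r, b / r
```

With a = b = 1e300 the unscaled expression sqrt(a*a + b*b) overflows to infinity, whereas the scaled form returns a finite r of about 1.414e300.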

x86_64:

  • added cpu model autodetection for Intel Alder Lake N
  • added activation of the AMX tile to the Sapphire Rapids SBGEMM kernel
  • worked around miscompilations of GEMV/SYMV kernels by gcc's tree-vectorizer
  • fixed compilation of Cooperlake and Sapphire Rapids kernels with CLANG
  • fixed runtime detection of Cooperlake and Sapphire Rapids in DYNAMIC_ARCH
  • fixed feature-based cputype fallback in DYNAMIC_ARCH
  • added support for building the AVX512 kernels with the NVIDIA HPC compiler
  • corrected ZAXPY result on old pre-AVX hardware for the INCX=0 case
  • fixed a potential use of uninitialized variables in ZTRSM

ARMV8:

  • added cpu model autodetection for Apple M2
  • fixed wrong results of CGEMM/CTRMM/DNRM2 under OSX (use of reserved register)
  • added support for building the SVE kernels with the NVIDIA HPC compiler
  • added support for building the SVE kernels with the Apple Clang compiler
  • fixed compiler option handling for building the SVE kernels with LLVM
  • implemented SWITCH_RATIO parameter for improved GEMM performance on Neoverse
  • activated SVE SGEMM and DGEMM kernels for Neoverse V1
  • improved performance of the SVE CGEMM and ZGEMM kernels on Neoverse V1
  • improved kernel selection for the ARMV8SVE target and added it to DYNAMIC_ARCH
  • fixed runtime check for SVE availability in DYNAMIC_ARCH builds to take OS or
    container restrictions into account
  • fixed a potential use of uninitialized variables in ZTRSM
  • fixed a potential misdetection of ARMV8 hardware as 32bit in CMAKE builds

LOONGARCH64:

  • added ABI detection
  • added support for cpu affinity handling
  • fixed compilation with early versions of the Loongson toolchain
  • added an optimized SGEMM kernel for 3A5000
  • added optimized DGEMV kernels for 3A5000
  • improved the performance of the DGEMM kernel for 3A5000

MIPS64:

  • fixed miscompilation of TRMM kernels for the MIPS64_GENERIC target

POWER:

  • fixed compiler warnings in the POWER10 SBGEMM kernel

RISCV:

  • fixed application of the INTERFACE64 option when building with CMAKE
  • fixed a potential misdetection of RISCV hardware as 32bit in CMAKE builds
  • fixed IDAMAX and DOT kernels for C910V
  • fixed corner cases in the ROT and SWAP kernels for C910V
  • fixed compilation of the C910V target with recent vendor compilers

md5sums:
9fb0d53bf3559d4dea074fa5d7691d39 OpenBLAS-0.3.24.zip
23599a30e4ce887590957d94896789c8 OpenBLAS-0.3.24.tar.gz
3aba5a264dfb0a545723c648b311ae5a OpenBLAS-0.3.24-x86.zip
fc08fe8c0dc7364da115d0e09b5a134f OpenBLAS-0.3.24-x64.zip

note that the Windows binary packages have been regenerated on September 14 because a problem has been found with the included .lib file (referencing a nonexistent "libopenblas.exp.dll" instead of "libopenblas.dll").
If you downloaded the original zip files, their md5sums were
431ef4c46ccd133935fa40be6e02eb14 OpenBLAS-0.3.24-x86.zip
e53de38d326547d6220296a6cec0d9aa OpenBLAS-0.3.24-x64.zip

OpenBLAS 0.3.23 version

01 Apr 20:20
394a9fb

general:

  • fixed a serious regression in GETRF/GETF2 and ZGETRF/ZGETF2 where
    subnormal but nonzero data elements triggered the singularity flag
  • fixed a long-standing bug in CSPR/ZSPR in single-threaded operation
    for cases where elements of the X vector are real numbers (or
    complex with only the real part zero)
  • fixed gmake builds with the option NO_LAPACK
  • fixed a few instances in the gmake Makefiles where expressly
    setting NO_LAPACK=0 or NO_LAPACKE=0 would have the opposite effect
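
The GETRF/GETF2 regression above involved subnormal but nonzero elements being treated as a sign of singularity. The notes do not show the faulty test, so the following is only an assumed illustration of how two plausible pivot checks differ; both function names are hypothetical.

```python
import sys

def pivot_is_exact_zero(pivot):
    """Flag singularity only when the pivot is exactly zero."""
    return pivot == 0.0

def pivot_below_normal_range(pivot):
    """Flag anything smaller than the smallest normal double, subnormals included."""
    return abs(pivot) < sys.float_info.min
```

A subnormal value such as 5e-324 passes the first check as a usable pivot but is rejected by the second, which is the kind of discrepancy the entry describes.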

x86_64:

  • added further CPUID values for Intel Raptor Lake

md5sums:
115634b39007de71eb7e75cf7591dfb2 OpenBLAS-0.3.23.tar.gz
6c35babfc01534eb04acba653d378839 OpenBLAS-0.3.23.zip
c28473bb8bba85a92f77e350182abddb OpenBLAS-0.3.23-x86.zip
49f156f42622d251aa440ddcd425787d OpenBLAS-0.3.23-x64.zip

note that the Windows binary packages have been regenerated on September 14 because a problem has been found with the included .lib file (referencing a non-existent "libopenblas.exp.dll" instead of "libopenblas.dll").
If you downloaded the original zip files, their md5sums were
d77c18780b2d8a65c9340a415c125918 OpenBLAS-0.3.23-x86.zip
c428119f8d54de25e341ec1becc32251 OpenBLAS-0.3.23-x64.zip

OpenBLAS 0.3.22 version

26 Mar 21:45
e46971b

This release has now been found to have an inadvertent regression in LU factorization (GETRF/GETF2).
A new release will be made as soon as the fixes currently under testing are confirmed to be sufficient.

general:

  • Updated the included LAPACK to Reference-LAPACK release 3.11.0
    plus post-release corrections and improvements
  • Added initial support for processing with the EMSCRIPTEN javascript
    converter (yielding a single-threaded build only)
  • Added a threshold for multithreading in SYMM, SYMV and SYR2K
  • Increased the threshold for multithreading in SYRK
  • OpenBLAS no longer decreases the global OMP_NUM_THREADS when it
    exceeds the maximum thread count the library was compiled for.
  • fixed ?GETF2 potentially returning NaN with tiny matrix elements
  • fixed openblas_set_num_threads to work in USE_OPENMP builds
  • fixed cpu core counting in USE_OPENMP builds returning the number
    of OMP "places" rather than cores
  • fixed interpretation of USE_PERL=0 in build scripts
  • fixed linking of the library with libm in CMAKE builds
  • fixed startup delays resulting from a wrong default setting of
    NO_WARMUP in CMAKE builds
  • fixed inconsistent defaults for overriding of LAPACK SPMV, SPR,
    SYMV, SYR functions in gmake and CMAKE builds
  • fixed stride calculation in the optimized small-matrix path of
    complex SYR
  • fixed compilation of ReLAPACK with CMAKE
  • fixed pkgconfig file contents for INTERFACE64 builds
  • fixed building of Reference-LAPACK with recent gfortran
  • fixed building with only a subset of precision types on Windows
  • added new environment variable OPENBLAS_DEFAULT_NUM_THREADS
  • added a GEMV-based implementation of GEMMT
  • added support for building under QNX
  • updated support for (cross-)building for ALPHA targets

x86_64:

  • added autodetection of Intel Raptor Lake cpu models
  • added SSCAL microkernels for Haswell and newer targets
  • improved the performance of the Haswell DSCAL microkernel
  • added CSCAL and ZSCAL microkernels for SkylakeX targets
  • fixed detection of gfortran and Cray CCE compilers
  • fixed detection of recent versions of the Intel Fortran compiler
  • fixed compilation with LLVM to no longer run out of AVX512 registers
  • fixed cpu type option setting with recent NVIDIA HPC compiler versions
  • fixed compilation for/on AMD Ryzen 4 cpus
  • fixed compilation of AVX2-capable targets with Apple Clang
  • fixed runtime selection of COOPERLAKE in DYNAMIC_ARCH builds
  • worked around gcc/llvm using risky FMA operations in CSCAL/ZSCAL
  • worked around miscompilations of GEMV, SYMV and ZDOT kernels
    by gcc12's tree-vectorizer on OSX and Windows

ARM:

  • fixed cross-compilation to ARMV5 and ARMV6 targets with CMAKE

ARMV8:

  • fixed cross-compilation to CortexA53 with CMAKE
  • fixed compilation with CMAKE and "Arm Compiler for Linux 22.1"
  • added cpu autodetection for Cortex X3 and A715
  • fixed conditional compilation of SVE-capable targets in DYNAMIC_ARCH
  • sped up SVE kernels by removing unnecessary prefetches
  • improved the GEMM performance of Neoverse V1
  • added SVE kernels for SDOT and DDOT
  • added an SBGEMM kernel for Neoverse N2
  • improved cpu-specific compiler option selection for Neoverse cpus
  • added support for setting CONSISTENT_FPCSR

MIPS64:

  • improved MSA capability detection and handling
  • added a MIPS64_GENERIC build target
  • fixed corner cases in DNRM2

LOONGARCH64:

  • fixed handling of the INTERFACE64 option

RISCV:

  • fixed handling of the INTERFACE64 option

md5sums:
354e552c15d1ce93fc95cf1e3b181ddc OpenBLAS-0.3.22.tar.gz
c4de94c48a6ddb8ac3036763269aaf27 OpenBLAS-0.3.22.zip
4a5ee2693546ffd03d3a60829f3c6054 OpenBLAS-0.3.22-x64.zip
e1008c13d26caea6f0398ea7d8ce2f8f OpenBLAS-0.3.22-x86.zip

OpenBLAS 0.3.21 version

07 Aug 20:50
b89fb70

general:

  • updated the included LAPACK to Reference-LAPACK 3.10.1
  • when no Fortran compiler is available, OpenBLAS builds will now automatically
    build LAPACK from an f2c-converted copy of LAPACK 3.9.0 unless the NO_LAPACK option
    is specified (more recent releases make too heavy use of Fortran90+ features to be easily convertible to C)
  • similarly added C versions of the BLAS and CBLAS tests
  • enabled building of the ReLAPACK GEMMT kernels when ReLAPACK is built
  • function LAPACKE_lsame is now annotated with the GCC attribute "const" to aid static analyzers
  • added USE_TLS to the list of options reported by the openblas_get_config() function
  • added openblas_getaffinity() as a Linux-only convenience function wrapping pthread_getaffinity_np()
  • CMAKE builds now support the BUILD_TESTING keyword (to disable the LAPACK testsuite) of Reference-LAPACK
  • fixed CMAKE builds of the laswp_ncopy and neg_tcopy kernels
  • removed the build system requirements for PERL (while keeping the original perl scripts as backup)
  • handle building and running OpenBLAS on systems that report zero available cpu cores
  • added SYMBOLPREFIX/SYMBOLSUFFIX handling for LAPACK 3.10.0 functions added in 0.3.20
  • fixed linking of the utests on QNX
  • Added support for compilation with the Intel ifx compiler
  • Added support for compilation with the Fujitsu FCC compiler for Fugaku
  • Added support for compilation with the Cray C and Fortran compilers
  • reverted OpenMP threadpool behaviour in the exec_blas call to its state before 0.3.11, that is
    the threadpool will no longer grow or shrink on demand as the overhead for this is too big at least with
    GNU OpenMP. The adaptive behaviour introduced in 0.3.11 can still be requested at runtime by setting
    the environment variable OMP_ADAPTIVE
  • worked around spurious STFSM/CTFSM errors reported by the LAPACK testsuite

x86_64:

  • fixed determination of compiler support for AVX512 and removed the 0.3.19
    workaround for building SKYLAKEX kernels on Sandybridge hardware
  • fixed compilation for the SKYLAKEX target with gcc 6
  • fixed compilation of the CooperLake SBGEMM kernel with LLVM
  • fixed compilation of the SkyLakeX small matrix GEMM kernels with LLVM or ICC
  • fixed compilation of some BFLOAT16 kernels with CMAKE
  • added support for the Zhaoxin/Centaur KH40000 cpu
  • fixed a potential crash in the ZSYMV kernel used for all targets except generic
  • fixed gmake compilation for DYNAMIC_ARCH with a DYNAMIC_LIST including ATOM
  • fixed compilation of LAPACKE with the INTEGER64 option on Windows
  • added support for cross-compiling to individual Intel or AMD targets using CMAKE
    (previously only CORE2 was supported; added targets are ATOM, PRESCOTT, NEHALEM, SANDYBRIDGE,
    HASWELL, SKYLAKEX, COOPERLAKE, SAPPHIRERAPIDS, OPTERON, BARCELONA, BULLDOZER, PILEDRIVER,
    STEAMROLLER, EXCAVATOR, ZEN)

SPARC:

  • worked around an overflow error in the DNRM2 kernel
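
Several entries in this release mention an overflow error in DNRM2 kernels without describing the workaround. A common technique for avoiding such overflow, shown here as an assumption rather than as the actual fix, is to divide by the largest magnitude before squaring. The function names are hypothetical, and inputs containing NaN or Inf are not handled.

```python
import math

def nrm2_naive(x):
    """Square, sum and take the root; the squares can overflow."""
    return math.sqrt(sum(v * v for v in x))

def nrm2_scaled(x):
    """Scale by the largest magnitude so every squared term is at most 1."""
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((v / scale) * (v / scale) for v in x))
```

For x = [1e200, 1e200] the naive version returns infinity because 1e200 squared exceeds the double range, while the scaled version returns about 1.414e200.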

POWER:

  • worked around an overflow error in the POWER6 DNRM2 kernel
  • fixed compilation on PPC440
  • fixed a performance regression in the level1 BLAS on POWER10
  • fixed the POWER10 ZGEMM kernel
  • fixed singlethreaded builds for POWER10
  • fixed compilation of the POWER10 DGEMV kernel with older gcc versions
  • enabled compilation of the BFLOAT16 kernels by default
  • enabled the small matrix kernels by default for DYNAMIC_ARCH builds
  • added a workaround for a miscompilation of the CDOT and ZDOT kernels by GCC 12

RISCV:

  • fixed cpu autodetection logic

ARMV8:

  • added an SBGEMM kernel for Neoverse N2
  • worked around an overflow error in the DNRM2 kernel used on M1, NeoverseN1, ThunderX2T99
  • added support for ARM64 systems running MS Windows
  • added support for cross-compiling to the GENERIC ARMV8 target under CMAKE (Windows/MSVC)
  • fixed a performance regression in the generic ARMV8 DGEMM kernel introduced in 0.3.19
  • added initial support for the Apple M1 cpu under Linux
  • added initial support for the Phytium FT2000 cpu
  • added initial support for the Cortex A510, A710, X1 and X2 cpus
  • fixed an accidental mixup of cpu identifiers in the autodetection code introduced in 0.3.20
  • fixed linking of Apple M1 builds on macOS 12 and later with recent XCode
  • made NeoverseN2 available in DYNAMIC_ARCH builds

MIPS,MIPS64:

  • worked around an overflow error in the DNRM2 kernel

LOONGARCH64:

  • worked around an overflow error in the DNRM2 kernel
  • added preliminary support for the LOONGSON2K1000 cpu
  • added DYNAMIC_ARCH support

md5sums:
ffb6120e2309a2280471716301824805 OpenBLAS-0.3.21.tar.gz
4f013627138be6ecbd2c8d1435f2ec40 OpenBLAS-0.3.21.zip
c605e9e4ef227605ebcafa6466f14e25 OpenBLAS-0.3.21-x64.zip
16e2cc782e893df47fef97be09896ae1 OpenBLAS-0.3.21-x86.zip

OpenBLAS 0.3.20 version

20 Feb 21:38
0b678b1

general:

  • some code cleanup, with added casts etc.
  • fixed obtaining the cpu count with OpenMP and OMP_PROC_BIND unset
  • fixed pivot index calculation by ?LASWP for negative increments other than -1
  • fixed input argument check in LAPACK ?GEQRT2
  • improved the check for a Fortran compiler in CMAKE builds
  • disabled building OpenBLAS' optimized versions of LAPACK complex SPMV, SPR, SYMV, SYR with NO_LAPACK=1
  • fixed building of LAPACK on certain distributed filesystems with parallel gmake
  • fixed building the shared library on MacOS with classic flang
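
The ?LASWP entry above is about locating pivot indices when the increment is negative. The sketch below follows the indexing of the Reference LAPACK routine as an assumption, translated to 0-based Python: the pivot for row i sits abs(incx) entries apart in ipiv, and a negative increment applies the interchanges in reverse order. The function is illustrative and operates on a list of rows, not on OpenBLAS data structures.

```python
def laswp(rows, k1, k2, ipiv, incx):
    """Apply row interchanges in place to rows k1..k2 (0-based, inclusive).

    The pivot for row i is read from ipiv[k1 + (i - k1) * abs(incx)].
    A positive incx walks from k1 up to k2, a negative incx from k2
    down to k1, and a zero incx does nothing.
    """
    if incx == 0:
        return rows
    step = abs(incx)
    order = range(k1, k2 + 1) if incx > 0 else range(k2, k1 - 1, -1)
    for i in order:
        p = ipiv[k1 + (i - k1) * step]
        if p != i:
            rows[i], rows[p] = rows[p], rows[i]
    return rows
```

Applying the interchanges with incx = 2 and then with incx = -2 on the same ipiv restores the original row order, which is a convenient check that the pivot positions are computed consistently for both signs.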

x86_64:

  • fixed cross-compilation with CMAKE for CORE2 target
  • fixed miscompilation of AVX512 code in DYNAMIC_ARCH builds
  • added support for the "incidental" AVX512 hardware in Alder Lake when enabled in BIOS

E2K:

  • added a new architecture (Russian Elbrus E2000 family)

SPARC:

  • fixed IMIN/IMAX

ARMV8:

  • added SVE-enabled CGEMM and ZGEMM kernels for ARMV8SVE and A64FX
  • added support for Neoverse N2 and V1 cpus

MIPS64:

  • fixed autodetection of MSA capability

LOONGARCH64:

  • added an optimized DGEMM kernel

md5sums:
abfaa43d995046ca4c56ccf14165c93c OpenBLAS-0.3.20.tar.gz
33526b15e15971edb657edc15de0c67f OpenBLAS-0.3.20.zip
3d9daef71592665261c032888bd810d6 OpenBLAS-0.3.20-x64.zip
5bfe847082510e44cdc59755cd49b941 OpenBLAS-0.3.20-x86.zip

OpenBLAS 0.3.19 version

19 Dec 19:58
2480e50

general:

  • reverted unsafe TRSV/ZTRSV optimizations introduced in 0.3.16
  • fixed a potential thread race in the thread buffer reallocation routines
    that were introduced in 0.3.18
  • fixed miscounting of thread pool size on Linux with OMP_PROC_BIND=TRUE
  • fixed CBLAS interfaces for CSROT/ZSROT and CROTG/ZROTG
  • made automatic library suffix for CMAKE builds with INTERFACE64 available
    to CBLAS-only builds

x86_64:

  • DYNAMIC_ARCH builds now fall back to the cpu with most similar capabilities
    when an unknown CPUID is encountered, instead of defaulting to Prescott
  • added cpu detection for Intel Alder Lake
  • added cpu detection for Intel Sapphire Rapids
  • added an optimized SBGEMM kernel for Sapphire Rapids
  • fixed DYNAMIC_ARCH builds on OSX with CMAKE
  • worked around DYNAMIC_ARCH builds made on Sandybridge failing on SkylakeX
  • fixed missing thread initialization for static builds on Windows/MSVC
  • fixed an excessive read in ZSYMV
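
The DYNAMIC_ARCH entry above describes falling back to the cpu with the most similar capabilities when a CPUID is unknown. A minimal sketch of that idea follows. The feature sets in TARGETS are simplified illustrations, not OpenBLAS's actual tables, and pick_fallback is a hypothetical name: it chooses the most demanding target whose required features are all present.

```python
# Illustrative feature requirements only; not taken from OpenBLAS.
TARGETS = {
    "PRESCOTT": {"sse3"},
    "SANDYBRIDGE": {"sse3", "avx"},
    "HASWELL": {"sse3", "avx", "avx2", "fma"},
    "SKYLAKEX": {"sse3", "avx", "avx2", "fma", "avx512f"},
}

def pick_fallback(cpu_features, targets=TARGETS, default="PRESCOTT"):
    """Return the target with the largest feature set that the cpu fully supports."""
    usable = [name for name, needed in targets.items() if needed <= cpu_features]
    if not usable:
        return default
    return max(usable, key=lambda name: len(targets[name]))
```

An unknown cpu reporting sse3, avx, avx2 and fma would map to HASWELL under this table instead of dropping to the lowest common denominator.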

POWER:

  • added support for POWER10 in big-endian mode
  • added support for building with CMAKE
  • added optimized SGEMM and DGEMM kernels for small matrix sizes

ARMV8:

  • added basic support and cputype detection for Fujitsu A64FX
  • added a generic ARMV8SVE target
  • added SVE-enabled SGEMM and DGEMM kernels for ARMV8SVE and A64FX
  • added optimized CGEMM and ZGEMM kernels for Cortex A53 and A55 cpus
  • fixed cpuid detection for Apple M1 and improved performance
  • improved compiler flag setting in CMAKE builds

RISCV64:

  • fixed improper initialization in CSCAL/ZSCAL for strided access patterns

MIPS:

  • added a GENERIC target for MIPS32
  • added support for cross-compiling to MIPS32 on x86_64 using CMAKE

MIPS64:

  • fixed misdetection of MSA capability

md5sums:
9721d04d72a7d601c81eafb54520ba2c OpenBLAS-0.3.19.tar.gz
bd74be5bafbc748266b4e9578bba955b OpenBLAS-0.3.19.zip
507a02d501944bd7586caeee4944d409 OpenBLAS-0.3.19-x86.zip
0cff635aeda36435813caeac391ca39e OpenBLAS-0.3.19-x64.zip

OpenBLAS 0.3.18 version

02 Oct 17:43
efe4248

general:

  • when the build-time number of preconfigured threads is exceeded
    at runtime (by an external program calling BLAS functions from
    a larger number of threads), OpenBLAS will now allocate an
    auxiliary control structure for up to 512 additional threads
    instead of aborting
  • added support for Loongson's LoongArch64 cpu architecture
  • fixed building OpenBLAS with CMAKE and -DBUILD_BFLOAT16=ON
  • added support for building OpenBLAS as a CMAKE subproject
  • added support for building for Windows/ARM64 targets with clang
  • improved support for building with the IBM xlf compiler
  • imported Reference-LAPACK PR 625 (out-of-bounds access in ?LARRV)
  • imported Reference-LAPACK PR 597 for testsuite compatibility with
    LLVM's libomp

x86_64:

  • added SkylakeX S/DGEMM kernels for small problem sizes (MNK<=1000000)
  • added optimized SBGEMM for Intel Cooper Lake
  • reinstated the performance patch for AVX512 SGEMV_T with a proper fix
  • added a workaround for a gcc11 tree-vectorizer bug that caused spurious
    failures in the test programs for complex BLAS3 when compiling at -O3
    (the default for cmake "release" builds)
  • added support for runtime cpu count detection under Haiku OS
  • worked around a long-standing miscompilation issue of the Haswell DGEMV_T
    kernel with gcc that could produce NaN output in some corner cases

POWER:

  • improved performance of DASUM on POWER10

ARMV8:

  • fixed crashes (use of reserved register x18) on Apple M1 under OSX
  • fixed building with gcc releases earlier than 5.1

MIPS:

  • fixed building under BSD

MIPS64:

  • fixed building under BSD

md5sums:
5cd5df5a1541ad414f5874aaae17730f OpenBLAS-0.3.18.tar.gz
0ebf2e1ddc491f37be26bea4e0d1239a OpenBLAS-0.3.18.zip
b76692df00d0b655d4f14058f6c2e10f OpenBLAS-0.3.18-x64.zip
b421f7c47223c5f228c1fe1c66f3f0e1 OpenBLAS-0.3.18-x86.zip