Release OpenBLAS 0.3.21 version · OpenMathLib/OpenBLAS

general:

updated the included LAPACK to Reference-LAPACK 3.10.1
when no Fortran compiler is available, OpenBLAS now automatically builds LAPACK
from an f2c-converted copy of LAPACK 3.9.0 unless the NO_LAPACK option is specified
(more recent LAPACK releases rely too heavily on Fortran 90+ features to be easily convertible to C)
similarly added C versions of the BLAS and CBLAS tests
enabled building of the ReLAPACK GEMMT kernels when ReLAPACK is built
function LAPACKE_lsame is now annotated with the GCC attribute "const" to aid static analyzers
added USE_TLS to the list of options reported by the openblas_get_config() function
added openblas_getaffinity() as a Linux-only convenience function wrapping pthread_getaffinity_np()
CMAKE builds now support the BUILD_TESTING keyword of Reference-LAPACK (used to disable the LAPACK testsuite)
fixed CMAKE builds of the laswp_ncopy and neg_tcopy kernels
removed the build system's dependency on Perl (the original Perl scripts are kept as a backup)
added handling for building and running OpenBLAS on systems that report zero available cpu cores
added SYMBOLPREFIX/SYMBOLSUFFIX handling for LAPACK 3.10.0 functions added in 0.3.20
fixed linking of the utests on QNX
added support for compilation with the Intel ifx compiler
added support for compilation with the Fujitsu FCC compiler for Fugaku
added support for compilation with the Cray C and Fortran compilers
reverted the OpenMP threadpool behaviour in the exec_blas call to its state before 0.3.11:
the threadpool no longer grows or shrinks on demand, as the overhead for this is too big, at least with
GNU OpenMP. The adaptive behaviour introduced in 0.3.11 can still be requested at runtime by setting
the environment variable OMP_ADAPTIVE
worked around spurious STFSM/CTFSM errors reported by the LAPACK testsuite
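
The OMP_ADAPTIVE switch mentioned in the list above is an ordinary environment variable, so one way to opt back into the adaptive threadpool is to launch the program with that variable already present. The following is a minimal sketch, not something taken from the OpenBLAS sources: the helper names are hypothetical, and the value "1" is an assumption, since the notes only say that the variable has to be set. The variable is placed in the child's environment before it starts, on the assumption that the library may read it early.

```python
import os
import subprocess
import sys


def adaptive_pool_env(base=None):
    """Return a copy of the environment with OMP_ADAPTIVE set."""
    env = dict(os.environ if base is None else base)
    # Assumption: "1" is used as the enabling value; the release notes
    # only state that the variable has to be set.
    env["OMP_ADAPTIVE"] = "1"
    return env


def run_with_adaptive_pool(argv):
    """Run argv in a child process that sees OMP_ADAPTIVE in its environment."""
    return subprocess.run(argv, env=adaptive_pool_env(), check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process: a real caller would start a program linked
    # against an OpenMP build of OpenBLAS here instead.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('OMP_ADAPTIVE'))"]
    print(run_with_adaptive_pool(child).stdout.strip())
```

Builds that do not use OpenMP are presumably unaffected by this variable, as the note describes it only for the OpenMP threadpool.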

x86_64:

fixed determination of compiler support for AVX512 and removed the 0.3.19
workaround for building SKYLAKEX kernels on Sandybridge hardware
fixed compilation for the SKYLAKEX target with gcc 6
fixed compilation of the CooperLake SBGEMM kernel with LLVM
fixed compilation of the SkyLakeX small matrix GEMM kernels with LLVM or ICC
fixed compilation of some BFLOAT16 kernels with CMAKE
added support for the Zhaoxin/Centaur KH40000 cpu
fixed a potential crash in the ZSYMV kernel used for all targets except generic
fixed gmake compilation for DYNAMIC_ARCH with a DYNAMIC_LIST including ATOM
fixed compilation of LAPACKE with the INTEGER64 option on Windows
added support for cross-compiling to individual Intel or AMD targets using CMAKE
(previously only CORE2 was supported; the added targets are ATOM, PRESCOTT, NEHALEM, SANDYBRIDGE,
HASWELL, SKYLAKEX, COOPERLAKE, SAPPHIRERAPIDS, OPTERON, BARCELONA, BULLDOZER, PILEDRIVER,
STEAMROLLER, EXCAVATOR, ZEN)

SPARC:

worked around an overflow error in the DNRM2 kernel
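
The DNRM2 overflow workarounds listed for several architectures in this release concern the Euclidean norm of a double precision vector. The notes do not say how the individual kernels were changed, so the sketch below only illustrates the failure mode: squaring large elements overflows even when the norm itself is representable, and a running scale factor (in the style of the classic reference DNRM2 loop) avoids that. Both function names are hypothetical.

```python
import math


def naive_nrm2(x):
    """Textbook 2-norm: the intermediate squares can overflow to inf."""
    total = 0.0
    for v in x:
        total += v * v
    return math.sqrt(total)


def scaled_nrm2(x):
    """2-norm that tracks the largest magnitude seen so far as a scale."""
    scale = 0.0  # largest |v| encountered so far
    ssq = 1.0    # sum of squares of the elements divided by scale
    for v in x:
        if v != 0.0:
            a = abs(v)
            if scale < a:
                # Rescale the accumulated sum to the new, larger scale.
                ssq = 1.0 + ssq * (scale / a) ** 2
                scale = a
            else:
                ssq += (a / scale) ** 2
    return scale * math.sqrt(ssq)


if __name__ == "__main__":
    big = [1e200, 1e200]
    print(naive_nrm2(big))   # inf: 1e200 squared is not representable
    print(scaled_nrm2(big))  # finite, close to 1e200 * sqrt(2)
```

The scaled form costs a division per element, which is one reason optimized kernels tend to use other strategies.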

POWER:

worked around an overflow error in the POWER6 DNRM2 kernel
fixed compilation on PPC440
fixed a performance regression in the level1 BLAS on POWER10
fixed the POWER10 ZGEMM kernel
fixed singlethreaded builds for POWER10
fixed compilation of the POWER10 DGEMV kernel with older gcc versions
enabled compilation of the BFLOAT16 kernels by default
enabled the small matrix kernels by default for DYNAMIC_ARCH builds
added a workaround for a miscompilation of the CDOT and ZDOT kernels by GCC 12

RISCV:

fixed cpu autodetection logic

ARMV8:

added an SBGEMM kernel for Neoverse N2
worked around an overflow error in the DNRM2 kernel used on M1, NeoverseN1, ThunderX2T99
added support for ARM64 systems running MS Windows
added support for cross-compiling to the GENERIC ARMV8 target under CMAKE (Windows/MSVC)
fixed a performance regression in the generic ARMV8 DGEMM kernel introduced in 0.3.19
added initial support for the Apple M1 cpu under Linux
added initial support for the Phytium FT2000 cpu
added initial support for the Cortex A510, A710, X1 and X2 cpus
fixed an accidental mixup of cpu identifiers in the autodetection code introduced in 0.3.20
fixed linking of Apple M1 builds on macOS 12 and later with recent XCode
made NeoverseN2 available in DYNAMIC_ARCH builds

MIPS,MIPS64:

worked around an overflow error in the DNRM2 kernel

LOONGARCH64:

worked around an overflow error in the DNRM2 kernel
added preliminary support for the LOONGSON2K1000 cpu
added DYNAMIC_ARCH support

md5sum
ffb6120e2309a2280471716301824805 OpenBLAS-0.3.21.tar.gz
4f013627138be6ecbd2c8d1435f2ec40 OpenBLAS-0.3.21.zip
c605e9e4ef227605ebcafa6466f14e25 OpenBLAS-0.3.21-x64.zip
16e2cc782e893df47fef97be09896ae1 OpenBLAS-0.3.21-x86.zip
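
A downloaded archive can be checked against the md5sum list above with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the file is assumed to be in a location the caller passes in.

```python
import hashlib

# Checksums copied from the md5sum list above.
EXPECTED_MD5 = {
    "OpenBLAS-0.3.21.tar.gz": "ffb6120e2309a2280471716301824805",
    "OpenBLAS-0.3.21.zip": "4f013627138be6ecbd2c8d1435f2ec40",
    "OpenBLAS-0.3.21-x64.zip": "c605e9e4ef227605ebcafa6466f14e25",
    "OpenBLAS-0.3.21-x86.zip": "16e2cc782e893df47fef97be09896ae1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Compare the file at path with the published checksum for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("OpenBLAS-0.3.21.tar.gz", "OpenBLAS-0.3.21.tar.gz")` returns True only if the local file matches the published digest. An MD5 match guards against corrupted downloads, not against deliberate tampering.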
