Releases: simd-everywhere/simde

v0.8.2

02 May 09:58
71fd833

SIMDe 0.8.2

Summary

  • Start of RISCV64 optimized implementation using the RVV1.0 vector extension! Thank you @eric900115 @howjmay @zengdage
  • 62 of the Arm NEON intrinsics added in SIMDe 0.8.0 (from the FCVTZS/FCVTMS/FCVTPS/FCVTNS families) had to be removed because they did not
    exactly match the specifications and real hardware. This brings us down from 100% coverage of the NEON functions to 99.07%.
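
As a quick check of the coverage figure above, the arithmetic can be sketched in C. The total of 6670 NEON functions is taken from the 0.8.0 notes further down this page; the helper name `coverage_percent` is hypothetical and not part of SIMDe.

```c
#include <assert.h>

/* Hypothetical helper for illustration only; not part of SIMDe.
 * Returns the percentage of `total` functions that remain implemented
 * after `removed` of them have been dropped. */
static double coverage_percent(unsigned total, unsigned removed) {
  assert(total > 0 && removed <= total);
  return 100.0 * (double) (total - removed) / (double) total;
}
```

With the numbers from these notes, `coverage_percent(6670, 62)` evaluates to roughly 99.07, which agrees with the quoted percentage.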

For the entire project: 126 files changed, 5522 insertions(+), 2772 deletions(-)

For just the simde folder: 89 files changed, 4330 insertions(+), 2199 deletions(-)

Details

Implementation of Arm intrinsics

NEON

  • arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics 339ffe4 @mr-c
  • arm neon sm3: check constant range 3d34fcd @mr-c
  • arm 32 bits: native def fixes; workarounds for gcc 22900e6 @Cuda-Chen
  • x86 implementations: allow _m128 access from SSE 114c3cd @mr-c

WASM intrinsics

  • wasm x86 impl: some were incorrectly marked SSE instead of SSE2 fee149a @mr-c

x86 intrinsics

SVML

  • SSE is good enough for native m128i and m128d types & functions 9982b27 @mr-c

XOP

Arch support

arm / arm64

RISCV64

Compiler Specific

Clang

Emscripten

  • use __builtin_roundeven{f,} from version 3.1.43 onwards 4379740 @mr-c
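
For readers unfamiliar with the operation behind `__builtin_roundeven{f,}`: it rounds to the nearest integer and resolves exact ties towards the even neighbour. The following is a minimal sketch of that behaviour in plain C, assuming a hypothetical name `roundeven_sketch`; it is not SIMDe's fallback code, and unlike the real function it does not preserve the sign of a zero result.

```c
#include <assert.h>

/* Hypothetical illustration of round-half-to-even; not SIMDe code. */
static double roundeven_sketch(double v) {
  /* Values with magnitude >= 2^52, infinities and NaN have no fractional
   * part to round (or are NaN), so they are returned unchanged. */
  if (!(v > -4503599627370496.0 && v < 4503599627370496.0)) {
    return v;
  }

  long long t = (long long) v;          /* truncates towards zero */
  double frac = v - (double) t;         /* exact for |v| < 2^52 */
  double a = (frac < 0.0) ? -frac : frac;
  assert(a < 1.0);                      /* fraction is always inside (-1, 1) */

  /* Step away from zero when past the halfway point, or when exactly
   * halfway and the truncated value is odd. */
  if (a > 0.5 || (a == 0.5 && (t & 1) != 0)) {
    t += (v < 0.0) ? -1 : 1;
  }
  return (double) t;
}
```

Under this sketch 2.5 rounds to 2.0 and 3.5 rounds to 4.0, which is the tie behaviour that distinguishes round-to-even from the C library's `round`.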

MSVC

  • x86 test msvc: really disable warning 4799,4730 487507d @mr-c
  • sse2 MSVC _mm_pause implementation for x86 8d95f83 @mr-c
  • SSE is good enough for native m128i and m128d types & functions 9982b27 @mr-c

Testing with Docker/Podman & CI

Cirrus CI

GitHub Actions

Packit CI

Semaphore CI

  • stop testing on GCC 5 & 6, clang 3.9 & 4 due to forced upgrade to Ubuntu 20.04 9982f10 @mr-c

Misc

New Contributors

Full Changelog: v0.8.0...v0.8.2

v0.8.2-rc1

30 Apr 16:39
71fd833
Pre-release

See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes for changes since 0.8.0

Full Changelog: v0.8.0...v0.8.2-rc1

v0.8.0

14 Mar 13:03
589c7d5

SIMDe 0.8.0

Summary

  • A complete set of implementations for all NEON intrinsics has been finished, up from 56.46% in the previous release! (@yyctw @wewe5215)
  • SIMDe PRs are tested using Fedora Rawhide (@junaruga)

For the entire project: 656 files changed, 202635 insertions(+), 1724 deletions(-)

For just the simde folder: 295 files changed, 47053 insertions(+), 896 deletions(-)

X86

There are a total of 6876 SIMD functions on x86, 2930 (42.61%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%).

Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (ER, PF, 4MAPS, and 4VNNIW) from their intrinsic list. SIMDe will retain those few implementations we already had, but this changes how our completeness statistics are calculated.

Newly added function families

  • AES: 5 of 6 (83.33%)

Newly added AVX512 function families

Additions to existing families

  • AVX512BW: 7 additional, 337 total of 790 (42.66%)
  • AVX512DQ: 5 additional, 112 total of 376 (29.79%)
  • AVX512F: 48 additional, 1087 total of 2812 (38.66%)
  • AVX512_FP16: 15 additional, 17 total of 1105 (1.54%)

Neon

SIMDe currently implements 6670 out of 6670 (100.00%) NEON functions; up from 56.46% in the previous release!

Newly added families

  • abal
  • abal_high
  • abd
  • abdh
  • abdl_high
  • addhn_high
  • aes
  • bfdot
  • bfdot_lane
  • cadd_rot
  • cale
  • calt
  • cmla_lane
  • cmla_rot_lane
  • copy_lane
  • cvt_high
  • cvt_n
  • cvta
  • cvtn
  • cvtp
  • cvtx
  • cvtx_high
  • div
  • dupb_lane
  • duph_lane
  • eor3
  • fmlal
  • fms
  • fms_lane
  • fms_n
  • ld2_dup
  • ld2_lane
  • ld3_dup
  • ld3_lane
  • ld4_dup
  • maxnmv
  • minnmv
  • mla_lane
  • mla_high_lane
  • mls_lane
  • mlsl_high_lane
  • mmla
  • mull_high_lane
  • mull_high_n
  • mulx
  • mulx_lane
  • pmaxnm
  • pminnm
  • qdmlal
  • qdmlal_high
  • qdmlal_high_lane
  • qdmlal_high_n
  • qdmlal_lane
  • qdmlal_n
  • qdmlsl
  • qdmlsl_high
  • qdmlsl_high_lane
  • qdmlsl_high_n
  • qdmlsl_lane
  • qdmlsl_n
  • qdmlslh
  • qdmlslh_lane
  • qdmulhh
  • qdmulhh_lane
  • qdmull_high
  • qdmull_high_lane
  • qdmull_high_n
  • qdmull_lane
  • qdmull_n
  • qdmullh_lane
  • qmovun_high
  • qrdmlah
  • qrdmlah_lane
  • qrdmlahh
  • qrdmlahh_lane
  • qrdmlsh
  • qrdmlsh_lane
  • qrdmlshh
  • qrdmlshh_lane
  • qrdmulhh_lane
  • qrshl
  • qrshlh
  • qrshrn_high_n
  • qrshrnh_n
  • qrshrun_high_n
  • qrshrunh_n
  • qshl_n
  • qshlh_n
  • qshluh_n
  • qshrn_high_n
  • qshrnh_n
  • qshrun_high_n
  • qshrunh_n
  • raddhn
  • raddhn_high
  • rax
  • recp
  • rnd32x
  • rnd32x
  • rnd32x
  • rnd64z
  • rnda
  • rndx
  • rshrn_high_n
  • rsubhn
  • rsubhn
  • set_lane
  • sha1
  • sha1h
  • sha256
  • sha512
  • shll_high_n
  • shrn_high_n
  • sli_n
  • sm3
  • sm4
  • sqrt
  • st1_x2
  • st1_x3
  • st1_x4
  • st1q_x2
  • st1q_x3
  • st1q_x4
  • subhn_high
  • sudot_lane
  • usdot
  • usdot_lane

Finally complete families

  • cvtn
  • mla_lane

Details

  • simde-f16: improve _Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
  • simde_float16: prefer __fp16 if available aba26f6 @mr-c

Implementation of Arm intrinsics

NEON

SVE Intrinsics

WASM intrinsics

  • simd128: fix altivec_p7 version of wasm_f64x2_pmin 96d6e53 @mr-c
  • simd128: add missing unsigned functions ea5e283 @mr-c
  • simd128 f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c
  • detect support for Relaxed SIMD mode 2e66dd4 @mr-c
  • simd128/relaxed: begin MIPS implementations db8ad84 @mr-c
  • relaxed: add f{32x4,64x2}_relaxed_{min,max} 9d1a34e @mr-c
  • relaxed: updated names; reordered FMA operations 8cc8874 @mr-c

x86 intrinsics

  • sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store} 6ce6030 @mr-c
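
As a rough illustration of the builtins named in the entry above, the sketch below wraps Clang's `__builtin_nontemporal_store(value, ptr)` and `__builtin_nontemporal_load(ptr)` behind a feature check and falls back to ordinary memory accesses elsewhere. The `SKETCH_` macro and `sketch_` function names are hypothetical; this is not how SIMDe itself structures its `*_stream_*` functions.

```c
#include <assert.h>
#include <stddef.h>

/* Detect the Clang nontemporal builtins; compilers without __has_builtin,
 * or without these builtins, take the plain fallback path below. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store) && __has_builtin(__builtin_nontemporal_load)
#    define SKETCH_HAVE_NONTEMPORAL 1
#  endif
#endif

/* Hypothetical wrapper: store `value` to `dst`, hinting that the data
 * will not be reused soon when the builtin is available. */
static void sketch_stream_store_i32(int *dst, int value) {
  assert(dst != NULL);
#if defined(SKETCH_HAVE_NONTEMPORAL)
  __builtin_nontemporal_store(value, dst);
#else
  *dst = value;
#endif
}

/* Hypothetical wrapper: load from `src` with the same hint. */
static int sketch_stream_load_i32(const int *src) {
  assert(src != NULL);
#if defined(SKETCH_HAVE_NONTEMPORAL)
  return __builtin_nontemporal_load(src);
#else
  return *src;
#endif
}
```

Either path stores and returns the same value; the nontemporal variants only change the cache hint the compiler may emit.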

SSE*

  • sse: Fix issues related to MXCSR register (#1060) 653aba8 @M-HT
  • sse: implement _mm_movelh_ps for Arm64 514564e @mr-c
  • sse _mm_movemask_ps: remove unused code fba97e4 @mr-c
  • sse2 mm_pause: more archs, add a basic test 692a2e8 @mr-
  • sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 edd4678 @mr-c
  • sse4.1 _mm_testz_si128: fix backwards short circuit logic f132275 @mr-c
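
The two SSE4.1 fixes above concern functions whose results are easy to get backwards, so a scalar model of what they return may help. This is a sketch under stated assumptions: the type `sketch_u128` and both function names are hypothetical stand-ins, and it models only the documented results of `_mm_testz_si128` and `_mm_testnzc_si128`, not SIMDe's NEON code paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit lanes; stands in for __m128i. */
typedef struct { uint64_t lane[2]; } sketch_u128;

/* Model of _mm_testz_si128: 1 when (a AND b) has no bit set, otherwise 0. */
static int sketch_testz(const sketch_u128 *a, const sketch_u128 *b) {
  assert(a != NULL && b != NULL);
  uint64_t and_bits = (a->lane[0] & b->lane[0]) | (a->lane[1] & b->lane[1]);
  return and_bits == 0;
}

/* Model of _mm_testnzc_si128: 1 only when both (a AND b) and
 * ((NOT a) AND b) contain at least one set bit, otherwise 0. */
static int sketch_testnzc(const sketch_u128 *a, const sketch_u128 *b) {
  assert(a != NULL && b != NULL);
  uint64_t and_bits  = (a->lane[0] & b->lane[0]) | (a->lane[1] & b->lane[1]);
  uint64_t andn_bits = (~a->lane[0] & b->lane[0]) | (~a->lane[1] & b->lane[1]);
  return (and_bits != 0) && (andn_bits != 0);
}
```

Note that the lanes are combined with a bitwise OR before the comparison with zero, while the two final conditions in `sketch_testnzc` are combined logically; mixing those up is the kind of mistake the entries above describe.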

AVX

AVX2

AVX512

CLMUL

SVML

AES

MIPS MSA intrinsics

  • msa neon impl: float64x2_t is not avail in A32V7 ae4c4ab @mr-c

Arch support

x86(-64)

arm64

  • x86 aes: add neon implementation using the crypto extension fb3554f @mr-

Altivec

  • neon/st1: disable last remaining AltiVec implementation 0521245 @mr-c

Power

  • sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
  • wasm simd128: more powerpc fixes 7cb5691 @mr-c

Compiler Specific

GCC

Clang

ClangCL

  • fp16: don't use _Float16 on ClangCL if not supported 8a6b8c5 @mr-c
  • svml: don't...

v0.8.0-rc2

07 Mar 14:15
Pre-release

See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes for changes since 0.7.6

What's Changed since RC1

New Contributors

Full Changelog: v0.8.0-rc1...v0.8.0-rc2

v0.8.0-rc1

20 Nov 17:41
e651ec3
Pre-release

See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes

New Contributors

Full Changelog: v0.7.6...v0.8.0-rc1

v0.7.6

16 May 16:51
fefc785

Summary

See, I knew we should release more often!

Details

Implementation of Arm intrinsics

NEON

neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations 3a18dff @mr-c
neon/cvtn: basic implementation of a few functions fefc785 @mr-c
neon/mla_lane: initial implementation using mla+dup 554ab18 @ngzhian
neon/shl,rshl: fix avx include to unbreak amalgamated headers 3748a9f @mr-c
neon/shll_n: make vshll_n_u32 test operational 356db0c @mr-c
neon/qabs: restore SSE2 impl for vqabsq_s8 f614843 @mr-c

x86 intrinsics

mmx: loongson impl promotions over SIMDE_SHUFFLE_VECTOR_ 51bf6f2 @mr-c
x86/sse*,avx: add additional SIMD128 implementations e28a87e @mr-c

SSE*

sse{,2,3,4.1},avx: more WASM shuffle implementations 097dd12 @mr-c
sse*,avx: add additional SIMD128 implementations e28a87e @mr-c
sse: allow native _mm_loadh_pi on MSVC x64 314452b @mr-c

AVX512

avx512: typo fix for typedef of __mmask64 e8390a3 4a9f01a @mr-c
avx512/madd: fix native alias arguments for _mm512_madd_epi16 bcf4adb @mr-c

Arch support

simde-arch: #include Hedley for setting F16C for MSVC 2022+ with AVX2 f9cf467 @mr-c

Testing with Docker/Podman & CI

tests: simde_assert_equal_{v,}f funcs were silently failing 395efd9 @mr-c
tests: Quiet another Clang < v5 warning that resurfaced d9d2b45 @mr-c
tests: audit use of HEDLEY_DIAGNOSTIC_PUSH and _POP 284c88a @mr-c
test: ignore -Wc99-extensions e264ff5 @mr-c
neon/aba: vaba_s32 test was not being run f86346a @mr-c
sve/and: the svand_n_s8_m test is incomplete, mark it as such b962f07 @mr-c
tests: combine declarations in test functions 76c7d37 @mr-c

Local testing with Docker/Podman

docker: add wasm64 target 29db539 @mr-c

Drone.io

remove Drone.io fd10911 @mr-c

GitHub Actions

gh-actions: confirm that all header files are installed 8d5e05a @mr-c
gh-actions: put wasm64 under CI 6702820 @mr-c

Netlify

netlify: disable for now caa0929 @mr-c

Misc

meson install: arm/neon/ld1 & x86/avx512.h 27836b1 @mr-c
Update clang version detection for 14..16 and add link 4957a9e @jan-wassenberg

v0.7.4

05 May 05:35
0c26988

SIMDe 0.7.4

Summary

  • Minimum meson version is now 0.54
  • 40 new NEON families implemented, SVE API implementation started (14 families)
  • Initial support for x86 F16C API
  • Initial support for MIPS MSA API
  • Initial support for Arm Scalable Vector Extensions (SVE) API
  • Initial support for WASM SIMD128 API
  • Initial support for the E2K (Elbrus) architecture
  • MSVC has many fixes, now compiled in CI using /ARCH:AVX, /ARCH:AVX2, and /ARCH:AVX512

X86

There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far.
Specifically for AVX-512, of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%).

Newly added function families

Additions to existing families

  • AVX512F: 579 additional, 856 total of 2660 (32.18%)
  • AVX512BW: 178 additional, 335 total of 828 (40.46%)
  • AVX512DQ: 77 additional, 111 total of 399 (27.82%)
  • AVX512_VBMI: 9 additional, 30 total of 30 💯
  • KNCNI: 113 additional, 114 total of 595 (19.16%)
  • VPCLMULQDQ: 1 additional, 2 total of 2 💯

Neon

SIMDe currently implements 3745 out of 6670 (56.15%) NEON functions. If you don't count 16-bit floats and poly types, it's 3745 / 4969 (75.37%).

Newly added families

  • addhn
  • bcax
  • cage
  • cmla
  • cmla_rot90
  • cmla_rot180
  • cmla_rot270
  • fma
  • fma_lane
  • fma_n
  • ld2
  • ld4_lane
  • mlal_high_n
  • mlal_lane
  • mls_n
  • mlsl_high_n
  • mlsl_lane
  • mull_lane
  • qdmulh_lane
  • qdmulh_n
  • qrdmulh_lane
  • qrshrn_n
  • qrshrun_n
  • qshlu_n
  • qshrn_n
  • qshrun_n
  • recpe
  • recps
  • rshrn_n
  • rsqrte
  • rsqrts
  • shll_n
  • shrn_n
  • sqadd
  • sri_n
  • st2
  • st2_lane
  • st3_lane
  • st4_lane
  • subhn
  • subl_high
  • xar

MSA

Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.

Details

Implementation of Arm intrinsics

NEON


v0.7.4-rc3

29 Apr 07:33
f19193b
Pre-release

Full Changelog: v0.7.4-rc2...v0.7.4-rc3

v0.7.4-rc2

19 Apr 08:10
7e70d02
Pre-release

Full Changelog: v0.7.4-rc1...v0.7.4-rc2

SIMDe 0.7.4-RC1

02 Feb 18:07
9609eb2
Pre-release
v0.7.4-rc1

prepare to release 0.7.4