SIMDe 0.7.4 #347

Open
wants to merge 5 commits into
base: master

mr-c
Contributor

@mr-c mr-c commented May 5, 2023

The magic commands are:

git remote add simde git@github.com:simd-everywhere/simde-no-tests.git
git fetch --all
git subtree pull --squash --prefix lib/simde/simde/ simde v0.7.4

mr-c added 3 commits May 5, 2023 08:08
02c7a67e sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH
b0b370a4 x86/sse: Add LoongArch LSX support
2338f175 arch: Add LoongArch LASX/LSX support
90d95fae avx512: define __mask64 & __mask32 if not yet defined
42a43fa5 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017
20f98da6 sve/whilelt: correct type-o in __mmask32 initialization
47a1500f sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017
cd93fcc9 avx512/knot,kxor: native calls not availabe on MSVC 2017
ba6324b6 avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019
2f6fe9c6 sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512
91fda2cc axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions
a397b74b __builtin_signbit: add cast to double for old Clang versions
e016050b clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F
7e353c00 Wasm q15mulr_sat_s: match Wasm spec
ce375861 Wasm f32/f64 nearest: match Wasm spec
96d5e034 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec
5676a1ba Wasm f32/f64 abs: match Wasm spec
aa299c08 Wasm f32/f64 max: match Wasm spec
433d2b95 Wasm f32/f64 min: match Wasm spec
cf1ac40b avx{,2}: some intrinsics are missing from older MSVC versions
bff9b1b3 simd128: move unary minus to appease msvc native arm64
efc512a4 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions
091250e8 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC
4b305360 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE
2dedbd9b skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX
a04ea7bc f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph
e8ee041a ci appveyor: build tests with AVX{,2}, but don't run them
2188c972 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8}
186f12f1 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask}
6a40fdeb arm/neon/rnd: use correct SVML function for simde_vrndq_f64
9a0705b0 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE
c298a7ec msvc avx512/roundscale_round: quiet a false positive warning
01d9c5de sse: remove errant MMX requirement from simde_mm_movemask_ps
c675aa08 x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc
097af509 msvc 2022: enable F16C if AVX2 present
91cd7b64 avx{,2}: fix maskload illegal mem access
2caa25b8 Fixed simde_mm_prefetch warnings
96bdf523 Fixed parameters to _mm_clflush
4d560e41 emscripten; don't use __builtin_roundeven{f,} even if defined
511a01e7 avx512/compress: Mitigate poor compressstore performance on AMD Zen 4
a22b63dc avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops
3d87469f wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type
56ca5bd8 Suppress min/max macro definitions from windows.h
f2cea4d3 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving
3698cef9 neon/cvt: clang bug 46844 was fixed in clang 12.0
9369cea4 simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2}
ce27bd09 gcc power: vec_cpsgn argument reversal fixed in 12.0
20fd5b94 gcc power: bugs 1007[012] fixed in GCC 12.1
5e25de13 gcc sse2: bug 99754 was fixed in GCC 12.1
e6979602 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error
359c3ff4 clang wasm simde: add workaround to fix wasm_i64x2_shl bug
b767f5ed arm/neon: workaround on ARM64 windows bug
599b1fbf mips/msa: fix for Windows ARM64
c6f4821e arm64 windows: fix simd128.h build error
782e7c73 prepare to release 0.7.4
6e9ac245 fix A32V7 version of _mm_test{nz,}c_si128
776f7a69 test with Debian default flags, also for armel
a240d951 x86: fix AVX native → SSE4.2 native
5a73c2ce _mm_insert_ps: incorrect handling of the control
597a1c9e neon/ld1[q]_*_x2: initial implementation
4550faea wasm: f32x4 and f64x2 nearest roundeven
5e068645 Add missing `static const` in simde-math.h. NFC
da02f2ce avx512/setzero: fix native aliases
89762e11 Fixed FMA detection macro on msvc
b0fda5cf avx512/load_pd: initial implementation
a61af077 avx512/load_ps: initial implementation
4126bde0 Properly map __mm functions to __simde_mm
2e76b7a6 neon ld2: gcc-12 fixes
604a53de fix wrong size
e5e085ff AVX: add native calls for _mm256_insertf128_{pd,ps,si256}
ee3bd005 aarch64 + clang-1[345] fix for "implicit conversion changes signedness"
a060c461 wasm: load lane memcpy instead of cast to address UBSAN issues
cbef1c15 avx512/permutex2var: hard-code types in casts instead of using typeof
71a65cbd gfni: add cast to work around -Wimplicit-int-conversion warning
10dd508b avx512/scalef: work around for GCC bug #101614
277b303b neon/cvt: fix compilation with -ffast-math
9ec8c259 avx512/scalef: _mm_mask_scalef_round_ss is still missing in GCC
e821bee3 Wrap static assertions in code to disable -Wreserved-identifier
13cf2969 The fix for GCC bug #95483 wasn't in a release until 11.2
b66e3cb9 avx2: separate natural vector length for float, int, and double types
dda31b76 Add -Wdeclaration-after-statement to the list of ignored warnings.
9af03cd0 Work around compound literal warning with clang
74a4aa59 neon/clt: Add SSE/AVX512 fallbacks
02ce512d neon/mlsl_high_n: initial implementation
6472321c neon/mlal_high_n: initial implementation
2632bbc1 neon/subl_high: initial implementation
d1d2362d neon/types: remove duplicate NEON float16_t definitions
456812f8 sse: avoid including windows.h when possible
332dcc83 neon/reinterpret: change defines to work with templated callers
e369cd0c neon/cge: Improve some of the SSE2 fallbacks
3397efe1 deal with WASM SIMD128 API changes.
3aa4ae58 neon/rndn: Fix macros to workaround bugs
30b3607b neon/ld1: Fix macros in order to workaround bugs
8cac29c6 neon/cge: Implement f16 functions
c96b3ae6 neon/cagt: Implement f16 functions
f948d39a neon/bsl: Implement f16 functions
d6e025bd neon/reinterpret: f16_u16 and u16_f16 implementations
5e763da5 neon/add: Implement f16 functions
5a7c9e13 neon/ceqz: Implement f16 functions
1ba94bc4 neon/dup_n: Implement f16 functions
af26004a neon/ceq: Implement f16 functions
e41944f3 neon/st1: Add f16 functions
a660d577 neon/cvt: Implement f16 functions
412da5b3 neon/ld1: Implement f16 functions
068485c9 neon/cage: Initial f16 implementations
89fb99ee neon: Implement f16 types
50a56ef7 sse4.2: work around more warnings on old clang
fa54e7b3 avx512/permutex2var: work around incorrect definition on old clang
d20c7bf8 sse: use portable implementation to work around llvm bug #344589
371fd445 avx: work around incorrect maskload/store definitions on clang < 3.8
3bb373c8 Various fixes for -fno-lax-vector-conversions
f26ad2d1 avx512/fixupimm: initial implementation
f9182e3b Fix warnings with -fno-lax-vector-conversions
37c26d7f avx512/dpbusds: complete function family
0dc7eaf6 sse: replace _mm_prefetch implementation
b7fd63d9 neon/ld1q: u8_x2, u8_x3, u8_x4
6427473b neon/mul: add improved SSE2 vmulq_s8 implementation
b843d7e1 avx512/cvt: add _mm512_cvtepu32_ps
5df05510 simd128: improve many lt and gt implementation
495a0d2a neon/mul: implement unsigned multiplication using signed functions
2b087a1c neon/qadd: fix warning in ternarylogic call in vaddq_u32
f027c8da neon/qabs: add some faster implementations
bf6667b4 simd128: add fast sqrt implementations
d490ca7a simd128: add fast extmul_low/high implementations
2abd2cc0 simd128: add NEON and POWER shift implementations
3032eb33 simd128: add fast promote/demote implementations
e92273a6 simd128: add dedicated functions for unsigned extract_lane
34c5733c sse2, sse4.1: pull in improved packs/packus implementations from WASM
1bfc221c simd128: add fast narrow implementations
f333a089 simd128: add fast implementations of extend_low/extend_high
b4e0d0cc msa/madd: initial implementation
c09e6b0a neon/rndn: work around some missing functions in GCC on armv8
cc7afa77 avx512/4dpwssds: initial implementation
a9cec6fe avx512/dpbf16: implement remaining functions
371da5f8 avx512/dpwssds: initial implementation
ccef3bee common: Use AArch64 intrinsics if _M_ARM64EC is defined
f79c08c3 xop: fix NEON implementation of maccs functions to use NEON types
9eb0a88d sse4.1: use NEON types instead of vector in insert implementations
0bbae5ff avx512/roundscale: don't assume IEEE 754 storage
77673258 fma: use NEON types in simde_mm_fnmadd_ps NEON implementation
865412e7 sse2: remove statement expr requirement for NEON srli/srai macros
573c0a24 sse4.1: replace NEON implementations with shuffle-based implementations
534794b2 sse4.1: remove statement expr dependency in blend functions
a571ca8c fma: fix return value of simde_mm_fnmadd_ps on NEON
df95ab8e sse, sse2: clean up several shuffle macros
44e25b30 sse2: add parenthesis around macro arguments
305ac0a8 avx512/set, avx512/popcnt: use _mm512_set_epi8 only when available
98de6621 relaxed-simd: add blend functions
974f83d5 relaxed-simd: add fms functions
a46a04b7 relaxed-simd: add fma functions
54c62bf7 avx512/popcnt: implement remaining functions
d4dc926f avx512/dpbf16: initial implementation
b9a7904d avx512/4dpwssd: implement complete function family
f54cc98a avx512/dpwssd: initial implementation
7e877d17 avx512/bitshuffle: initial implementation
9e96b711 avx512/dpbusd: implement remaining functions
423572d5 simd128: use vec_cmpgt instead of vec_cmplt in pmin
73b6978f sse, sse2: fix vec_cpsign order test
7c0bdbff gfni: remove unintentional dependency on vector extensions
26fcfdb1 simd128: add fast ceil implementations
85035430 Improve widening pairwise addition implementations
8f35dc1a simd128: add fast max/pmax implementations
a8adeffc neon/cvt: disable some code on 32-bit x86 which uses _mm_cvttsd_si64
29955848 avx512/shldv: limit shuffle-based version to little endian
ae330dd9 simd128: add NEON, Altivec, & vector extension sub_sat implementations
9debe735 neon/cvt, relaxed-simd: add work-around for GCC bug #101614
eab383d9 avx512/dbsad: add vector extension impl. and improve scalar version
79c93ce0 sse, sse2: sync clang-12 changes for vec_cpsgn
7205c644 avx512/cvtt: _mm_cvttpd_epi64 is only available on x86_64
42538f0e simd128, sse2: more cvtpd_ps/f32x4_demote_f64x2_zero implementations
1bec285e simd128, sse2: add more madd_epi16 / i32x4_dot_i16x8 implementations
6dfdf3d2 simd128: vector extension implementation of floating-point abs
00c3b68b simd128, neon/neg: add VSX implementations of abs and neg functions
7f3a52d0 neon/cgt, simd128: improve some unsigned comparisons on x86
f5184634 neon/abd: add much better implementations
9b1974dd Add @aqrit's SSE2 min/max implementations
9caf5e6e simd128: add more pmin/pmax implementations
dcd00397 neon/qrdmulh: steal WASM q15mulr_sat implementation for qrdmulhq_s16
34dee780 simd128: add SSE2 q15mulr_sat implementation
fe3e623e neon/min: add SSE2 vminq_u32 implementation
4abbb4db neon/min: add SSE2 vqsubq_u32 implementation
c1158835 simd128: add improved min implementations on several architectures
c059f800 relaxed-simd: add trunc functions
0394e967 simd128: add several some AArch64 and Altivec trunc_sat implementations
3fa2026b Fix several places where we assumed NEON used vector extensions.
6a183313 neon/qsub: add some SSE and vector extension implementations
313561fe msa/subv: initial implementation
8f1155e4 msa/andi: initial implementation
d20bca47 msa/and: initial implementation
82e93303 gfni: work around clang bug #50932
3a27037f arch: set SIMDE_ARCH_ARM for AArch64 on MSVC
d19a9d6a msa/adds: initial implementation
41f9ad33 neon/qadd: improve SSE implementation
eb55cce3 avx512/shldv: initial implementation
ee0a83e1 avx512/popcnt: initial implementation
48855d3a msa/adds_a: initial implementation
6133600b neon/qadd: add several improved x86 and vector extension versions
6b5814d9 avx512/ternarylogic: implement remaining functions
3fba9986 Add many fast floating point to integer conversion functions
b2f01b98 neon/st4_lane: Implement remaining functions
ccc9e2c8 neon/st3_lane: Implement remaining functions
3f0859be neon/st2_lane: Implement remaining functions
e136dfe7 neon/ld1_dup: Add f64 function implementations
4a2ceb45 neon/cvt: add some faster x86 float->int/uint conversions
b82b16ac neon/cvt: Add vcvt_f32_f64 and vcvt_f64_f32 implementations
477068c9 neon/st2: Implement remaining functions
3a93c5dd neon/ld4_lane: Implement remaining functions
75838c15 neon/qshlu_n: Add scalar function implementations
7d314092 simde/scalef: add scalef_ss/sd
d3547dac msa/add_a: initial implementation
8ba8dc84 msa/addvi: initial implementation
b1006161 Begin working on implementing MIPS MSA.
38088d10 fma: use fma/fms instead of mla/mls on NEON
76c4b7cd neon/cle: add some x86 implementations
d045a667 neon/cle: improve formatting of some x86 implementations
6fc12601 relaxed-simd: initial support for the WASM relaxed SIMD proposal
2d430eb4 neon/ld2: Implement remaining functions
fc3aef94 neon/ld1_lane: Implement remaining functions
0ec9c9c9 neon/rsqrte: Implement remaining functions
92e72c44 neon/rsqrts: Add remaining function implementations
e7cdccd0 neon/qdmulh_lane: Add remaining function implementations
905f1e4c neon/recpe: Add remaining function implementations
96cebc42 neon/recps: Add scalar function implementations
63ad6d0a neon/qrdmulh_lane: Add scalar function implementations
f8dacd07 simde-diagnostic: Include simde-arch
4ad3f10f neon/mul_lane: Add mul_laneq functions
25d0fe82 neon/sri_n: Add scalar function implementations
6fb9fa3a neon/shl_n: Add scalar function implementations
5738564f neon/shl: Add scalar implementations
fc2aed9b neon/rsra_n: Add scalar function implementations
7c7d8d80 neon/qshrn_n: Add scalar function implementations
76e65444 neon/qrshrn_n: Add scalar function implementations
25aa2124 neon/rshr_n: Add custom scalar function for utility
6d1c7aaf avx512/dbsad: initial implementation
4b1ba2ce avx512/dpbusd: initial implementation
02719bcc svml: remove some dead stores from cdfnorminv
803b29ac sse2: fix set but not used variable in _mm_cvtps_epi32
7ee622df Use SIMDE_HUGE_FUNCTION_ATTRIBUTES on several functions.
80439178 arch: fix SIMDE_ARCH_POWER_ALTIVEC_CHECK to include AltiVec check
604a90af neon/cvt: fix a couple of s390x implementations' NaN handling
a0fe7651 simd128: work around bad diagnostic from clang < 7
cd742d66 f16c: use __ARM_FEATURE_FP16_VECTOR_ARITHMETIC to detect Arm support
4f39e4fc Fix an assortment of small bugs
4bf12875 Remove all `&& 0`s in preprocessor macros.
8e0d0f93 simd128: remove stray `&& 0`
d98f81cb simd128: add optimized f32x4.floor implementations
b626266d simd128: add some Arm implementations of all_true
78957358 simd128: any_true implementations for Arm
20cd4d00 simd128: add improved add_sat implementations
ea364550 wasm128, sse2: disable -Wvector-conversion when calling vgetq_lane_s64
4e09afb4 neon/zip1: add armv7 implementations
f27932a7 simd128: add x86/Arm/POWER implementations
2bcd59bb avx512/conflict: implement missing functions
7da82adb avx512/multishift: initial implementation
e7229088 various: correct PPC and z/Arch versions plus typo
005d39c8 simd128: fix portable fallback for wasm_i8x16_swizzle
860127a1 Add NEON, SSE3, and AltiVec implementations of wasm_i8x16_swizzle
0959466e simd128: add AltiVec implementations of any/all_true
7f38c52e simd128: add vec_abs implementation of wasm_i8x16_abs
e2cb9632 simd128: work around clang bugs 50893 and 50901.
77e4f57d avx512/rol: implement remaining functions
1d60dc03 avx512/rolv: initial implementation
30681718 avx512: initial implementation
38f8ef8f avx512/ternarylogic: initial implementation
3efe186a Add constrained compilation mode
1faf7872 simd128: add simde_wasm_i64x2_ne
68616767 avx512/scalef: implement remaining functions
6ea919f8 avx512/conflict: implements mm_conflict_epi32
ad5d51c5 avx512/scalef: initial implementation
4f0f1e8f neon/qrshrun_n: Add scalar function implementations
dc278de7 neon/rshr_n: Add scalar function implementations
86f73e1e neon/rndn: Add macro corrections
189d7762 neon/qshrun_n: Add scalar function implementations
1fc63065 neon/rshl: Add scalar function implementations
4ca2973e neon/rndn: Add scalar function implementation
d78398c8 neon/qdmulh: Add scalar function implementations
7d43b7c9 neon/pmin: Add scalar function implementations
4dacfeff neon/pmax: Add scalar function implementations
abccc767 neon/padd: Add scalar function implementations
b3d97677 neon/neg: Complete implementation of function family
137afad7 neon/dup_lane: Complete implementation of function family
ef93f1bb neon/fma_lane: Implement fmaq_lane functions
e9dcfe8b neon/sra_n: Add scalar function implementations
44cf247c neon/shr_n: Add scalar function implementations
ca78eb82 neon/sub: Implements the two remaining scalar functions
65d8d52f avx512/rorv: implement _mm{256,512}{,_mask,_maskz}_rorv_epi{32,64}
1afa8148 Many work-arounds for GCC with MSA, and support in the docker image.
8bf571ac neon/ext: clean up shuffle-based implementation
51790ff8 avx512/rorv: initial implementation of _mm_rorv_epi32
952dab89 neon/st3: Add shuffle vector implementations
2229f4ba sse, sse2: work around GCC bug #100927
e0b88179 neon/ld{2,3,4}: disable -Wmaybe-uninitialized on all recent GCC
76c76bfa neon/fma_lane: portable and native implementations
002b4066 neon/mul_lane: finish implementation of function family
ae959e7e neon/;shlu_n: faster WASM implementations
7df8e3ab neon/qshlu_n: initial implementation
338eb083 neon/ld4: use conformant array parameters
049eaa9e neon/vld4: Wasm optimization of vld4q_u8
720db9ff neon/st3q_u8: Wasm optimization
ccf235e1 neon/qdmull: add WASM implementations
06a64a94 neon/movl: improve WASM implementation
e36a029e neon/tbl: add WASM implementation of vtbl1_u8
5debb615 neon/tst: implement scalar functions
cef74f3b neon/hadd,hsub: optimization for Wasm
502243a2 neon/qrdmulh_lane: fix typo in undefs
6eb625d7 fma: drop weird high-priority implementation in _mm_fmadd_ps
47ba41d6 neon/qshrn_n: initial implementation
b94e0298 neon/qrdmulh: native aliases for scalar functions should be A64
f27e9fcb neon/qrdmulh_lane: initial implementation
04e2ca66 neon/subhn: initial implementation
8b129a93 neon/sri_n: add 128-bit implementations
88dd65de neon/mull_lane: initial implementation
12c940ed neon/mlsl_lane: initial implementation
abc8dacf neon/mlal_lane: initial implementation
9438ea43 neon/dup_lane: fix macro for simde_vdup_laneq_u16
36e2ce5b neon/{add,sub}w_high: use vmovl_high instead of vmovl + get_high
d86492fa neon/sri_n: native and portable
60715735 neon/qshrun_n: native and portable implementations
de84bcd0 neon/qdmulh_lane: native and portable
4581232f avx512/roundscale_round: implement remaining functions
76b19b97 avx512/range_rounnd,round: move range_round functions out of round
2ba2b7b8 neon/ld1_dup: native and portable (64-bit vectors)
f6fd4b67 neon/dup_lane: implement vdupq_lane_f64
07b4a2b3 neon/shll_n: native and portable implementations
58a0188d neon/dupq_lane: native and portable
623f2207 neon/st4_lane: portable and native *_{s,u}{8,16,32}
322663be neon/st3_lane: portable and native *_{s,u}{8,16,32}
7700b2e5 neon/st2_lane: portable and native for _{u,s}{8,16,32}
acc67df2 neon/cltz: Add scalar functions and natural vector fallbacks
fcf6e88e neon/clt: Add implementations of scalar functions
799e1629 neon/clez: Add implementaions of scalar functions
f22ae740 neon/addhn: initial implementation
8774393f avx512/cmp{g,l}e: AVX-512 implementations of non-mask functions
1eb57468 avx512/cmple: finish implementations of all cmple functions
9b60d826 avx512/cmpge: fix bad _mm512_cmpge_epi64_mask implementation
6849da33 avx: use internal symbols in clang fallbacks for cmp_ps/pd functions
f2746208 avx512/cmpge: finish implementing all functions
135cbbf0 avx512/range: implement mm{,512}{,_mask,_maskz}_range_round*
6421a835 avx512/round, avx512/roundscale: add shorter vector fallbacks
5c6673f5 avx512/roundscale: implement simde_mm{256,512}_roundscale_ps
6fcb4433 neon/cle: Add implementations for remaining functions
a49bdc1c neon/fma_n: the 32-bit functions are missing on GCC on arm
05172a08 neon/ld4: work around spurious warning on clang < 10
2fa3d1d8 neon/qdmulh: add shuffle-based implementations
ea22a611 neon/qdmulh_n: native and portable implementations
5ef8e53d neon/qrshrn_n: native and portable implementations
fda538d1 neon/ld1_lane: portable and native implementations
8f118bbd neon/cgtz: Add implementations of remaining functions
31d5048c neon/cgt: Add implementation of remaining functions
79274d8d neon/ld4_lane: move private type usage to inside loop
bdcfccb7 neon/ld4_lane: native and portable implementations
bbc35b65 avx512/range: don't used masked comparisons for 128/256-bit versions
ef90404e avx512/range: fix fallback macros
5d00aa4c features: add z/arch to SIMDE_NATURAL_VECTOR_SIZE
83cab7c1 sve/cmplt: replace vec_and with & for s390 implementations
a636d0ae Fix gcc-10 compilation on s/390x
bb35d9f0 gfni: work around error with vec_bperm on clang-10 on POWER
2db3ba03 gfni: replace vec_and and vec_xor with & and ^ on z/arch
cdb3f68c sse, mmx: fix clang-11 on POWER
233fef43 gfni: add many x86, ARM, z/Arch, PPC and WASM implementations

git-subtree-dir: lib/simde/simde
git-subtree-split: 02c7a67ed825018f9efdf2a7e4f39d8196f65337
@mr-c
Contributor Author

mr-c commented May 5, 2023

Whoops, guess I'll have to make a 0.7.4.1 release of SIMDe! 😅
