I tried AVX512 pre-sieving using the 2 algorithms below.
On AMD EPYC 4th gen CPUs (Genoa) I saw no speedup with either GCC or Clang (compared to the default SSE2 pre-sieving algorithm). On Intel CPUs I measured a 1% to 2% speedup with GCC (using `./primesieve 1e11 -t1`) but no speedup with Clang. Overall I think the added complexity is not worth it: supporting AVX512 pre-sieving would likely require using GCC's multi-arch feature, which makes the code significantly more complex.
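The two algorithms are not reproduced here, so the following is only a hypothetical C++ sketch of the kind of extra code path the point about multi-arch support is about. It assumes, for illustration, that the pre-sieve step combines two precomputed buffers into the sieve array with a bitwise AND. It uses GCC/Clang's `target("avx512f")` function attribute plus `__builtin_cpu_supports` for runtime dispatch, which is one common way to ship an AVX512 kernel next to the SSE2 baseline; it is not necessarily the multi-arch mechanism the author had in mind. All function names (`presieve_and`, `presieve_and_sse2`, etc.) are hypothetical and not primesieve's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  #include <immintrin.h>
  #define HAVE_X86_DISPATCH 1
#endif

// Portable reference kernel (hypothetical): sieve[i] = buf0[i] & buf1[i].
void presieve_and_scalar(uint8_t* sieve, const uint8_t* buf0,
                         const uint8_t* buf1, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    sieve[i] = buf0[i] & buf1[i];
}

#if defined(HAVE_X86_DISPATCH)

// SSE2 is part of the x86-64 baseline, so no target attribute is needed.
void presieve_and_sse2(uint8_t* sieve, const uint8_t* buf0,
                       const uint8_t* buf1, std::size_t n)
{
  std::size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf0 + i));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf1 + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(sieve + i), _mm_and_si128(a, b));
  }
  for (; i < n; i++) // leftover tail bytes
    sieve[i] = buf0[i] & buf1[i];
}

// Only this function is compiled for AVX512F; it must only be called
// after a runtime check confirms the CPU supports AVX512F.
__attribute__((target("avx512f")))
void presieve_and_avx512(uint8_t* sieve, const uint8_t* buf0,
                         const uint8_t* buf1, std::size_t n)
{
  std::size_t i = 0;
  for (; i + 64 <= n; i += 64) {
    __m512i a = _mm512_loadu_si512(static_cast<const void*>(buf0 + i));
    __m512i b = _mm512_loadu_si512(static_cast<const void*>(buf1 + i));
    _mm512_storeu_si512(static_cast<void*>(sieve + i), _mm512_and_si512(a, b));
  }
  for (; i < n; i++) // leftover tail bytes
    sieve[i] = buf0[i] & buf1[i];
}

#endif

// Runtime dispatch: use the widest kernel the current CPU supports.
void presieve_and(uint8_t* sieve, const uint8_t* buf0,
                  const uint8_t* buf1, std::size_t n)
{
#if defined(HAVE_X86_DISPATCH)
  if (__builtin_cpu_supports("avx512f"))
    presieve_and_avx512(sieve, buf0, buf1, n);
  else
    presieve_and_sse2(sieve, buf0, buf1, n);
#else
  presieve_and_scalar(sieve, buf0, buf1, n);
#endif
}
```

Because only one function is built for AVX512F and the choice is made at run time, the same binary still runs on CPUs without AVX512. The cost is a second (or third) copy of every hot loop that must be kept in sync and tested separately.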
The AVX512 pre-sieving code is available on the avx512_presieve branch (note that this code is for testing only and is not yet production quality). It may be worth retesting this code in a few years, since the AVX512 code may perform better on future x64 CPUs.
Algorithm 1
Algorithm 2