
argon2-opencl fails on CPU and MIC #5417

Open · solardiz opened this issue Jan 6, 2024 · 5 comments

@solardiz (Member) commented Jan 6, 2024

A known shortcoming/bug of the argon2-opencl format is that it fails self-test on CPU(-like) devices, as tested with the ancient Intel OpenCL and AMD APP SDK that we have on our online dev boxes and with the recent Intel OpenCL that @alainesp has on his laptop. We don't know exactly why; a guess is that this has something to do with our use of local memory.

The format works on most GPUs, the only exception identified so far being Intel HD Graphics, where it also fails.

The failures on CPUs and Intel GPU are FAILED (cmp_one(1)). The failure on MIC includes segfaults.
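For what it's worth, CPU OpenCL devices generally emulate local memory in ordinary global memory, so any assumption that local memory behaves like a GPU's on-chip scratchpad would be suspect there. A minimal sketch of the query that shows this for a given device, assuming a valid `cl_device_id dev` (the names here are illustrative, not our actual variables):

```c
#include <stdio.h>
#include <CL/cl.h>

/* Hedged sketch: report how the device implements "local" memory.
 * CPU implementations typically report CL_GLOBAL (emulated in RAM),
 * while discrete GPUs report CL_LOCAL (dedicated on-chip memory). */
static void report_local_mem(cl_device_id dev)
{
	cl_device_local_mem_type lm_type;
	cl_ulong lm_size;

	clGetDeviceInfo(dev, CL_DEVICE_LOCAL_MEM_TYPE, sizeof(lm_type), &lm_type, NULL);
	clGetDeviceInfo(dev, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(lm_size), &lm_size, NULL);

	printf("local mem: %s, %llu bytes\n",
	       lm_type == CL_LOCAL ? "dedicated (CL_LOCAL)" : "emulated (CL_GLOBAL)",
	       (unsigned long long)lm_size);
}
```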

solardiz added the bug label Jan 6, 2024
@solardiz (Member, Author) commented Jan 6, 2024

FWIW, the contents of `out` after the `pre_processing` kernel on Intel and AMD OpenCL on CPU match the GPU's (so they must be correct). On Intel HD Graphics, they don't match, so we seem to have/trigger a separate bug there.

So, not surprisingly, the main issue appears to be beyond pre-processing. This is consistent with this format already failing on CPUs before @alainesp moved the pre-processing from host to device.
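A comparison like this can be made by reading the buffer back to the host after the kernel and diffing the dumps between devices. A minimal sketch, assuming an in-order `cl_command_queue queue`, the `out` buffer as `cl_mem out_buf`, and its size `out_bytes` (these identifiers are placeholders, not the format's actual variables):

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Hedged debugging sketch: read the device buffer back after the
 * pre_processing kernel and print it as hex, so dumps from runs on
 * different devices can be diffed. */
static void dump_buffer(cl_command_queue queue, cl_mem out_buf, size_t out_bytes)
{
	unsigned char *host = malloc(out_bytes);
	size_t i;

	if (!host)
		return;

	/* Blocking read on an in-order queue: the preceding kernel has
	 * completed by the time the data lands in host[]. */
	if (clEnqueueReadBuffer(queue, out_buf, CL_TRUE, 0, out_bytes,
	                        host, 0, NULL, NULL) == CL_SUCCESS) {
		for (i = 0; i < out_bytes; i++)
			printf("%02x%c", host[i], (i & 31) == 31 ? '\n' : ' ');
		printf("\n");
	}
	free(host);
}
```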

@solardiz (Member, Author) commented Jan 6, 2024

Overriding these didn't make a difference (still works on GPUs, fails on CPUs):

```c
#define upsample(a, b) (((ulong)(a) << 32) | (b))
#define mul_hi(a, b) ((ulong)(a) * (b) >> 32)
```
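(For context: these built-ins show up in the BlaMka-style multiply at the core of Argon2's compression function, which is presumably why they were worth ruling out. A rough sketch of that kind of use, not the exact kernel code:)

```c
/* Rough sketch (assumption, not the argon2-opencl kernel verbatim):
 * BlaMka computes x + y + 2 * (lo32(x) * lo32(y)), and the 64-bit
 * product of the two low halves is typically assembled with mul_hi()
 * for the high word and upsample() to glue the halves together. */
ulong blamka_f(ulong x, ulong y)
{
	uint xlo = (uint)x;
	uint ylo = (uint)y;

	/* upsample(hi, lo) == ((ulong)hi << 32) | lo */
	return x + y + 2 * upsample(mul_hi(xlo, ylo), xlo * ylo);
}
```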

@solardiz (Member, Author) commented Jan 6, 2024

With the below hack and `shmemSize` forced to 32 KiB, it still works on a GPU, but still fails on CPUs like before:

```diff
-       uint warp   = (get_local_id(1) * get_local_size(0) + get_local_id(0)) / THREADS_PER_LANE;
+       uint warp   = (get_global_id(1) * get_global_size(0) + get_global_id(0)) / THREADS_PER_LANE;
```

So the issue is probably not specific to the behavior of `get_local_*` on CPU.

@alainesp (Contributor) commented Jan 6, 2024

Maybe we should print a warning to the user when we detect a CPU or Intel GPU, in addition to the self-test failure? That would explain the situation a little more.

@solardiz (Member, Author) commented Apr 6, 2024

In #5420, @magnumripper shows a macOS system where the format works for the first few test vectors on HD Graphics (edit: specifically, on Intel(R) UHD Graphics 630), only failing at FAILED (cmp_one(10)).
