`alpaka::getWarpSizes` incurs a noticeable overhead #2192

fwyzard · 2023-11-21T09:41:49Z

While porting the CMS pixel reconstruction from native CUDA to Alpaka, we noticed that calling `alpaka::getWarpSizes(device)` incurs a noticeable overhead.

See cms-sw/cmssw#43064 (comment) for the discussion.

A possible workaround is to cache the warp size in our code, instead of querying it for every event.
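A minimal sketch of that application-side workaround, under stated assumptions: `FakeDevice` and its `queryWarpSizes()` are hypothetical stand-ins for an Alpaka device and the costly `alpaka::getWarpSizes(device)` call, and `WarpSizeCache` and `processEvent` are illustrative names, not part of Alpaka or CMSSW.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device whose warp-size query is costly.
// It counts how often the "back-end" is actually queried.
struct FakeDevice
{
    mutable int queryCount = 0;

    std::vector<std::size_t> queryWarpSizes() const
    {
        ++queryCount;
        return {32}; // made-up value for illustration
    }
};

// Application-side cache: query once at setup, reuse for every event.
class WarpSizeCache
{
public:
    explicit WarpSizeCache(FakeDevice const& device) : m_warpSizes(device.queryWarpSizes())
    {
        assert(!m_warpSizes.empty());
    }

    std::vector<std::size_t> const& warpSizes() const
    {
        return m_warpSizes;
    }

private:
    std::vector<std::size_t> m_warpSizes;
};

// Hypothetical per-event work that only reads the cached value.
inline std::size_t processEvent(WarpSizeCache const& cache)
{
    return cache.warpSizes().front();
}
```

Constructing the cache once per device during setup and passing it to the per-event code keeps the back-end query out of the event loop.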

However, it would seem natural to cache this information within the Alpaka device objects, instead of querying the underlying back-end each time.

fwyzard · 2023-11-21T09:52:33Z

I think that caching the warp sizes inside the device object would require either:

- filling the cache at construction time, or
- using a mutex to avoid setting the cache concurrently.
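The two options can be sketched side by side. This is only an illustration under assumptions: `queryBackendWarpSizes()` is a hypothetical stand-in for the real back-end query, and `EagerDevice` and `LazyDevice` are made-up names rather than Alpaka types.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Counts calls to the hypothetical back-end query below.
static std::atomic<int> g_backendQueries{0};

// Hypothetical stand-in for the costly back-end query.
inline std::vector<std::size_t> queryBackendWarpSizes()
{
    ++g_backendQueries;
    return {32}; // made-up value for illustration
}

// Option 1: fill the cache at construction time.
// The member is never written again, so readers need no synchronization.
class EagerDevice
{
public:
    EagerDevice() : m_warpSizes(queryBackendWarpSizes())
    {
    }

    std::vector<std::size_t> const& getWarpSizes() const
    {
        return m_warpSizes;
    }

private:
    std::vector<std::size_t> const m_warpSizes;
};

// Option 2: fill the cache lazily on first use.
// A mutex prevents two threads from setting the cache concurrently.
class LazyDevice
{
public:
    std::vector<std::size_t> const& getWarpSizes() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if(!m_filled)
        {
            m_warpSizes = queryBackendWarpSizes();
            m_filled = true;
        }
        // Safe to return a reference: the vector is never modified after being filled.
        return m_warpSizes;
    }

private:
    mutable std::mutex m_mutex;
    mutable bool m_filled = false;
    mutable std::vector<std::size_t> m_warpSizes;
};
```

The eager variant pays the query cost even if the value is never used, but every later read is lock-free. The lazy variant takes the lock on each call, and the `std::mutex` member makes the class non-copyable unless the cached state is held behind a shared handle.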

psychocoderHPC · 2023-11-21T12:14:45Z

IMO caching makes sense. We should store the value during device creation; then there will be no need for a mutex.

bernhardmgruber · 2023-11-21T18:31:22Z

Is there a CUDA device with a `warpSize` other than 32? I am almost in favor of hardcoding it... Otherwise, we could just collect and cache the entire device properties (i.e. `cudaDeviceProp`), so we can also serve other values faster.
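A rough sketch of caching the whole property set, under stated assumptions: `FakeDeviceProps` is a hypothetical stand-in that mirrors a few `cudaDeviceProp` fields, `queryDeviceProps()` stands in for a single back-end call such as `cudaGetDeviceProperties`, and `CachedDevice` is an illustrative name, not the Alpaka device type. All values are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical subset of the properties a back-end reports.
struct FakeDeviceProps
{
    int warpSize;
    int multiProcessorCount;
    std::size_t totalGlobalMem;
};

// Counts calls to the hypothetical back-end query below.
static int g_propQueries = 0;

// Hypothetical stand-in for one costly back-end call that returns all properties.
inline FakeDeviceProps queryDeviceProps(int /*deviceIndex*/)
{
    ++g_propQueries;
    return FakeDeviceProps{32, 80, std::size_t{1} << 30}; // made-up values
}

// Device handle that queries the properties once at construction.
// Copies share the same immutable property block, so no locking is needed.
class CachedDevice
{
public:
    explicit CachedDevice(int deviceIndex)
        : m_props(std::make_shared<FakeDeviceProps>(queryDeviceProps(deviceIndex)))
    {
    }

    int warpSize() const
    {
        return m_props->warpSize;
    }

    int multiProcessorCount() const
    {
        return m_props->multiProcessorCount;
    }

    std::size_t totalGlobalMem() const
    {
        return m_props->totalGlobalMem;
    }

private:
    std::shared_ptr<FakeDeviceProps const> m_props;
};
```

Because the properties are read-only after construction, any getter can be served from the cached block without touching the back-end again.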

fwyzard · 2023-11-21T20:51:09Z

Not that I know of.

But HIP devices can have a warp size of 32 or 64, depending on the GPU model and potentially on the environment settings.

psychocoderHPC · 2024-03-12T09:00:22Z

Partly solved by #2246. Nevertheless, we should cache all other runtime-constant device properties within the device; then there is no need to query the API multiple times.

fwyzard added Type:Enhancement Backend:CUDA Backend:SYCL Backend:HIP labels Nov 21, 2023

psychocoderHPC mentioned this issue Dec 19, 2023

AccDevProps should include global memory available #2194

Closed

mehmetyusufoglu mentioned this issue Mar 6, 2024

Fix slow getWarpSize problem #2246

Merged
