
Question about hardware memory alignments #1016

Open
TriceHelix opened this issue Jun 14, 2023 · 3 comments

@TriceHelix (Contributor) commented Jun 14, 2023

Dear ILGPU Devs & Community,

From my understanding, when allocating device memory, the allocation routines return a pointer aligned to a specific number of bytes for technical and performance reasons. According to the CUDA Programming Guide (quoted below), that number is at least 256 bytes for CUDA GPUs. I imagine that other types of hardware adhere to different alignments.

Does ILGPU provide a hardware independent way to query this information?

I am asking because I'm trying to reduce the number of memory allocations I make (in order to avoid accumulating API overhead). Instead of multiple buffers, I allocate a single one and split it into multiple sections for arrays and variables. According to the guide I referenced, these sections must be aligned the same way the original allocation was; otherwise reads may produce incorrect results (at least on CUDA devices).
Quote:

Any address of a variable residing in global memory or returned by one of the memory allocation routines from the driver or runtime API is always aligned to at least 256 bytes.

Reading non-naturally aligned 8-byte or 16-byte words produces incorrect results (off by a few words), so special care must be taken to maintain alignment of the starting address of any value or array of values of these types. A typical case where this might be easily overlooked is when using some custom global memory allocation scheme, whereby the allocations of multiple arrays (with multiple calls to cudaMalloc() or cuMemAlloc()) is replaced by the allocation of a single large block of memory partitioned into multiple arrays, in which case the starting address of each array is offset from the block’s starting address.
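
For context, here is a minimal sketch of the sub-allocation scheme described above. It assumes a 256-byte alignment target (the figure quoted for CUDA) and hypothetical section sizes; the AlignUp helper and all names are illustrative, not part of the ILGPU API. The idea is that rounding each section's starting byte offset up to a multiple of the base alignment keeps every section aligned, provided the base allocation itself is aligned as the quote describes. Cast is the reinterpreting method discussed later in this thread, so treat this as an illustration of the offset arithmetic rather than a recommended pattern.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

static class SubAllocationSketch
{
    // Rounds an offset up to the next multiple of a power-of-two alignment.
    static long AlignUp(long offset, long alignment) =>
        (offset + alignment - 1) & ~(alignment - 1);

    static void Main()
    {
        using var context = Context.CreateDefault();
        using var accelerator = context
            .GetPreferredDevice(preferCPU: false)
            .CreateAccelerator(context);

        const long Alignment = 256;   // alignment target taken from the CUDA guide quote above
        const long FloatCount = 1000; // hypothetical section sizes
        const long IntCount = 333;

        // Compute padded starting offsets (in bytes) for each section.
        long offsetA = 0;
        long offsetB = AlignUp(offsetA + FloatCount * sizeof(float), Alignment);
        long totalBytes = offsetB + IntCount * sizeof(int);

        // One large allocation, split into sections that start on aligned offsets.
        using var block = accelerator.Allocate1D<byte>(totalBytes);
        var floats = block.View.SubView(offsetA, FloatCount * sizeof(float)).Cast<float>();
        var ints = block.View.SubView(offsetB, IntCount * sizeof(int)).Cast<int>();

        Console.WriteLine($"{floats.Length} floats at byte {offsetA}, {ints.Length} ints at byte {offsetB}");
    }
}
```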

@m4rs-mt (Owner) commented Jun 14, 2023

Hi @TriceHelix, thanks for reaching out to us regarding this topic. Currently, ILGPU is internally aware of the global pointer alignment (e.g., here) and uses this information to automatically vectorize IO ops, for instance. Based on the backend and the accelerator (... and your kernel, of course), ILGPU will do "everything for you" when using the O2 optimization pipeline. However, you can also control the alignment of views inside kernels via view.AlignToXYZ() and view.AlignTo(xyz) (where xyz evaluates to a constant). Moreover, you can "tell" ILGPU that a particular view has a certain alignment via view.AsAlignedXYZ() and view.AsAligned(xyz) (where xyz evaluates to a constant at kernel compile time).

To answer your actual question: there is no API to query the automatic alignment rules at the moment. However, I think it makes absolute sense to add one. @MoFtZ, what do you think?
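
A rough sketch of the in-kernel alignment hint mentioned above, assuming AsAligned(16) returns a view of the same element type (the exact signatures should be checked against the ILGPU API docs); the kernel, buffer size, and helper names are made up for illustration:

```csharp
using ILGPU;
using ILGPU.Runtime;

static class AlignmentHintSketch
{
    // Tells ILGPU that `data` starts at a 16-byte boundary so it may vectorize
    // the loads/stores; the constant must be known at kernel compile time.
    static void ScaleKernel(Index1D i, ArrayView<float> data, float factor)
    {
        var aligned = data.AsAligned(16); // return type assumed to be the same view type
        aligned[i] *= factor;
    }

    static void Run(Accelerator accelerator)
    {
        using var buffer = accelerator.Allocate1D<float>(1 << 20);
        var kernel = accelerator
            .LoadAutoGroupedStreamKernel<Index1D, ArrayView<float>, float>(ScaleKernel);
        kernel((int)buffer.Length, buffer.View, 2.0f);
        accelerator.Synchronize();
    }
}
```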

@m4rs-mt added the feature and question labels Jun 14, 2023
@m4rs-mt added this to the v2.0 milestone Jun 15, 2023
A comment by @TriceHelix was marked as resolved.

@TriceHelix (Contributor, Author) commented

Turns out most of the issues I was facing were due to the Cast method messing with aligned memory, not the alignments themselves. I also realized that ArrayViews are not equivalent to pointers; under the hood, they all refer to a buffer at a certain offset. As you've said, the alignment of the buffer itself is handled automatically. However, I'd still say that an API making the alignment amounts transparent would be neat. For now, consider the other issue the actual problem I've encountered.
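
To illustrate that last point about views and offsets, a tiny sketch (illustrative only; the method name and the offsets are hypothetical):

```csharp
using ILGPU;

static class ViewOffsetSketch
{
    // A sub-view is essentially (parent buffer, offset, length) rather than a raw
    // pointer: indexing is relative to the sub-view, so slice[0] maps to parent[100].
    static ArrayView<float> TakeSlice(ArrayView<float> parent) =>
        parent.SubView(100, 50);
}
```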
