
memcheck and CEED_STRIDES_BACKEND #1460

Closed
jedbrown opened this issue Feb 2, 2024 · 9 comments

@jedbrown
Member

jedbrown commented Feb 2, 2024

From @knepley

If you call CeedElemRestrictionCreateStrided() with CEED_STRIDES_BACKEND, and you stick in values on the CPU, you can get code that works fine with the CPU device and fails with the GPU device, and it is very hard to see why that is happening.

A couple ideas to catch this sooner:

  • We could make memcheck use a weird permutation for its backend so you can at least detect this on the host.
  • We could forbid access to vectors associated with CEED_STRIDES_BACKEND. I think this would require extra state. It should still be possible to compute permutation-invariant things like norms.
@jeremylt
Member

jeremylt commented Feb 2, 2024

Really, the issue is that CEED_STRIDES_BACKEND was specifically intended to only be valid when you keep using the same backend. Everything works fine if you use the exact same backend throughout. I thought the documentation called this out.

@ref CEED_STRIDES_BACKEND may be used for `CeedVector` ordered by the same `Ceed` backend.

That isn't the best explanation though. We should make it clearer (though people don't always read the documentation carefully to catch all implications of what's in there).

Technically, the L-vector ordering is supposed to be consistent across backends when using CEED_STRIDES_BACKEND, so it just matters when you try to get at the raw E-vector ordering, which isn't something users are supposed to be doing.

Making the memcheck backend use a bizarre permutation would also mean we need a brand new memcheck CeedOperator implementation, so I'd recommend against that.

@knepley
Collaborator

knepley commented Feb 2, 2024

> Technically, the L-vector ordering is supposed to be consistent across backends when using CEED_STRIDES_BACKEND, so it just matters when you try to get at the raw E-vector ordering, which isn't something users are supposed to be doing.

What I did is pull out the array as CEED_MEMORY_HOST and fill it up (with geometric data), and then use that as a kernel input. On the CPU it worked fine, but on the GPU it was reordered. When I used explicit strides, everything worked normally.

@jeremylt
Member

jeremylt commented Feb 2, 2024

Whoops, I misremembered and misspoke above. The strides specify the L-vector ordering, so CEED_STRIDES_BACKEND says you are using a specific ordering that is only designed to work for the backend that created the CeedElemRestriction. So the only valid usage of CEED_STRIDES_BACKEND is for a CeedOperator filling an output vector which is an input vector for a different CeedOperator from the exact same backend.

Taking an ordering from one backend and handing it off to a different backend is specifically a usage that CEED_STRIDES_BACKEND was intended not to support.

I don't think taking the pointer out of the CeedVector and using it is something that's easy for libCEED to warn about.

But I now think that actually changing the meaning of CEED_STRIDES_BACKEND for the memcheck backend will make that backend yell at you appropriately in the cases we can warn about.

@jedbrown
Member Author

jedbrown commented Feb 2, 2024

I think we could forbid host access to such a vector (using an extra flag). Most vectors transparently mirror, but we could say it's invalid to do that on these L-vectors. We could possibly do something like that on the host too, but that would require that backend access use an internal interface to bypass that check.

@jeremylt
Member

jeremylt commented Feb 2, 2024

OK, I at least have a way to make the problem visible, though it doesn't explain what is happening.

#1463

It would be handy for #1454 to have some sort of internal API for bypassing some checks when accessing vector memory spaces. I haven't worked through exactly what I need, but some deeper replication of the dual memory spaces we have for device backends would have caught the small vector issues we had in the deal.II example when first testing on HIP.

@sebastiangrimberg
Collaborator

> Technically, the L-vector ordering is supposed to be consistent across backends when using CEED_STRIDES_BACKEND, so it just matters when you try to get at the raw E-vector ordering, which isn't something users are supposed to be doing.

> What I did is pull out the array as CEED_MEMORY_HOST and fill it up (with geometric data), and then use that as a kernel input. On the CPU it worked fine, but on the GPU it was reordered. When I used explicit strides, everything worked normally.

I’ve done a similar thing, but used CeedElemRestrictionGetELayout on the host when indexing into the quadrature point data E-vector. It seems to work on both CPU and GPU backends, but is this not a supported way of doing things? The use case: most of the operator quadrature data is computable from inside a QFunction, but there is an extra component that is more easily populated with a simple loop, without going through the libCEED operator machinery.

@jeremylt
Member

jeremylt commented Feb 3, 2024

CEED_STRIDES_BACKEND means that the backend picks the L-vector ordering. It doesn't guarantee that the L-vector is ordered identically to the E-vector. It's really intended for transferring data between CeedOperators of the same backend.

The guaranteed approach under the API as written would be to apply the CeedElemRestriction and then go through the E-vector. We could also add a CeedElemRestrictionGetLLayout that would provide the actual backend L-vector layout.

@jeremylt
Member

jeremylt commented Feb 5, 2024

OK, I think I like my PR better now. Summary of the changes:

  1. Expanded note in the documentation about CEED_STRIDES_BACKEND

  2. Set CEED_STRIDES_BACKEND to be different for memcheck than the default layouts for both CPU and GPU

  3. Add CeedElemRestrictionGetLLayout for strided restrictions

  4. Fix the CeedElemRestrictionGet[L,E]Layout signature; previously there was an unneeded pointer

#1463

I think this helps the memcheck backend catch some cases of misuse and offers a new function to help users muck around with the contents of the L-vector themselves.

@jeremylt
Member

OK, I've merged the fixes.
