-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement cuda::mr::cuda_async_memory_resource
#1637
base: main
Are you sure you want to change the base?
Conversation
6a353d8
to
f3b0145
Compare
This implements a wrapper around a `cudaMemPool_t`
8db9b27
to
1af8469
Compare
7812271
to
a782e09
Compare
c54d3e4
to
c86c926
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review -- just cuda_memory_pool
so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some typos.
libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h
Outdated
Show resolved
Hide resolved
libcudacxx/test/libcudacxx/cuda/memory_resource/cuda_async_memory_resource/construct.pass.cpp
Outdated
Show resolved
Hide resolved
libcudacxx/test/libcudacxx/cuda/memory_resource/cuda_async_memory_resource/construct.pass.cpp
Outdated
Show resolved
Hide resolved
libcudacxx/test/libcudacxx/cuda/memory_resource/cuda_async_memory_resource/equality.pass.cpp
Outdated
Show resolved
Hide resolved
I don't really love the mixed ownership / lifetime semantics of this dual-class approach. In RMM we took a dual class approach but with each class having different ownership / lifetime semantics. One class that owns its pool. And another that is a view/reference to a non-owned pool resource. If you take the RMM approach of avoiding mixed ownership semantics, then your current Therefore, shouldn't we be able to have just one class that owns its pool? It could optionally take an argument to use the default cudaAsync pool instead of allocating one. But the non-owning version can just be replaced by a resource_ref? If we do go with two classes, we should try to avoid mixed ownership/lifetime semantics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments.
libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h
Outdated
Show resolved
Hide resolved
The issue is that
This would again bring conflicting ownership semantics. My first draft had this behavior and it turned out quite complex Maybe we should turn what is currently the |
I don't understand why the thing that owns
Where is the conflicting ownership? If you have a
That naming doesn't make sense to me, sorry. |
8c678e1
to
a5ea93a
Compare
cuda_memory_pool( | ||
const size_t initial_pool_size = 0, | ||
const size_t release_threshold = 0, | ||
const cudaMemAllocationHandleType __allocation_handle_type = cudaMemAllocationHandleType::cudaMemHandleTypeNone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure that the constructor takes the device explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved to an API that explicitly takes the device id as a required argument.
I found that it is easy to incorrectly mix up the other arguments, so I encapsulated them into a helper struct.
@harrism does that look reasonable to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will indeed suppress clang-tidy's bugprone-easily-swappable-parameters
warning, which is nice. However I often struggle with the struct solution to that problem as I don't see how this helps avoid mixing up the two integral arguments:
cuda::mr::cuda_memory_pool_properties props = {42, 20};
At least it requires the caller to look at the struct and think about it, though. And I think the alternative of having strong types for the initial size and release threshold is overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is something where designated initializers shine
cuda::mr::cuda_memory_pool_properties props = { .initial_pool_size = 42, .release_threshold = 20};
#endif // no system header | ||
|
||
// cudaMallocAsync was introduced in CTK 11.2 | ||
#if !defined(_CCCL_COMPILER_MSVC_2017) && !defined(_CCCL_CUDACC_BELOW_11_2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gevtushenko We should move this to cudaNext to ensure that we get the design right
libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h
Outdated
Show resolved
Hide resolved
cuda_memory_pool(::cudaMemPool_t) = delete; | ||
|
||
cuda_memory_pool(cuda_memory_pool const&) = delete; | ||
cuda_memory_pool(cuda_memory_pool&&) = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this solely because you want to avoid empty states in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This should always contain a valid cudaMemPool_t
5b0fc7e
to
a94230f
Compare
🟩 CI Results [ Failed: 0 | Passed: 302 | Total: 302 ]
|
# | Runner |
---|---|
232 | linux-amd64-cpu16 |
28 | linux-amd64-gpu-v100-latest-1 |
24 | linux-arm64-cpu16 |
18 | windows-amd64-cpu16 |
👃 Inspect Changes
Modifications in project?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
This implements two facilities for allocating stream order device accessible memory.
The first commit adds
cuda_memory_pool
, a wrapper around::cudaMemPool_t
that takes care of initialization of the buffer and its destruction.The second commit adds
cuda_async_memory_resource
, is a leightweight async_resource. It utilizes either a self-ownedcuda_memory_pool
that is properly destroyed when it goes out of scope, or wraps a user provided::cudaMemPool_t
.This allows interoperability between different libraries by retaining an existing pool
Fixes #1514