Implement `cuda::mr::cuda_async_memory_resource` #1637

miscco · 2024-04-16T08:33:35Z

This implements two facilities for allocating stream order device accessible memory.

The first commit adds cuda_memory_pool, a wrapper around ::cudaMemPool_t that takes care of initialization of the buffer and its destruction.

The second commit adds cuda_async_memory_resource, is a leightweight async_resource. It utilizes either a self-owned cuda_memory_pool that is properly destroyed when it goes out of scope, or wraps a user provided ::cudaMemPool_t.

This allows interoperability between different libraries by retaining an existing pool

Fixes #1514

This implements a wrapper around a `cudaMemPool_t`

Fixes NVIDIA#1514

…pilation

harrism

Partial review -- just cuda_memory_pool so far.

libcudacxx/include/cuda/__memory_resource/cuda_memory_pool.h

harrism

Just some typos.

libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h

libcudacxx/test/libcudacxx/cuda/memory_resource/cuda_async_memory_resource/construct.pass.cpp

libcudacxx/test/libcudacxx/cuda/memory_resource/cuda_async_memory_resource/equality.pass.cpp

harrism · 2024-04-23T23:17:18Z

I don't really love the mixed ownership / lifetime semantics of this dual-class approach. In RMM we took a dual class approach but with each class having different ownership / lifetime semantics. One class that owns its pool. And another that is a view/reference to a non-owned pool resource. If you take the RMM approach of avoiding mixed ownership semantics, then your current cuda_async_memory_resource, since it is such a thin wrapper and only decides whether to use the "raw" pool or not, becomes basically just a resource_ref.

Therefore, shouldn't we be able to have just one class that owns its pool? It could optionally take an argument to use the default cudaAsync pool instead of allocating one. But the non-owning version can just be replaced by a resource_ref?

If we do go with two classes, we should try to avoid mixed ownership/lifetime semantics.

harrism

More comments.

libcudacxx/include/cuda/__memory_resource/cuda_memory_pool.h

libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h

miscco · 2024-04-24T06:47:57Z

If you take the RMM approach of avoiding mixed ownership semantics, then your current cuda_async_memory_resource, since it is such a thin wrapper and only decides whether to use the "raw" pool or not, becomes basically just a resource_ref.

The issue is that cudaMemPool_t itself is not a resource. So we will in any way need a wrapper around it so that it satisfies the resource concept.

Therefore, shouldn't we be able to have just one class that owns its pool? It could optionally take an argument to use the default cudaAsync pool instead of allocating one. But the non-owning version can just be replaced by a resource_ref?

This would again bring conflicting ownership semantics. My first draft had this behavior and it turned out quite complex

Maybe we should turn what is currently the cuda_memory_pool into cuda_async_memory_resource and then rename the wrapper into something like cuda_mem_pool_resource ?

harrism · 2024-04-24T08:49:31Z

If you take the RMM approach of avoiding mixed ownership semantics, then your current cuda_async_memory_resource, since it is such a thin wrapper and only decides whether to use the "raw" pool or not, becomes basically just a resource_ref.

The issue is that cudaMemPool_t itself is not a resource. So we will in any way need a wrapper around it so that it satisfies the resource concept.

I don't understand why the thing that owns cudaMempool_t has to satisfy the resource concept. A mempool in the CUDA API doesn't itself allocate memory, it's just a handle. IMO there should be exactly one place where we call cudaMallocAsync and one place where we call cudaFreeAsync (both in the same class).

Therefore, shouldn't we be able to have just one class that owns its pool? It could optionally take an argument to use the default cudaAsync pool instead of allocating one. But the non-owning version can just be replaced by a resource_ref?

This would again bring conflicting ownership semantics. My first draft had this behavior and it turned out quite complex

Where is the conflicting ownership? If you have a cuda_memory_pool that is not a resource but owns a pool, and a cuda_async_memory_resource that takes a cuda_memory_pool (or uses the default) and doesn't own it, I see no ownership conflict. (This is slightly different from what I said above.)

Maybe we should turn what is currently the cuda_memory_pool into cuda_async_memory_resource and then rename the wrapper into something like cuda_mem_pool_resource ?

That naming doesn't make sense to me, sorry.

miscco · 2024-04-29T17:45:29Z

libcudacxx/include/cuda/__memory_resource/cuda_memory_pool.h

+  cuda_memory_pool(
+    const size_t initial_pool_size                             = 0,
+    const size_t release_threshold                             = 0,
+    const cudaMemAllocationHandleType __allocation_handle_type = cudaMemAllocationHandleType::cudaMemHandleTypeNone)


We should ensure that the constructor takes the device explicitly

I moved to an API that explicitly takes the device id as a required argument.

I found that it is easy to incorrectly mix up the other arguments, so I encapsulated them into a helper struct.

@harrism does that look reasonable to you?

This will indeed suppress clang-tidy's bugprone-easily-swappable-parameters warning, which is nice. However I often struggle with the struct solution to that problem as I don't see how this helps avoid mixing up the two integral arguments:

cuda::mr::cuda_memory_pool_properties props = {42, 20};

At least it requires the caller to look at the struct and think about it, though. And I think the alternative of having strong types for the initial size and release threshold is overkill.

I believe this is something where designated initializers shine

cuda::mr::cuda_memory_pool_properties props = { .initial_pool_size = 42, .release_threshold = 20};

miscco · 2024-04-29T17:47:49Z

libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h

+#endif // no system header
+
+// cudaMallocAsync was introduced in CTK 11.2
+#if !defined(_CCCL_COMPILER_MSVC_2017) && !defined(_CCCL_CUDACC_BELOW_11_2)


@gevtushenko We should move this to cudaNext to ensure that we get the design right

libcudacxx/include/cuda/__memory_resource/cuda_async_memory_resource.h

griwes · 2024-05-06T18:53:29Z

libcudacxx/include/cuda/__memory_resource/cuda_memory_pool.h

+  cuda_memory_pool(::cudaMemPool_t) = delete;
+
+  cuda_memory_pool(cuda_memory_pool const&)            = delete;
+  cuda_memory_pool(cuda_memory_pool&&)                 = delete;


Is this solely because you want to avoid empty states in this class?

Yes. This should always contain a valid cudaMemPool_t

github-actions · 2024-05-07T18:35:39Z

🟩 CI Results [ Failed: 0 | Passed: 302 | Total: 302 ]

🟩 Project cub [ Failed: 0 | Passed: 99 | Total: 99 ]

🟩 cpu
  🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
  🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
🟩 ctk
  🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
  🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
🟩 cudacxx_full
  🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
  🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
🟩 cudacxx_name
  🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
🟩 cxx_full
  🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
  🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
  🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
  🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
  🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
🟩 cxx_name
  🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
  🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
  🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
🟩 gpu
  🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
🟩 jobs
  🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
  🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
🟩 os
  🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
  🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
  🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
  🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
🟩 sm
  🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
🟩 std
  🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
  🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
  🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
  🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16

🟩 Project thrust [ Failed: 0 | Passed: 99 | Total: 99 ]

🟩 cpu
  🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
  🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
🟩 ctk
  🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
  🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
🟩 cudacxx_full
  🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
  🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
🟩 cudacxx_name
  🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
🟩 cxx_full
  🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
  🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
  🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
  🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
  🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
🟩 cxx_name
  🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
  🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
  🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
🟩 gpu
  🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
🟩 jobs
  🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
  🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
🟩 os
  🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
  🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
  🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
  🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
🟩 sm
  🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
🟩 std
  🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
  🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
  🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
  🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16

🟩 Project libcudacxx [ Failed: 0 | Passed: 104 | Total: 104 ]

🟩 cpu
  🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 96  -- Total: 96 
  🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
🟩 ctk
  🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
  🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 86  -- Total: 86 
🟩 cudacxx_full
  🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
  🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 84  -- Total: 84 
🟩 cudacxx_name
  🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 102 -- Total: 102
🟩 cxx_full
  🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
  🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
  🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
  🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 21  -- Total: 21 
  🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
  🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
  🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
🟩 cxx_name
  🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
  🟩 gcc (0% Fail)                Failed:  0  -- Passed: 52  -- Total: 52 
  🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
🟩 gpu
  🟩 v100 (0% Fail)               Failed:  0  -- Passed: 104 -- Total: 104
🟩 jobs
  🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
  🟩 nvrtc (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
  🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
  🟩 verify_codegen (0% Fail)     Failed:  0  -- Passed:  1  -- Total:  1 
🟩 os
  🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
  🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
  🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 49  -- Total: 49 
  🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
🟩 sm
  🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
  🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
🟩 std
  🟩 11 (0% Fail)                 Failed:  0  -- Passed: 27  -- Total: 27 
  🟩 14 (0% Fail)                 Failed:  0  -- Passed: 30  -- Total: 30 
  🟩 17 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
  🟩 20 (0% Fail)                 Failed:  0  -- Passed: 17  -- Total: 17

🏃‍ Runner counts (total jobs: 302)

#	Runner
232	`linux-amd64-cpu16`
28	`linux-amd64-gpu-v100-latest-1`
24	`linux-arm64-cpu16`
18	`windows-amd64-cpu16`

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust

miscco requested review from a team as code owners April 16, 2024 08:33

miscco requested review from ericniebler and alliepiper April 16, 2024 08:33

miscco force-pushed the cuda_async_memory_resource branch 4 times, most recently from 6a353d8 to f3b0145 Compare April 16, 2024 15:37

miscco added 3 commits April 18, 2024 08:33

Implement cuda::mr::cuda_memory_pool

babf1b6

This implements a wrapper around a `cudaMemPool_t`

Implement cuda::mr::cuda_async_memory_resource

244d73e

Fixes NVIDIA#1514

Make cuda_async_memory_resource a pure wrapper around a memory pool

1af8469

miscco force-pushed the cuda_async_memory_resource branch from 8db9b27 to 1af8469 Compare April 18, 2024 06:33

miscco added 2 commits April 19, 2024 10:46

Merge branch 'main' into cuda_async_memory_resource

82b57c5

Do not use a reference inside a union

a782e09

miscco force-pushed the cuda_async_memory_resource branch from 7812271 to a782e09 Compare April 19, 2024 08:58

miscco added 4 commits April 19, 2024 11:38

Avoid unsued argument warning

691ecbd

include the right header

b7b240a

When defining _CCCL_CUDACC_BELOW_ we need to consider host only com…

ce2f784

…pilation

Fix invalid _CCCL_CUDACC definition for clang cuda

c86c926

miscco force-pushed the cuda_async_memory_resource branch from c54d3e4 to c86c926 Compare April 22, 2024 06:49

harrism reviewed Apr 23, 2024

View reviewed changes

libcudacxx/include/cuda/__memory_resource/cuda_memory_pool.h Outdated Show resolved Hide resolved

libcudacxx/include/cuda/__memory_resource/cuda_memory_pool.h Show resolved Hide resolved

harrism suggested changes Apr 23, 2024

View reviewed changes

Fix typos

a2b2956

harrism suggested changes Apr 23, 2024

View reviewed changes

miscco added 3 commits April 24, 2024 09:44

Move common facilities to api_wrappers.h

5a918cf

Only ever store a cudaMemPool_t in cuda_async_memory_resource

f1b0228

Add reference to cudaMallocFromPoolAsync to the docs

2050071

miscco added 2 commits April 25, 2024 09:56

Make cuda_memory_pool a thin wrapper around a cudaMemPool_t

9d9f85e

Fix include issues

a5ea93a

miscco force-pushed the cuda_async_memory_resource branch from 8c678e1 to a5ea93a Compare April 25, 2024 11:36

miscco commented Apr 29, 2024

View reviewed changes

griwes reviewed May 6, 2024

View reviewed changes

Address review comments

a94230f

miscco force-pushed the cuda_async_memory_resource branch from 5b0fc7e to a94230f Compare May 7, 2024 11:00

miscco added 3 commits May 7, 2024 13:13

Use a property struct to avoid confusion about the memory pool arguments

ea66197

Merge branch 'main' into cuda_async_memory_resource

7a99067

Ensure that we actually test with the right flag

86af417

miscco requested a review from a team as a code owner May 7, 2024 11:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `cuda::mr::cuda_async_memory_resource` #1637

Implement `cuda::mr::cuda_async_memory_resource` #1637

miscco commented Apr 16, 2024

harrism left a comment

harrism left a comment

harrism commented Apr 23, 2024

harrism left a comment

miscco commented Apr 24, 2024 •

edited

harrism commented Apr 24, 2024 •

edited

miscco Apr 29, 2024

miscco May 7, 2024

harrism May 8, 2024

miscco May 8, 2024

miscco Apr 29, 2024

griwes May 6, 2024

miscco May 7, 2024

github-actions bot commented May 7, 2024

🟩 Project cub [ Failed: 0 | Passed: 99 | Total: 99 ]

🟩 Project thrust [ Failed: 0 | Passed: 99 | Total: 99 ]

🟩 Project libcudacxx [ Failed: 0 | Passed: 104 | Total: 104 ]

🏃‍ Runner counts (total jobs: 302)

👃 Inspect Changes

Modifications in project?

Modifications in project or dependencies?

Implement cuda::mr::cuda_async_memory_resource #1637

Are you sure you want to change the base?

Implement cuda::mr::cuda_async_memory_resource #1637

Conversation

miscco commented Apr 16, 2024

harrism left a comment

Choose a reason for hiding this comment

harrism left a comment

Choose a reason for hiding this comment

harrism commented Apr 23, 2024

harrism left a comment

Choose a reason for hiding this comment

miscco commented Apr 24, 2024 • edited

harrism commented Apr 24, 2024 • edited

miscco Apr 29, 2024

Choose a reason for hiding this comment

miscco May 7, 2024

Choose a reason for hiding this comment

harrism May 8, 2024

Choose a reason for hiding this comment

miscco May 8, 2024

Choose a reason for hiding this comment

miscco Apr 29, 2024

Choose a reason for hiding this comment

griwes May 6, 2024

Choose a reason for hiding this comment

miscco May 7, 2024

Choose a reason for hiding this comment

github-actions bot commented May 7, 2024

🟩 Project cub [ Failed: 0 | Passed: 99 | Total: 99 ]

🟩 Project thrust [ Failed: 0 | Passed: 99 | Total: 99 ]

🟩 Project libcudacxx [ Failed: 0 | Passed: 104 | Total: 104 ]

🏃‍ Runner counts (total jobs: 302)

👃 Inspect Changes

Modifications in project?

Modifications in project or dependencies?

Implement `cuda::mr::cuda_async_memory_resource` #1637

Implement `cuda::mr::cuda_async_memory_resource` #1637

miscco commented Apr 24, 2024 •

edited

harrism commented Apr 24, 2024 •

edited