Add dimensions description functionality to CUDA Experimental library #1743
base: main
Conversation
There are several changes from the previous iteration: `dims` and `flatten` were replaced with `extents`, which should fit well with what the function does and with `mdspan::extents()`. Some initial inline rst documentation was added, but it is not being built yet. Header guards were adjusted to the new library name. `#undef NDEBUG` was added to testing_common.h to properly enable assertions in device-side testing.
Disable C++17 Windows builds because mdspan is not supported there.
Also adds a comment describing a new test of rst docs examples.
pre-commit.ci autofix
/ok to test
🟨 CI Results [ Failed: 35 | Passed: 328 | Total: 363 ]

# | Runner
---|---
275 | linux-amd64-cpu16
40 | linux-amd64-gpu-v100-latest-1
28 | linux-arm64-cpu16
20 | windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

 | Project
---|---
+/- | CCCL Infrastructure
 | libcu++
 | CUB
 | Thrust
+/- | CUDA Experimental

Modifications in project or dependencies?

 | Project
---|---
+/- | CCCL Infrastructure
+/- | libcu++
+/- | CUB
+/- | Thrust
+/- | CUDA Experimental
/ok to test
🟩 CI Results [ Failed: 0 | Passed: 363 | Total: 363 ]
```cpp
}

template <typename TyTrunc, typename Index, typename Dims>
__device__ constexpr auto index_to_linear(const Index& index, const Dims& dims)
```

This should be either `_CCCL_DEVICE` or probably `_CCCL_HOST_DEVICE`:

```diff
-__device__ constexpr auto index_to_linear(const Index& index, const Dims& dims)
+_CCCL_HOST_DEVICE constexpr auto index_to_linear(const Index& index, const Dims& dims)
```
```cpp
    ::cuda::std::dynamic_extent>(dims.x, dims.y, dims.z);
}

template <typename TyTrunc, typename Index, typename Dims>
```

The `TyTrunc` template argument is not used here.
```cpp
const T y = Dims::rank() > 1 ? Dims::extent(1) : 1;
const T z = Dims::rank() > 2 ? Dims::extent(2) : 1;

constexpr _CCCL_HOST_DEVICE operator dim3() const
```

Suggested change:

```diff
-constexpr _CCCL_HOST_DEVICE operator dim3() const
+_CCCL_HOST_DEVICE constexpr operator dim3() const
```
```cpp
namespace detail
{
template <typename OpType>
_CCCL_HOST_DEVICE constexpr size_t merge_extents(size_t e1, size_t e2)
```

General question. Should this be:

```diff
-_CCCL_HOST_DEVICE constexpr size_t merge_extents(size_t e1, size_t e2)
+_CCCL_NODISCARD _CCCL_HOST_DEVICE constexpr size_t merge_extents(size_t e1, size_t e2)
```

Applies throughout
Added _CCCL_NODISCARD to all functions in the detail namespace, but on a few functions the compiler complained about it being applied to a function returning void. There is no way these functions return void, so I just left the annotation commented out on those for now.
```cpp
#include <cuda/std/mdspan>

namespace cuda::experimental
```

There is no guard against standard mode here. Looking at the code it is at least C++14, if not C++17. We should add:

```diff
+#if _CCCL_STD_VER >= 2014
 namespace cuda::experimental
```
Applies throughout
We target C++17; added the guards.
```cpp
static_assert(::cuda::std::is_base_of_v<hierarchy_level, BottomUnit> || ::cuda::std::is_same_v<BottomUnit, void>);
::cuda::std::tuple<Levels...> levels;

constexpr _CCCL_HOST_DEVICE hierarchy_dimensions_fragment(const Levels&... ls) noexcept
```

Attributes come first:

```diff
-constexpr _CCCL_HOST_DEVICE hierarchy_dimensions_fragment(const Levels&... ls) noexcept
+_CCCL_HOST_DEVICE constexpr hierarchy_dimensions_fragment(const Levels&... ls) noexcept
```
```cpp
[](const auto&... levels) {
  return hierarchy_dimensions_fragment<Unit, ::cuda::std::remove_reference_t<decltype(levels)>...>(levels...);
},
```

We should definitely turn this into a function object, otherwise this will instantiate a full new type every time we instantiate this function.
I believe we still need to template that object on Unit, since it's not present in `selected`, but at least it's no longer templated on Level (unless I misunderstood the ask).
Moved it to a helper function object.
```cpp
using extents_type = decltype(::cuda::std::declval<hierarchy_dimensions_fragment<BottomUnit, Levels...>>()
                                .template extents<Unit, Level>());
```

I believe this should be an alias within `hierarchy_dimensions_fragment` rather than the decltype of a function.
Great idea, moved it to a member alias
```cpp
}

} // namespace cuda::experimental
#endif
```

Missing comment on `#endif`.
```cpp
namespace hierarchy
{
template <typename Unit, typename Level>
auto __device__ rank(const Unit& = Unit(), const Level& = Level());
```

We need to avoid plain `__device__`; it should be `_CCCL_DEVICE`.
Cleaned up the order of annotations and constexpr, removed `__device__`. Moved to absolute includes. Added nodiscard on all functions in the detail namespace. Added the C++17 ifdef. Changed header guards to a new format applicable after I move some files in a future change, and renamed the main header to hierarchy.cuh. A couple of _LIBCUDACXX_UNREACHABLE and other fixes.
/ok to test
🟨 CI Results [ Failed: 36 | Passed: 327 | Total: 363 ]
/ok to test
🟨 CI Results [ Failed: 32 | Passed: 331 | Total: 363 ]
/ok to test
🟩 CI Results [ Failed: 0 | Passed: 363 | Total: 363 ]
/ok to test
🟩 CI Results [ Failed: 0 | Passed: 363 | Total: 363 ]
This pull request adds the hierarchy_dimensions type template, which allows describing a hierarchy of CUDA threads with a mix of static and dynamic information. It can be passed into kernels and used to calculate aggregates at compile time (like counting the threads in each CUDA block to create a statically sized array). It can also be used to make libraries aware of the shape of the currently running grid and optimize some thread-id calculations with compile-time values.
The hierarchy_dimensions type template is basically a tuple of level_dimensions entries that describe levels. Each level consists of two things: first, a type describing which level it is, for example block_level, cluster_level, etc.; second, a cuda::std::extents object describing the dimensions of that level with both static and dynamic values, the same way it describes cuda::std::mdspan objects.
This is an initial implementation; there are a number of TODOs, names are not final, and the interface can change depending on the feedback received.
This type is also a building block for other libraries and features that are in the pipeline.
This functionality was initially a part of the PR that added CUDA Experimental, but was separated for easier review.
Compared to the previous pull request:
NVBUG 4541889