-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Functional requirements API #1326
base: main
Are you sure you want to change the base?
Conversation
cub/docs/developer_overview.rst
Outdated
|
||
template <class IteratorT, class ReguirementsT> | ||
static typename ::cuda::std::enable_if< | ||
cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, ReguirementsT>::value, // (1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo:
cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, ReguirementsT>::value, // (1) | |
cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, RequirementsT>::value, // (1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: How does this approach interact with non-static requirements, e.g. memory footprint?
cub/docs/developer_overview.rst
Outdated
{ | ||
auto guarantees = cub::detail::requirements::mask(default_guarantees_t(), requirements); // (3) | ||
auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees); // (4) | ||
auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capturing something that came up in the meeting -- it would be nicer for readability to have a query that doesn't spell out a specific guarantee, but rather the category. Maybe something like
enum determinism_t
{
category, // Always the weakest
nondeterministic,
run_to_run_deterministic
};
// ...
namespace category {
// Never exposed outside of detail
using determinism = determinism_holder_t<determinism_t::category>;
}
// ...
auto determinism = cub::detail::requirements::get<cub::detail::category::determinism>(guarantees);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this. I like the direction but have concerns about some implementation details.
- I am concerned about binary size implications. Passing the as variadic arguments has the potential to blow up. Also it is quite easy to accidentally pass multiple requirements instead of one.
- I find the metaprogramming overly complex. We want to unsure that only duplicates are within the requirements list. The trick with
operator<
is overly complex for that - In general this has a lot of similarity to the property based system we have prototyped for
<memory_resource>
. - I believe we should have a way to query the requirement categories for every algorithm and those should be unique for each algorithm.
Putting my thoughts together I believe we should define a algorithm_requirement
for each algorithm and let it handle all the requirements.
I have shortly hacked something along those lines here https://godbolt.org/z/47YcaGx7M
Note that this currently rejects duplicated entries, but this is something we could easily add.
cub/cub/detail/meta.cuh
Outdated
{}; | ||
|
||
template <class T, class U> | ||
struct statically_ordered_t<T, U, typename ::cuda::std::enable_if<true_t<T{} < U{} || U{} < T{}>::value>::type> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct statically_ordered_t<T, U, typename ::cuda::std::enable_if<true_t<T{} < U{} || U{} < T{}>::value>::type> | |
struct statically_ordered_t<T, U, ::cuda::std::__enable_if_t<true_t<T{} < U{} || U{} < T{}>::value>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I am a bit confused about the ||
here
cub/cub/detail/meta.cuh
Outdated
struct statically_equal_t | ||
: ::cuda::std::integral_constant< | ||
bool, | ||
statically_ordered_t<T, U>::value && !statically_less_t<T, U>::value && !statically_less_t<U, T>::value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not covering the case where T{} == U{}
which seems important
cub/docs/developer_overview.rst
Outdated
detail::max_memory_footprint_t>; | ||
|
||
template <class IteratorT, class ReguirementsT = default_guarantees_t> | ||
static auto sum(IteratorT begin, IteratorT end, ReguirementsT requirements = default_guarantees_t()) // (2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: This interface is insufficiently constrained. We should ensure that we constrain the passed requirement to the right type. Otherwise this will give bas error messages to a user that passes something wrong
cub/docs/developer_overview.rst
Outdated
auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees); // (4) | ||
auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imporntant: We should align properly. Also we are inconsistent with the use of the namespaces
}; | ||
|
||
template <determinism_t L, determinism_t R> | ||
_CCCL_HOST_DEVICE constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: We should not use internal macros in public code sampes
_CCCL_HOST_DEVICE constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>) | |
__host__ __device__ constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>) |
cub/docs/developer_overview.rst
Outdated
auto guarantees = cub::detail::requirements::mask(default_guarantees_t(), requirements); // (3) | ||
auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees); // (4) | ||
auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5) | ||
return reduce::sum(begin, end, determinism, max_footprint); // (6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: I am concerned about binary size here. What if a user calls this directly with switched guarantees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. Requirements doesn't go deep into CUB. The worst thing that can happen is instantiating a few thin dispatch functions.
|
||
class scan | ||
{ | ||
using weak_guarantees_t = cub::detail::guarantees::guarantees_t<cub::detail::guarantees::determinism_not_guaranteed_t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: I am missing a way to query the potential requirement categories of the algorithm.
@pciolkosz suggested to extend the requirements API with guarantees API. The idea is that sometimes users require certain properties (already implemented in this PR), but sometimes, they provide guarantees. For instance, user might guarantee convergence of threads which would allow us to dispatch to more optimal implementation. We should consider this, since it's also useful in CUB. Apart from that, we should consider exposing API in |
After thinking about this a while, it occurs to me that guarantees are effectively the same as properties we developed for The way I currently think about it:
@gevtushenko do you think the current properties machinery could be used for the guarantees (but not requirements)? |
Description
closes #1187
This is a prerequisite PR for deterministic scan. This PR provides an implementation and a design alternative for functional requirements API. Some of the functional requirements to the design below:
Design Requirements
Generic
The requirements API should not be limited to determinism guarantees. There are use cases where clients couldn't use scan because it's not work efficient (invokes operator more times than needed). Alternatively, one could imagine a functional requirement on the maximal temporary storage size. The design of functional requirements API should allow alternative functional requirements going forward.
Algorithm Specific
There might be algorithm-specific functional guarantees. For instance, in the for-each one could relax the requirement of not being able to copy elements or alternatively, the user can explicitly require that the function object is invoked on the original value. Alternatively, we scan might provide a guarantee on storing partial aggregates.
Library-Wide Requirements
Opposite to the previous requirement, we should provide library-wide requirements. It won't be a great experience if every algorithm provide it's own version of determinism requirements. This can get inconsistent quickly. Apart from that, when we start working on determinism API exposure in Thrust, it'll be mush easier to attach "run-to-run deterministic" tag to execution policy as opposed to attaching such a tag to each algorithm.
Type Safe
If given functional requirement can't be satisfied, the most sound approach would be to fail at compile time. If we introduce the functional requirement API for some algorithm, but then forget to handle, say, determinism requirements, it'll be a critical issue on the users end. I incline towards emitting a compile-time error if we see function requirement that's not known to a given algorithm.
Visible Defaults
The default deterministic API "parameter" should be defined on the algorithm basis. Apart from different algorithms exposing different guarantees (scan vs reduce), this would provide some visibility into algorithm behavior without looking deep into the implementation.
Runtime
The API should allow passing runtime parameters. Say you have N free bytes in memory and you request a given algorithm (reduce by key) to fit into this memory. This is a functional requirement which requires runtime parameter.
Design Alternative
Details on the design are given in the developer overview section of CUB docs. I'll briefly highlight user-facing API here. The terminology is the following. Algorithms provide guarantees, whereas users have requirements. Examples of the requirements API:
The requirements belong to requirement categories. For instance,
cub::run_to_run_determinism
shares category withcub::nondeterminism
. Only one requirement per category is allowed in the requirements list.The requirements API allows:
The requirements API doesn’t allow:
Users do not see passed this API. As far as users are concerned, we reserve the right to change default guarantees at any time. If there are functional requirements that are critical for users, those should be specified specifically.
All guarantees are ordered within the category. We reserve the right to select the implementation with stronger guarantees compared to what user requested. The remaining part of this section is only related to CUB developers.
Checklist