Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functional requirements API #1326

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

gevtushenko
Copy link
Collaborator

Description

closes #1187

This is a prerequisite PR for deterministic scan. This PR provides an implementation and a design alternative for functional requirements API. Some of the functional requirements to the design below:

Design Requirements

Generic

The requirements API should not be limited to determinism guarantees. There are use cases where clients couldn't use scan because it's not work efficient (invokes operator more times than needed). Alternatively, one could imagine a functional requirement on the maximal temporary storage size. The design of functional requirements API should allow alternative functional requirements going forward.

Algorithm Specific

There might be algorithm-specific functional guarantees. For instance, in the for-each one could relax the requirement of not being able to copy elements or alternatively, the user can explicitly require that the function object is invoked on the original value. Alternatively, we scan might provide a guarantee on storing partial aggregates.

Library-Wide Requirements

Opposite to the previous requirement, we should provide library-wide requirements. It won't be a great experience if every algorithm provide it's own version of determinism requirements. This can get inconsistent quickly. Apart from that, when we start working on determinism API exposure in Thrust, it'll be mush easier to attach "run-to-run deterministic" tag to execution policy as opposed to attaching such a tag to each algorithm.

Type Safe

If given functional requirement can't be satisfied, the most sound approach would be to fail at compile time. If we introduce the functional requirement API for some algorithm, but then forget to handle, say, determinism requirements, it'll be a critical issue on the users end. I incline towards emitting a compile-time error if we see function requirement that's not known to a given algorithm.

Visible Defaults

The default deterministic API "parameter" should be defined on the algorithm basis. Apart from different algorithms exposing different guarantees (scan vs reduce), this would provide some visibility into algorithm behavior without looking deep into the implementation.

Runtime

The API should allow passing runtime parameters. Say you have N free bytes in memory and you request a given algorithm (reduce by key) to fit into this memory. This is a functional requirement which requires runtime parameter.

Design Alternative

Details on the design are given in the developer overview section of CUB docs. I'll briefly highlight user-facing API here. The terminology is the following. Algorithms provide guarantees, whereas users have requirements. Examples of the requirements API:

// default guarantees (see below)
reduce::sum(begin, end);

// run-to-run determinism is enforced
reduce::sum(begin, end, cub::require(cub::run_to_run_determinism));

// default determinism guarantees, memory footprint limit
reduce::sum(begin, end, cub::require(cub::max_memory_footprint(128_KB)));

// default determinism guarantees, memory footprint limit
reduce::sum(begin, end, cub::require(cub::max_memory_footprint(128_KB)));

// run-to-run deterministic with memory footprint limit
reduce::sum(begin, end, cub::require(cub::run_to_run_determinism,
                                     cub::max_memory_footprint(128_KB)));

The requirements belong to requirement categories. For instance, cub::run_to_run_determinism shares category with cub::nondeterminism. Only one requirement per category is allowed in the requirements list.

The requirements API allows:

  • stateful requirements
  • providing requirements in any order
  • not instantiating unrelated kernels
  • failing at compile time if requirements can’t be satisfied

The requirements API doesn’t allow:

  • multiple requirements from the same category in the requirements list
  • providing requirements from the category that is not supported by the algorithm

Users do not see passed this API. As far as users are concerned, we reserve the right to change default guarantees at any time. If there are functional requirements that are critical for users, those should be specified specifically.

All guarantees are ordered within the category. We reserve the right to select the implementation with stronger guarantees compared to what user requested. The remaining part of this section is only related to CUB developers.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@gevtushenko gevtushenko requested review from a team as code owners January 26, 2024 22:11
@gevtushenko gevtushenko changed the title Start working on requirements Functional requirements API Jan 26, 2024
cub/cub/detail/meta.cuh Show resolved Hide resolved
cub/cub/detail/meta.cuh Outdated Show resolved Hide resolved
cub/cub/detail/requirements.cuh Outdated Show resolved Hide resolved
cub/cub/detail/requirements.cuh Outdated Show resolved Hide resolved
cub/cub/detail/requirements.cuh Outdated Show resolved Hide resolved
cub/cub/detail/requirements.cuh Outdated Show resolved Hide resolved
cub/docs/developer_overview.rst Outdated Show resolved Hide resolved

template <class IteratorT, class ReguirementsT>
static typename ::cuda::std::enable_if<
cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, ReguirementsT>::value, // (1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:

Suggested change
cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, ReguirementsT>::value, // (1)
cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, RequirementsT>::value, // (1)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: How does this approach interact with non-static requirements, e.g. memory footprint?

cub/test/catch2_test_guarantees.cu Outdated Show resolved Hide resolved
{
auto guarantees = cub::detail::requirements::mask(default_guarantees_t(), requirements); // (3)
auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees); // (4)
auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capturing something that came up in the meeting -- it would be nicer for readability to have a query that doesn't spell out a specific guarantee, but rather the category. Maybe something like

enum determinism_t
{
  category, // Always the weakest
  nondeterministic,
  run_to_run_deterministic
};

// ...

namespace category {

// Never exposed outside of detail
using determinism = determinism_holder_t<determinism_t::category>;

}

// ...

auto determinism = cub::detail::requirements::get<cub::detail::category::determinism>(guarantees);

Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this. I like the direction but have concerns about some implementation details.

  1. I am concerned about binary size implications. Passing the as variadic arguments has the potential to blow up. Also it is quite easy to accidentally pass multiple requirements instead of one.
  2. I find the metaprogramming overly complex. We want to unsure that only duplicates are within the requirements list. The trick with operator< is overly complex for that
  3. In general this has a lot of similarity to the property based system we have prototyped for <memory_resource> .
  4. I believe we should have a way to query the requirement categories for every algorithm and those should be unique for each algorithm.

Putting my thoughts together I believe we should define a algorithm_requirement for each algorithm and let it handle all the requirements.

I have shortly hacked something along those lines here https://godbolt.org/z/47YcaGx7M
Note that this currently rejects duplicated entries, but this is something we could easily add.

cub/cub/detail/meta.cuh Show resolved Hide resolved
cub/cub/detail/meta.cuh Outdated Show resolved Hide resolved
{};

template <class T, class U>
struct statically_ordered_t<T, U, typename ::cuda::std::enable_if<true_t<T{} < U{} || U{} < T{}>::value>::type>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct statically_ordered_t<T, U, typename ::cuda::std::enable_if<true_t<T{} < U{} || U{} < T{}>::value>::type>
struct statically_ordered_t<T, U, ::cuda::std::__enable_if_t<true_t<T{} < U{} || U{} < T{}>::value>>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I am a bit confused about the || here

Comment on lines 105 to 108
struct statically_equal_t
: ::cuda::std::integral_constant<
bool,
statically_ordered_t<T, U>::value && !statically_less_t<T, U>::value && !statically_less_t<U, T>::value>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not covering the case where T{} == U{} which seems important

cub/cub/detail/requirements.cuh Outdated Show resolved Hide resolved
detail::max_memory_footprint_t>;

template <class IteratorT, class ReguirementsT = default_guarantees_t>
static auto sum(IteratorT begin, IteratorT end, ReguirementsT requirements = default_guarantees_t()) // (2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: This interface is insufficiently constrained. We should ensure that we constrain the passed requirement to the right type. Otherwise this will give bas error messages to a user that passes something wrong

Comment on lines 845 to 846
auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees); // (4)
auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imporntant: We should align properly. Also we are inconsistent with the use of the namespaces

};

template <determinism_t L, determinism_t R>
_CCCL_HOST_DEVICE constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: We should not use internal macros in public code sampes

Suggested change
_CCCL_HOST_DEVICE constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)
__host__ __device__ constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)

auto guarantees = cub::detail::requirements::mask(default_guarantees_t(), requirements); // (3)
auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees); // (4)
auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5)
return reduce::sum(begin, end, determinism, max_footprint); // (6)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I am concerned about binary size here. What if a user calls this directly with switched guarantees?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. Requirements doesn't go deep into CUB. The worst thing that can happen is instantiating a few thin dispatch functions.


class scan
{
using weak_guarantees_t = cub::detail::guarantees::guarantees_t<cub::detail::guarantees::determinism_not_guaranteed_t,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: I am missing a way to query the potential requirement categories of the algorithm.

@gevtushenko
Copy link
Collaborator Author

@pciolkosz suggested to extend the requirements API with guarantees API. The idea is that sometimes users require certain properties (already implemented in this PR), but sometimes, they provide guarantees. For instance, user might guarantee convergence of threads which would allow us to dispatch to more optimal implementation. We should consider this, since it's also useful in CUB. Apart from that, we should consider exposing API in cuda:: instead of cub::.

@jrhemstad
Copy link
Collaborator

For instance, user might guarantee convergence of threads which would allow us to dispatch to more optimal implementation.

After thinking about this a while, it occurs to me that guarantees are effectively the same as properties we developed for cuda::mr.

The way I currently think about it:

  • Requirements are a request from the user to the API, e.g., "Please give me run-to-run determinism"
  • Guarantees/Properties are a promise from to the user to the API, e.g., "I promise this resource allocates device accessible memory." or "I promise this thread group is converged."

@gevtushenko do you think the current properties machinery could be used for the guarantees (but not requirements)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Initial investigation into API design for user-specified determinism guarantees
4 participants