Functional requirements API #1326

gevtushenko · 2024-01-26T22:11:50Z

Description

This is a prerequisite PR for deterministic scan. It provides an implementation of, and a design alternative for, a functional requirements API. Some of the requirements for the design are listed below:

Design Requirements

Generic

The requirements API should not be limited to determinism guarantees. There are use cases where clients couldn't use scan because it's not work efficient (it invokes the operator more times than needed). Alternatively, one could imagine a functional requirement on the maximal temporary storage size. The design of the functional requirements API should allow additional functional requirements going forward.

Algorithm Specific

There might be algorithm-specific functional guarantees. For instance, for-each could relax the requirement that elements are not copied, or the user could explicitly require that the function object is invoked on the original value. Similarly, scan might provide a guarantee on storing partial aggregates.

Library-Wide Requirements

In contrast to the previous requirement, we should also provide library-wide requirements. It won't be a great experience if every algorithm provides its own version of determinism requirements. This can get inconsistent quickly. Apart from that, when we start working on determinism API exposure in Thrust, it'll be much easier to attach a "run-to-run deterministic" tag to the execution policy as opposed to attaching such a tag to each algorithm.

Type Safe

If a given functional requirement can't be satisfied, the most sound approach would be to fail at compile time. If we introduce the functional requirements API for some algorithm, but then forget to handle, say, determinism requirements, it'll be a critical issue on the user's end. I lean towards emitting a compile-time error if we see a functional requirement that's not known to a given algorithm.
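To make the compile-time failure concrete, here is a minimal standalone sketch. It is not the code from this PR: `reduce_like`, `type_list`, `contains`, `all_supported`, and the category tags are hypothetical names chosen for illustration. Each requirement names its category, and the algorithm rejects any category it does not list as supported.

```cpp
#include <type_traits>

// Hypothetical category tags and requirement types (illustration only).
struct determinism_category {};
struct memory_category {};
struct work_efficiency_category {};

struct run_to_run_determinism_t { using category = determinism_category; };
struct max_memory_footprint_t   { using category = memory_category; };
struct work_efficient_t         { using category = work_efficiency_category; };

// Minimal type list with a membership test.
template <class... Ts>
struct type_list {};

template <class T, class List>
struct contains;

template <class T>
struct contains<T, type_list<>> : std::false_type {};

template <class T, class Head, class... Tail>
struct contains<T, type_list<Head, Tail...>>
    : std::integral_constant<bool,
        std::is_same<T, Head>::value || contains<T, type_list<Tail...>>::value>
{};

// True only if the category of every requirement is in the supported list.
template <class Supported, class... Requirements>
struct all_supported : std::true_type {};

template <class Supported, class Head, class... Tail>
struct all_supported<Supported, Head, Tail...>
    : std::integral_constant<bool,
        contains<typename Head::category, Supported>::value
          && all_supported<Supported, Tail...>::value>
{};

// Hypothetical algorithm that only knows about determinism and memory footprint.
struct reduce_like
{
  using supported_categories = type_list<determinism_category, memory_category>;

  template <class... Requirements>
  static constexpr bool accepts()
  {
    return all_supported<supported_categories, Requirements...>::value;
  }

  template <class... Requirements>
  static int sum(const int* first, const int* last, Requirements...)
  {
    // Unknown categories are rejected here instead of being silently ignored.
    static_assert(accepts<Requirements...>(),
                  "requirement category is not supported by this algorithm");
    int result = 0;
    for (; first != last; ++first)
    {
      result += *first;
    }
    return result;
  }
};
```

With these definitions, `reduce_like::sum(first, last, run_to_run_determinism_t{})` compiles, while passing `work_efficient_t{}` triggers the `static_assert`.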

Visible Defaults

The default determinism "parameter" should be defined on a per-algorithm basis. Apart from different algorithms exposing different guarantees (scan vs reduce), this would provide some visibility into algorithm behavior without looking deep into the implementation.

Runtime

The API should allow passing runtime parameters. Say you have N free bytes in memory and you request that a given algorithm (reduce by key) fit into this memory. This is a functional requirement that needs a runtime parameter.
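As a rough illustration of a requirement that carries runtime state, consider the sketch below. Everything in it is an assumption made for the example: the `_KB` literal, the `reduce_variant` enum, the storage model in `temp_storage_bytes`, and `select_variant` are hypothetical and do not describe how CUB sizes its temporary storage.

```cpp
#include <cstddef>

// Hypothetical stateful requirement: the category is a compile-time property,
// while the limit itself is only known at runtime.
struct max_memory_footprint_t
{
  std::size_t bytes;
};

// Hypothetical literal, so that a limit can be spelled as 128_KB.
constexpr std::size_t operator""_KB(unsigned long long kib)
{
  return static_cast<std::size_t>(kib) * 1024;
}

constexpr max_memory_footprint_t max_memory_footprint(std::size_t bytes)
{
  return max_memory_footprint_t{bytes};
}

// Two made-up implementation variants with different temporary storage needs.
enum class reduce_variant
{
  tiled, // one partial result per tile of 256 items
  serial // a single accumulator
};

inline std::size_t temp_storage_bytes(reduce_variant variant, std::size_t num_items)
{
  return variant == reduce_variant::tiled
           ? ((num_items + 255) / 256) * sizeof(int)
           : sizeof(int);
}

// Picks the preferred variant that fits into the requested footprint.
// Returns false if no variant can satisfy the runtime requirement.
inline bool select_variant(std::size_t num_items, max_memory_footprint_t limit, reduce_variant& selected)
{
  const reduce_variant candidates[] = {reduce_variant::tiled, reduce_variant::serial};
  for (reduce_variant candidate : candidates)
  {
    if (temp_storage_bytes(candidate, num_items) <= limit.bytes)
    {
      selected = candidate;
      return true;
    }
  }
  return false;
}
```

The point of the sketch is that the type of the requirement selects the category at compile time, whereas the value it carries can only drive a decision at runtime.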

Design Alternative

Details on the design are given in the developer overview section of the CUB docs. I'll briefly highlight the user-facing API here. The terminology is the following: algorithms provide guarantees, whereas users have requirements. Examples of the requirements API:

// default guarantees (see below)
reduce::sum(begin, end);

// run-to-run determinism is enforced
reduce::sum(begin, end, cub::require(cub::run_to_run_determinism));

// default determinism guarantees, memory footprint limit
reduce::sum(begin, end, cub::require(cub::max_memory_footprint(128_KB)));

// run-to-run deterministic with memory footprint limit
reduce::sum(begin, end, cub::require(cub::run_to_run_determinism,
                                     cub::max_memory_footprint(128_KB)));

Requirements belong to requirement categories. For instance, cub::run_to_run_determinism shares a category with cub::nondeterminism. Only one requirement per category is allowed in the requirements list.
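The "one requirement per category" rule can be checked at compile time. The sketch below is a simplified stand-in rather than the implementation in this PR: `count_in_category`, `unique_categories`, `is_valid_requirement_list`, and the nested `category` alias are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical category tags and requirements (illustration only).
struct determinism_category {};
struct memory_category {};

struct run_to_run_determinism_t { using category = determinism_category; };
struct nondeterminism_t         { using category = determinism_category; };
struct max_memory_footprint_t
{
  using category = memory_category;
  std::size_t bytes;
};

// Counts how many requirements in the pack belong to Category.
template <class Category, class... Requirements>
struct count_in_category : std::integral_constant<int, 0> {};

template <class Category, class Head, class... Tail>
struct count_in_category<Category, Head, Tail...>
    : std::integral_constant<int,
        (std::is_same<Category, typename Head::category>::value ? 1 : 0)
          + count_in_category<Category, Tail...>::value>
{};

// True if no two requirements in the pack share a category.
template <class... Requirements>
struct unique_categories : std::true_type {};

template <class Head, class... Tail>
struct unique_categories<Head, Tail...>
    : std::integral_constant<bool,
        count_in_category<typename Head::category, Tail...>::value == 0
          && unique_categories<Tail...>::value>
{};

// Order of the arguments does not matter; only the set of categories does.
template <class... Requirements>
constexpr bool is_valid_requirement_list(Requirements...)
{
  return unique_categories<Requirements...>::value;
}
```

An entry point such as `cub::require` could place a `static_assert` on a check like this, so that passing both a run-to-run determinism requirement and a nondeterminism requirement is rejected at compile time.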

The requirements API allows:

stateful requirements
providing requirements in any order
not instantiating unrelated kernels
failing at compile time if requirements can’t be satisfied

The requirements API doesn’t allow:

multiple requirements from the same category in the requirements list
providing requirements from the category that is not supported by the algorithm

Users do not see past this API. As far as users are concerned, we reserve the right to change default guarantees at any time. If there are functional requirements that are critical for users, those should be specified explicitly.

All guarantees are ordered within their category. We reserve the right to select an implementation with stronger guarantees than what the user requested. The remaining part of this section is only relevant to CUB developers.
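The ordering within a category is what lets a stronger guarantee stand in for a weaker request. Below is a minimal sketch of that idea, loosely modeled on the `determinism_holder_t` and `operator<` snippet quoted later in the review. The enumerator names and the `satisfies` helper are assumptions made for the example.

```cpp
// Hypothetical determinism levels, ordered from weakest to strongest.
enum class determinism_t
{
  not_guaranteed = 0,
  run_to_run     = 1
};

// Wraps a level in a type so that it can take part in overload resolution.
template <determinism_t Level>
struct determinism_holder_t {};

// Ordering within the category: a weaker level compares less than a stronger one.
template <determinism_t L, determinism_t R>
constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)
{
  return static_cast<int>(L) < static_cast<int>(R);
}

// A guarantee satisfies a requirement if it is at least as strong.
template <class GuaranteeT, class RequirementT>
constexpr bool satisfies(GuaranteeT guarantee, RequirementT requirement)
{
  return !(guarantee < requirement);
}
```

Under this model, an implementation that guarantees run-to-run determinism also satisfies a request where determinism is not required, but not the other way around.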

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

cub/cub/detail/meta.cuh

cub/cub/detail/requirements.cuh

cub/docs/developer_overview.rst

alliepiper · 2024-01-29T20:08:46Z

cub/docs/developer_overview.rst

+
+      template <class IteratorT, class ReguirementsT>
+      static typename ::cuda::std::enable_if<
+        cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, ReguirementsT>::value,             // (1)


Typo:

Suggested change

cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, ReguirementsT>::value, // (1)

cub::detail::guarantees::statically_satisfy_t<weak_guarantees_t, RequirementsT>::value, // (1)

Question: How does this approach interact with non-static requirements, e.g. memory footprint?

cub/test/catch2_test_guarantees.cu

alliepiper · 2024-01-29T20:30:32Z

cub/docs/developer_overview.rst

+      {
+        auto guarantees    = cub::detail::requirements::mask(default_guarantees_t(), requirements);                         // (3)
+        auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees);                    // (4)
+        auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5)


Capturing something that came up in the meeting -- it would be nicer for readability to have a query that doesn't spell out a specific guarantee, but rather the category. Maybe something like

enum determinism_t
{
  category, // Always the weakest
  nondeterministic,
  run_to_run_deterministic
};

// ...

namespace category
{
// Never exposed outside of detail
using determinism = determinism_holder_t<determinism_t::category>;
}

// ...

auto determinism = cub::detail::requirements::get<cub::detail::category::determinism>(guarantees);
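For illustration only, a category-keyed lookup of this kind could be sketched in standalone C++ as follows. It uses `std::tuple` as a stand-in for the guarantees list, and `index_of_category`, `get_by_category`, and the nested `category` alias are hypothetical names rather than anything from the PR.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical category tags and guarantee types (illustration only).
struct determinism_category {};
struct memory_category {};

struct run_to_run_deterministic_t { using category = determinism_category; };
struct max_memory_footprint_t
{
  using category = memory_category;
  std::size_t bytes;
};

// Index of the first element whose category matches.
// Evaluates to sizeof...(Ts) if no element matches.
template <class Category, class... Ts>
struct index_of_category : std::integral_constant<std::size_t, 0> {};

template <class Category, class Head, class... Tail>
struct index_of_category<Category, Head, Tail...>
    : std::integral_constant<std::size_t,
        std::is_same<Category, typename Head::category>::value
          ? 0
          : 1 + index_of_category<Category, Tail...>::value>
{};

// Looks an entry up by category, so the caller never spells a concrete guarantee.
template <class Category, class... Ts>
constexpr const auto& get_by_category(const std::tuple<Ts...>& guarantees)
{
  static_assert(index_of_category<Category, Ts...>::value < sizeof...(Ts),
                "no guarantee of the requested category in the list");
  return std::get<index_of_category<Category, Ts...>::value>(guarantees);
}
```

A call such as `get_by_category<memory_category>(guarantees)` then returns whichever entry occupies that category, independent of its position in the list.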

miscco

Thanks a lot for working on this. I like the direction but have concerns about some implementation details.

I am concerned about binary size implications. Passing them as variadic arguments has the potential to blow up. Also, it is quite easy to accidentally pass multiple requirements instead of one.
I find the metaprogramming overly complex. We want to ensure that no duplicates are within the requirements list. The trick with operator< is overly complex for that.
In general this has a lot of similarity to the property based system we have prototyped for <memory_resource> .
I believe we should have a way to query the requirement categories for every algorithm and those should be unique for each algorithm.

Putting my thoughts together, I believe we should define an algorithm_requirement for each algorithm and let it handle all the requirements.

I have quickly hacked together something along those lines here: https://godbolt.org/z/47YcaGx7M
Note that this currently does not reject duplicated entries, but this is something we could easily add.

cub/cub/detail/meta.cuh

miscco · 2024-01-30T08:36:00Z

cub/cub/detail/meta.cuh

+{};
+
+template <class T, class U>
+struct statically_ordered_t<T, U, typename ::cuda::std::enable_if<true_t<T{} < U{} || U{} < T{}>::value>::type>


Suggested change

struct statically_ordered_t<T, U, typename ::cuda::std::enable_if<true_t<T{} < U{} || U{} < T{}>::value>::type>

struct statically_ordered_t<T, U, ::cuda::std::__enable_if_t<true_t<T{} < U{} || U{} < T{}>::value>>

Also I am a bit confused about the || here

miscco · 2024-01-30T08:38:56Z

cub/cub/detail/meta.cuh

+struct statically_equal_t
+    : ::cuda::std::integral_constant<
+        bool,
+        statically_ordered_t<T, U>::value && !statically_less_t<T, U>::value && !statically_less_t<U, T>::value>


This is not covering the case where T{} == U{} which seems important

cub/cub/detail/requirements.cuh

miscco · 2024-01-30T09:08:19Z

cub/docs/developer_overview.rst

+                                              detail::max_memory_footprint_t>;
+
+      template <class IteratorT, class ReguirementsT = default_guarantees_t>
+      static auto sum(IteratorT begin, IteratorT end, ReguirementsT requirements = default_guarantees_t())                  // (2)


Important: This interface is insufficiently constrained. We should ensure that we constrain the passed requirement to the right type. Otherwise this will give bad error messages to a user who passes something wrong.

miscco · 2024-01-30T09:10:02Z

cub/docs/developer_overview.rst

+        auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees);                    // (4)
+        auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5)


Important: We should align properly. Also, we are inconsistent in our use of namespaces.

miscco · 2024-01-30T09:13:56Z

cub/docs/developer_overview.rst

+   };
+
+   template <determinism_t L, determinism_t R>
+   _CCCL_HOST_DEVICE constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)


Important: We should not use internal macros in public code samples.

Suggested change

_CCCL_HOST_DEVICE constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)

__host__ __device__ constexpr bool operator<(determinism_holder_t<L>, determinism_holder_t<R>)

miscco · 2024-01-30T09:22:31Z

cub/docs/developer_overview.rst

+        auto guarantees    = cub::detail::requirements::mask(default_guarantees_t(), requirements);                         // (3)
+        auto max_footprint = cub::detail::requirements::get<detail::max_memory_footprint_t>(guarantees);                    // (4)
+        auto determinism = cub::detail::requirements::get<cub::detail::guarantees::run_to_run_deterministic_t>(guarantees); // (5)
+        return reduce::sum(begin, end, determinism, max_footprint);                                                         // (6)


Question: I am concerned about binary size here. What if a user calls this directly with switched guarantees?

That's fine. Requirements don't go deep into CUB. The worst thing that can happen is instantiating a few thin dispatch functions.

miscco · 2024-01-30T09:31:38Z

cub/docs/developer_overview.rst

+
+    class scan
+    {
+      using weak_guarantees_t = cub::detail::guarantees::guarantees_t<cub::detail::guarantees::determinism_not_guaranteed_t,


Important: I am missing a way to query the potential requirement categories of the algorithm.

gevtushenko · 2024-02-03T04:18:52Z

@pciolkosz suggested extending the requirements API with a guarantees API. The idea is that sometimes users require certain properties (already implemented in this PR), but sometimes they provide guarantees. For instance, a user might guarantee convergence of threads, which would allow us to dispatch to a more optimal implementation. We should consider this, since it's also useful in CUB. Apart from that, we should consider exposing the API in cuda:: instead of cub::.

jrhemstad · 2024-02-06T16:13:13Z

For instance, a user might guarantee convergence of threads, which would allow us to dispatch to a more optimal implementation.

After thinking about this a while, it occurs to me that guarantees are effectively the same as properties we developed for cuda::mr.

The way I currently think about it:

Requirements are a request from the user to the API, e.g., "Please give me run-to-run determinism"
Guarantees/Properties are a promise from the user to the API, e.g., "I promise this resource allocates device accessible memory." or "I promise this thread group is converged."

@gevtushenko do you think the current properties machinery could be used for the guarantees (but not requirements)?

Start working on requirements

519e4c3

gevtushenko requested review from a team as code owners January 26, 2024 22:11

gevtushenko requested review from elstehle, alliepiper, jrhemstad and ericniebler January 26, 2024 22:11

gevtushenko mentioned this pull request Jan 26, 2024

[FEA]: Run-to-run deterministic scan #1327

Open

1 task

gevtushenko changed the title ~~Start working on requirements~~ Functional requirements API Jan 26, 2024

Unused variables

e2af59e

alliepiper reviewed Jan 29, 2024

miscco requested changes Jan 30, 2024

gevtushenko added 5 commits January 30, 2024 17:50

Remove extra include

2bae8bb

Rename concepts

a70afe5

Better error messages

a790528

Typo

5f80af5

Better name

c4b15c0


