
Implement <cuda/std/bitset> #1496

Open · wants to merge 35 commits into main
Conversation

@griwes (Collaborator) commented Mar 6, 2024

Description

Implements <cuda/std/bitset>, with a complementary backport of constexprness to C++14 and up.

Resolves #1321

Note for reviewers: I separated the pulls of updated upstream headers into their own commits, so you can more or less ignore the first two commits of this PR while reviewing.
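For a sense of what this enables, a minimal usage sketch (it assumes only the header this PR adds and is callable from both host and device code):

#include <cuda/std/bitset>

__host__ __device__ bool demo()
{
  cuda::std::bitset<16> b;
  b.set(3);  // exactly one bit set
  b.flip();  // invert all 16 bits
  return !b.test(3) && b.count() == 15;
}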

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

griwes added 19 commits April 2, 2024 22:35
Math is hard, even for compilers (especially for older compilers).
I admit I am a bit lost as to why this operation sits inside the if in __copy_aligned but outside of it in __copy_backward_aligned, but it seems that things work when they aren't symmetric.
It seems I had the code right from the beginning, and this is simply a codegen snafu with older GCCs, specifically in the backward case? I am confused, but maybe this time it'll fix more than it breaks.
@griwes griwes marked this pull request as ready for review April 23, 2024 08:57
@griwes griwes requested review from a team as code owners April 23, 2024 08:57
@griwes griwes requested review from wmaxey and elstehle April 23, 2024 08:57
@miscco (Collaborator) left a comment:

Second pass review

@@ -32,34 +32,43 @@
# define _CCCL_DIAG_SUPPRESS_GCC(str)
# define _CCCL_DIAG_SUPPRESS_NVHPC(str)
# define _CCCL_DIAG_SUPPRESS_MSVC(str)
# define _CCCL_DIAG_SUPPRESS_ICC(str)
#elif defined(_CCCL_COMPILER_GCC) || defined(_CCCL_COMPILER_ICC)
Collaborator:

Please move ICC to its own section
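For reference, the requested restructuring would give ICC its own branch instead of sharing GCC's; a structure-only sketch (pragma bodies elided, macro names taken from the quoted hunk):

#if defined(_CCCL_COMPILER_GCC)
#  define _CCCL_DIAG_SUPPRESS_GCC(str) // GCC diagnostic pragma goes here
#  define _CCCL_DIAG_SUPPRESS_ICC(str)
#elif defined(_CCCL_COMPILER_ICC)
#  define _CCCL_DIAG_SUPPRESS_GCC(str)
#  define _CCCL_DIAG_SUPPRESS_ICC(str) // ICC warning pragma goes here
#endif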

Comment on lines +65 to +67
#include <cuda/std/detail/libcxx/include/memory> // for __murmur2_or_cityhash

#include <cuda/std/detail/libcxx/include/algorithm> // for search and min
Collaborator:

Important: Please include the individual headers
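For reference, the granular style being asked for would name the specific detail headers instead of the monolithic ones, along these lines (paths follow the libc++ layout vendored into this repository and are illustrative):

#include <cuda/std/detail/libcxx/include/__algorithm/min.h>
#include <cuda/std/detail/libcxx/include/__algorithm/search.h>
#include <cuda/std/detail/libcxx/include/__functional/hash.h> // __murmur2_or_cityhash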

Comment on lines +147 to +148
template <class _CharT>
inline
Collaborator:

Question: Is the inline here needed, given that this is already a template?

Also, why are none of these functions marked _LIBCUDACXX_INLINE_VISIBILITY?

If they are currently unused, I would strongly prefer to just drop them.
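Context for the inline question: the ODR already allows a function template to be defined identically in multiple translation units, so the keyword adds nothing for linkage and only survives as a style/inlining hint. A minimal illustration (hypothetical names):

template <class T>
T twice(T v) // fine in a header without `inline`; repeated identical template definitions are allowed
{
  return v + v;
}

inline int twice_int(int v) // a non-template definition in a header does need `inline`
{
  return v + v;
}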

++__len;
return __len;
}
#endif
Collaborator:

Important: Please add comments to the #endif. Applies throughout.
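I.e., the requested convention (guard name illustrative):

#if defined(_CCCL_COMPILER_MSVC)
// MSVC-specific workaround
#endif // defined(_CCCL_COMPILER_MSVC)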

Comment on lines +77 to +104
template <class _Tp, class _Up>
inline _LIBCUDACXX_HIDE_FROM_ABI _LIBCUDACXX_INLINE_VISIBILITY _CCCL_CONSTEXPR_CXX14 bool
__constexpr_tail_overlap_fallback(_Tp* __first, _Up* __needle, _Tp* __last)
{
while (__first != __last)
{
if (__first == __needle)
{
return true;
}
++__first;
}
return false;
}

template <class _Tp, class _Up>
inline _LIBCUDACXX_HIDE_FROM_ABI _LIBCUDACXX_INLINE_VISIBILITY _CCCL_CONSTEXPR_CXX14 bool
__constexpr_tail_overlap(_Tp* __first, _Up* __needle, _Tp* __last)
{
#if __has_builtin(__builtin_constant_p) || defined(_CCCL_COMPILER_GCC)
NV_IF_ELSE_TARGET(NV_IS_HOST,
(return __builtin_constant_p(__first < __needle) && __first < __needle;),
(return __constexpr_tail_overlap_fallback(__first, __needle, __last);))
#else
return __constexpr_tail_overlap_fallback(__first, __needle, __last);
#endif
}

Collaborator:

I am not a fan of these changes. I would much prefer that we write an appropriate function for bitset rather than change the copy implementation.

Collaborator (Author):

Eh. We already purposefully go to memmove and not memcpy in this implementation at runtime. I don't understand why we wouldn't just extend that and provide a bit of an extra guarantee that is explicitly allowed by the standard.

Collaborator (Author):

@wmaxey and maybe @jrhemstad: any hot takes on this issue?

Member:

I don't necessarily have an opinion on this specific change. However, seeing memcpy/memmove mentioned adds to the concern that those functions have particularly bad performance on device.

Member:

It looks like the same hacky constexpr trickery I had to do in <cuda/std/bit>, which makes me feel okay with what is here.

Collaborator:

Fine by me
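For readers skimming the thread, a standalone sketch of the idiom being debated (hypothetical function name; the expression mirrors the quoted hunk):

// __builtin_constant_p(p < q) folds to 1 only when the compiler can prove
// the comparison at compile time; for unrelated runtime pointers it yields
// 0, so the whole expression conservatively returns false.
template <class T>
constexpr bool known_to_precede(const T* p, const T* q)
{
  return __builtin_constant_p(p < q) && p < q;
}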


template <class _Cp>
-inline _LIBCUDACXX_INLINE_VISIBILITY
-void
+inline _LIBCUDACXX_HIDE_FROM_ABI _LIBCUDACXX_INLINE_VISIBILITY _CCCL_CONSTEXPR_CXX14 void
Collaborator:

Question: Do we want to take the opportunity and modernize the code base?

I was previously of the opinion that we should stay close to libcxx, but I am less and less convinced.

{
__result.__seg_ -= __nw;
__last.__seg_ -= __nw;
_CUDA_VSTD::copy_n(std::__to_address(__last.__seg_), __nw, std::__to_address(__result.__seg_));
Collaborator:

Critical: We should not use std:: qualified functions.

Collaborator (Author):

Ah yes, that's just me failing to add /g at the end of my sed here. Will fix.
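(I.e., the quoted copy_n line would presumably become:)

_CUDA_VSTD::copy_n(_CUDA_VSTD::__to_address(__last.__seg_), __nw, _CUDA_VSTD::__to_address(__result.__seg_));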

static_cast<unsigned>(__size_ % __bits_per_word));
for (size_t __i = 0; __i != __bit_array<_Cp>::_Np; ++__i)
{
_CUDA_VSTD::__construct_at(__word_ + __i, 0);
Collaborator:

Question: AFAIK the storage_type should be trivial. Can we just assign to it?

@griwes (Collaborator, Author) commented Apr 24, 2024:

I can try to check this, but I am expecting to run into problems with compilers that don't implement the implicit lifetime types DR (which would be most of the compilers in our matrix). (Edit: especially at constexpr.)
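As background for this exchange, a minimal C++20 illustration of the lifetime point (the PR targets C++14 via _CUDA_VSTD::__construct_at; std::construct_at and the type below are stand-ins, not the PR's code):

#include <memory>

struct words4
{
  unsigned w[4];
  constexpr words4()
  {
    for (int i = 0; i != 4; ++i)
    {
      // construct_at explicitly begins each element's lifetime, which keeps
      // the constructor usable in constant expressions on compilers that
      // predate the implicit-lifetime wording referenced above.
      std::construct_at(w + i, 0u);
    }
  }
};
static_assert(words4().w[2] == 0);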

{
__bit_array<_Cp> __b(__d1);
_CUDA_VSTD::copy(__first, __middle, __b.begin());
_CUDA_VSTD::copy(__b.begin(), __b.end(), std::copy(__middle, __last, __first));
Collaborator:

Ditto: should not use std:: functions.

#ifndef _CUDA_STD_BITSET
#define _CUDA_STD_BITSET

#include "detail/__config"
Collaborator:

This will need clang-format off to ensure that formatting does not screw up the ordering of the includes.
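I.e., something like (guard placement illustrative):

// clang-format off
#include "detail/__config"
// clang-format on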

public:
using __container = typename _Cp::__self;

_LIBCUDACXX_HIDE_FROM_ABI _CCCL_CONSTEXPR_CXX14 __bit_reference(const __bit_reference&) = default;
Collaborator:

Should we do that?

}
// do middle whole words
__storage_type __nw = __n / __bits_per_word;
_CUDA_VSTD::fill_n(std::__to_address(__first.__seg_), __nw, _FillVal ? static_cast<__storage_type>(-1) : 0);
Collaborator:

This still has std:: in it

@miscco added the "2.5.0", "feature request", and "libcu++" labels May 6, 2024
@fbusato commented May 24, 2024:

Please consider using uint32_t for the storage type if the C++ specification allows it: https://github.com/NVIDIA/cccl/blob/main/libcudacxx/include/cuda/std/detail/libcxx/include/bitset#L151. 64-bit operations are less efficient on GPU architectures.
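If the storage type were switched as suggested, the word math stays the same; a self-contained sketch (names hypothetical, not the header's):

#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit word choice per the suggestion above.
using word_type = std::uint32_t;
constexpr std::size_t bits_per_word = 8 * sizeof(word_type); // 32

// Number of words needed to hold n_bits, rounded up.
constexpr std::size_t words_for(std::size_t n_bits)
{
  return (n_bits + bits_per_word - 1) / bits_per_word;
}

static_assert(words_for(70) == 3, "70 bits need three 32-bit words");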

Labels
2.5.0 · feature request (New feature or request.) · libcu++ (For all items related to libcu++)
Projects
Status: In Review
Development
Successfully merging this pull request may close these issues:
[FEA]: Provide support for <bitset>
4 participants