-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement <cuda/std/bitset>
#1496
base: main
Are you sure you want to change the base?
Conversation
libcudacxx/include/cuda/std/detail/libcxx/include/__algorithm/copy.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/detail/libcxx/include/__bit_reference
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/detail/libcxx/include/__bit_reference
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/detail/libcxx/include/__bit_reference
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/detail/libcxx/include/__bit_reference
Outdated
Show resolved
Hide resolved
libcudacxx/libcxx/test/std/utilities/template.bitset/bitset.cons/char_ptr_ctor.pass.cpp
Outdated
Show resolved
Hide resolved
Math is hard, even for compilers (especially for older compilers).
I admit I am a bit lost about the apparent differences between whether this operation is inside the if or outside of it between __copy_aligned and __copy_backward_aligned, but it seems that it works when they aren't symmetric.
Seems that I had the code right from the beginning, but it's simply a codegen-ish snafu with older GCCs, specifically in the backward case? I am confused, but maybe this time it'll fix more than it breaks.
GCC is apparently annoyed and warns that this function always calls a non-constexpr function...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second pass review
@@ -32,34 +32,43 @@ | |||
# define _CCCL_DIAG_SUPPRESS_GCC(str) | |||
# define _CCCL_DIAG_SUPPRESS_NVHPC(str) | |||
# define _CCCL_DIAG_SUPPRESS_MSVC(str) | |||
# define _CCCL_DIAG_SUPPRESS_ICC(str) | |||
#elif defined(_CCCL_COMPILER_GCC) || defined(_CCCL_COMPILER_ICC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move ICC to its own section
#include <cuda/std/detail/libcxx/include/memory> // for __murmur2_or_cityhash | ||
|
||
#include <cuda/std/detail/libcxx/include/algorithm> // for search and min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: Please include the individual headers
template <class _CharT> | ||
inline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: I am wondering is the inline
here needed given that it is already a template?
Also why is none of those functions marked as _LIBCUDACXX_INLINE_VISIBILITY
?
If they are currently not used I would strongly prefer to just drop them
++__len; | ||
return __len; | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: Please add comments to the #endif
Applies throughout
template <class _Tp, class _Up> | ||
inline _LIBCUDACXX_HIDE_FROM_ABI _LIBCUDACXX_INLINE_VISIBILITY _CCCL_CONSTEXPR_CXX14 bool | ||
__constexpr_tail_overlap_fallback(_Tp* __first, _Up* __needle, _Tp* __last) | ||
{ | ||
while (__first != __last) | ||
{ | ||
if (__first == __needle) | ||
{ | ||
return true; | ||
} | ||
++__first; | ||
} | ||
return false; | ||
} | ||
|
||
template <class _Tp, class _Up> | ||
inline _LIBCUDACXX_HIDE_FROM_ABI _LIBCUDACXX_INLINE_VISIBILITY _CCCL_CONSTEXPR_CXX14 bool | ||
__constexpr_tail_overlap(_Tp* __first, _Up* __needle, _Tp* __last) | ||
{ | ||
#if __has_builtin(__builtin_constant_p) || defined(_CCCL_COMPILER_GCC) | ||
NV_IF_ELSE_TARGET(NV_IS_HOST, | ||
(return __builtin_constant_p(__first < __needle) && __first < __needle;), | ||
(return __constexpr_tail_overlap_fallback(__first, __needle, __last);)) | ||
#else | ||
return __constexpr_tail_overlap_fallback(__first, __needle, __last); | ||
#endif | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of those changes. I would much rather prefer if we would write an appropriate function for bitset rather than change the copy implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh. We already purposefully go to memmove
and not memcpy
in this implementation at runtime. I don't understand why we wouldn't just extend that and provide a bit of an extra guarantee that is explicitly allowed by the standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wmaxey and maybe @jrhemstad: any hot takes on this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily have an opinion on this specific change. However seeing memcpy
/memmove
mentioned adds to the concern that those functions have particularly bad performance on device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the same hacky constexpr trickery I had to do in <cuda/std/bit>
which makes me feel okay with what is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me
|
||
template <class _Cp> | ||
inline _LIBCUDACXX_INLINE_VISIBILITY | ||
void | ||
inline _LIBCUDACXX_HIDE_FROM_ABI _LIBCUDACXX_INLINE_VISIBILITY _CCCL_CONSTEXPR_CXX14 void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we want to take the opportunity and moderinze the code base?
I was previously of the opinion that we should stay close to libcxx but I am less and less convinced
{ | ||
__result.__seg_ -= __nw; | ||
__last.__seg_ -= __nw; | ||
_CUDA_VSTD::copy_n(std::__to_address(__last.__seg_), __nw, std::__to_address(__result.__seg_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: We should not use std::
qualified functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that's just me failing to add /g
at the end of my sed here. Will fix.
static_cast<unsigned>(__size_ % __bits_per_word)); | ||
for (size_t __i = 0; __i != __bit_array<_Cp>::_Np; ++__i) | ||
{ | ||
_CUDA_VSTD::__construct_at(__word_ + __i, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: AFAIK the storage_type should be trivial. Can we just assign to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to check this, but I am expecting to run into problems with compilers that don't implement the implicit lifetime types DR (which would be most of the compilers in our matrix). (Edit: especially at constexpr
.)
{ | ||
__bit_array<_Cp> __b(__d1); | ||
_CUDA_VSTD::copy(__first, __middle, __b.begin()); | ||
_CUDA_VSTD::copy(__b.begin(), __b.end(), std::copy(__middle, __last, __first)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto should not use std::
functions
#ifndef _CUDA_STD_BITSET | ||
#define _CUDA_STD_BITSET | ||
|
||
#include "detail/__config" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to have clang-format off
to ensure that formatting does not screw up the ordering of the includes
public: | ||
using __container = typename _Cp::__self; | ||
|
||
_LIBCUDACXX_HIDE_FROM_ABI _CCCL_CONSTEXPR_CXX14 __bit_reference(const __bit_reference&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do that
} | ||
// do middle whole words | ||
__storage_type __nw = __n / __bits_per_word; | ||
_CUDA_VSTD::fill_n(std::__to_address(__first.__seg_), __nw, _FillVal ? static_cast<__storage_type>(-1) : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still has std::
in it
Please consider to use |
Description
Implements
<cuda/std/bitset>
, with a complementary backport of constexprness to C++14 and up.Resolves #1321
Note for reviewers: I separated pulling updated upstream headers into their own commits, so you can more or less ignore the first two commits of this PR while reviewing.
Checklist