Drop more of thrust type traits #1721
Conversation
Force-pushed from 47e03ff to c2791c6
🟨 CI Results [ Failed: 58 | Passed: 140 | Total: 198 ]

# | Runner
---|---
154 | linux-amd64-cpu16
16 | linux-arm64-cpu16
16 | linux-amd64-gpu-v100-latest-1
12 | windows-amd64-cpu16
👃 Inspect Changes

Modifications in project?

| Project
---|---
| CCCL Infrastructure
| libcu++
| CUB
+/- | Thrust

Modifications in project or dependencies?

| Project
---|---
| CCCL Infrastructure
| libcu++
+/- | CUB
+/- | Thrust
Force-pushed from 983223a to 78f53a2
🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]

# | Runner
---|---
154 | linux-amd64-cpu16
16 | linux-arm64-cpu16
16 | linux-amd64-gpu-v100-latest-1
12 | windows-amd64-cpu16
👃 Inspect Changes

Modifications in project?

| Project
---|---
| CCCL Infrastructure
| libcu++
| CUB
+/- | Thrust
| CUDA Experimental

Modifications in project or dependencies?

| Project
---|---
| CCCL Infrastructure
| libcu++
+/- | CUB
+/- | Thrust
| CUDA Experimental
```diff
 template <typename T>
-struct is_non_bool_integral : public is_integral<T>
+struct is_non_bool_integral : public ::cuda::std::is_integral<T>
```
Suggestion: the `public` is not necessary; `struct` bases are inherited publicly by default.

```diff
-struct is_non_bool_integral : public ::cuda::std::is_integral<T>
+struct is_non_bool_integral : ::cuda::std::is_integral<T>
```
One more occurrence below.
Force-pushed from a548308 to 7fb67d2
🟨 CI Results: Pass: 98%/198 | Total Time: 4d 04h | Avg Time: 30m 31s | Hits: 37%/130542

# | Runner
---|---
154 | linux-amd64-cpu16
16 | linux-arm64-cpu16
16 | linux-amd64-gpu-v100-latest-1
12 | windows-amd64-cpu16
👃 Inspect Changes

Modifications in project?

| Project
---|---
| CCCL Infrastructure
| libcu++
| CUB
+/- | Thrust
| CUDA Experimental

Modifications in project or dependencies?

| Project
---|---
| CCCL Infrastructure
| libcu++
+/- | CUB
+/- | Thrust
| CUDA Experimental
Force-pushed from 7fb67d2 to 8fbd1fb

Force-pushed from e69f93e to 2cd78ee
🟨 CI Results: Pass: 99%/302 | Total Time: 5d 19h | Avg Time: 27m 41s | Hits: 25%/383279

# | Runner
---|---
232 | linux-amd64-cpu16
28 | linux-amd64-gpu-v100-latest-1
24 | linux-arm64-cpu16
18 | windows-amd64-cpu16
👃 Inspect Changes

Modifications in project?

| Project
---|---
| CCCL Infrastructure
+/- | libcu++
| CUB
+/- | Thrust
| CUDA Experimental

Modifications in project or dependencies?

| Project
---|---
| CCCL Infrastructure
+/- | libcu++
+/- | CUB
+/- | Thrust
+/- | CUDA Experimental
```cpp
#if _CCCL_STD_VER >= 2014 && !defined(_LIBCUDACXX_HAS_NO_VARIABLE_TEMPLATES)
```
Question: How can we be on C++14, but have no variable templates?
In this case it is old gcc compilers, which accept the C++14 flag but do not implement variable templates.
🟨 CI Results: Pass: 99%/302 | Total Time: 3d 08h | Avg Time: 16m 03s | Hits: 66%/383279

# | Runner
---|---
232 | linux-amd64-cpu16
28 | linux-amd64-gpu-v100-latest-1
24 | linux-arm64-cpu16
18 | windows-amd64-cpu16
👃 Inspect Changes

Modifications in project?

| Project
---|---
| CCCL Infrastructure
+/- | libcu++
| CUB
+/- | Thrust
| CUDA Experimental

Modifications in project or dependencies?

| Project
---|---
| CCCL Infrastructure
+/- | libcu++
+/- | CUB
+/- | Thrust
+/- | CUDA Experimental
@miscco this is remarkable work!
```cpp
struct is_integral<const long long> : public true_type
{};
template <>
struct is_integral<const unsigned long long> : public true_type
```
suggestion: `is_integral` didn't used to treat `__int128_t` as integral, but the libcu++ version does. As a result, the promoted type now supports `__int128_t`, which makes `thrust::complex<float>` operations available on `__int128_t` types: `thrust::complex<float>{4.0f, 2.0f} += __int128_t{1};`. This new subset of use cases compiles without error but is not covered by tests. I doubt it's critical, but it may be worth filing an issue.
praise: `__int128_t` is now recognized as an arithmetic type, which allows thrust to use radix sort instead of merge sort. This gives us a ~30% speedup, great job!
remark: extrema now attempts load vectorization on `__int128_t`. This doesn't seem to deteriorate performance, so we should be good.
Obviously, MSVC 2017 did break because of that. I worked around it by also replacing thrust::promoted_numerical_type
```cpp
template <typename T>
struct has_trivial_destructor : public is_pod<T>
{};
```
praise: this is incredible. We used to launch an extra kernel destructing all the elements of a device vector for most of the types that thrust's traits didn't recognize as POD! With this PR, we do not destruct objects unless there is a non-trivial destructor. This essentially saves a kernel call per device vector, leading to a 30% reduction in the overhead of using device vector.
Accidental performance improvements are the best improvements
Force-pushed from 7f1ee12 to 4913090
/ok to test
Force-pushed from 4913090 to 33c4850
🟩 CI Results: Pass: 100%/361 | Total Time: 6d 05h | Avg Time: 24m 47s | Hits: 46%/522045

Modifications in project?

| Project
---|---
| CCCL Infrastructure
+/- | libcu++
| CUB
+/- | Thrust
| CUDA Experimental

Modifications in project or dependencies?

| Project
---|---
| CCCL Infrastructure
+/- | libcu++
+/- | CUB
+/- | Thrust
+/- | CUDA Experimental
🏃 Runner counts (total jobs: 361)

# | Runner
---|---
264 | linux-amd64-cpu16
52 | linux-amd64-gpu-v100-latest-1
24 | linux-arm64-cpu16
21 | windows-amd64-cpu16
The week ends early, so here is the obligatory "Michael goes on a rampage about a minor annoyance"... err, maintenance PR.
Note: everything should be in `thrust::detail`, so it is fair game.