Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable including atomics and friends in TUs that do not support them. #1736

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wmaxey
Copy link
Member

@wmaxey wmaxey commented May 13, 2024

Description

This removes most #error protections from atomics headers. This allows them to be included in TUs where they previously would force a failure if the __CUDA_ARCH__ list included an architecture that was not supported.

Link time failures are emitted when compiling for architectures where the API is not supported.

For example:

#include <cuda/std/atomic>

__global__ void test(cuda::std::atomic<int>* atom) {
    atom->load();
}

results in:

PS C:\cccl> nvcc test.cu -Ilibcudacxx/include -arch=sm_52
ptxas fatal   : Unresolved extern function '__atomic_is_not_supported_pre_sm_60'

closes: #1083

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wmaxey wmaxey requested review from a team as code owners May 13, 2024 22:28
Copy link
Contributor

🟩 CI Results [ Failed: 0 | Passed: 302 | Total: 302 ]
  • 🟩 Project cub [ Failed: 0 | Passed: 99 | Total: 99 ]

    🟩 cpu
      🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 ctk
      🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
    🟩 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
    🟩 cudacxx_name
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
    🟩 cxx_full
      🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
      🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
      🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
      🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
    🟩 cxx_name
      🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
      🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
      🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 gpu
      🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
    🟩 jobs
      🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 os
      🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
      🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
      🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 sm
      🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
    🟩 std
      🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
      🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16 
    
  • 🟩 Project thrust [ Failed: 0 | Passed: 99 | Total: 99 ]

    🟩 cpu
      🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 ctk
      🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
    🟩 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
    🟩 cudacxx_name
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
    🟩 cxx_full
      🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
      🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
      🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
      🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
    🟩 cxx_name
      🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
      🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
      🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 gpu
      🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
    🟩 jobs
      🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 os
      🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
      🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
      🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 sm
      🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
    🟩 std
      🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
      🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16 
    
  • 🟩 Project libcudacxx [ Failed: 0 | Passed: 104 | Total: 104 ]

    🟩 cpu
      🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 96  -- Total: 96 
      🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 ctk
      🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 86  -- Total: 86 
    🟩 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 84  -- Total: 84 
    🟩 cudacxx_name
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 102 -- Total: 102
    🟩 cxx_full
      🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
      🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 21  -- Total: 21 
      🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
      🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
    🟩 cxx_name
      🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
      🟩 gcc (0% Fail)                Failed:  0  -- Passed: 52  -- Total: 52 
      🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 gpu
      🟩 v100 (0% Fail)               Failed:  0  -- Passed: 104 -- Total: 104
    🟩 jobs
      🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 nvrtc (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
      🟩 verify_codegen (0% Fail)     Failed:  0  -- Passed:  1  -- Total:  1 
    🟩 os
      🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
      🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 49  -- Total: 49 
      🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 sm
      🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
    🟩 std
      🟩 11 (0% Fail)                 Failed:  0  -- Passed: 27  -- Total: 27 
      🟩 14 (0% Fail)                 Failed:  0  -- Passed: 30  -- Total: 30 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 17  -- Total: 17 
    

🏃‍ Runner counts (total jobs: 302)

# Runner
232 linux-amd64-cpu16
28 linux-amd64-gpu-v100-latest-1
24 linux-arm64-cpu16
18 windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental

Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, that has been a long desired feature.

That said, we are missing

  • Enabling all tests that are currently guarded by pre-sm-70
  • Adding a test set that ensures that we do fail when passed a lower architecture

(return __atomic_load_n_cuda(__a->get(), static_cast<__memory_order_underlying_t>(__order), _Sco{});),
NV_IS_DEVICE,
(__atomic_is_not_supported_pre_sm_60();),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to make this a template that returns __atomic_underlying_t<_Sto> because otherwise we might have no return in that function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or:

Suggested change
(__atomic_is_not_supported_pre_sm_60();),
(__atomic_is_not_supported_pre_sm_60(); return {};),

@jrhemstad
Copy link
Collaborator

I agree with @miscco that I don't think this is functionality we can advertise until we have a way to test it. I believe properly testing this requires us to resolve #1084

@miscco
Copy link
Collaborator

miscco commented May 14, 2024

I agree with @miscco that I don't think this is functionality we can advertise until we have a way to test it. I believe properly testing this requires us to resolve #1084

I dont think so. With us moving towards linker errors we should be able to use NV_IF_TARGET to test this within the current framework

@griwes
Copy link
Collaborator

griwes commented May 14, 2024

Agreed with @miscco's review.

@jrhemstad
Copy link
Collaborator

use NV_IF_TARGET to test this within the current framework

Sure, you'd just need to rewrite all of the current atomic tests that use pre-sm-70 to instead use NV_IF_TARGET.

@miscco
Copy link
Collaborator

miscco commented May 15, 2024

use NV_IF_TARGET to test this within the current framework

Sure, you'd just need to rewrite all of the current atomic tests that use pre-sm-70 to instead use NV_IF_TARGET.

It depends a bit on the tests, but we have moved towards defining a single test function anyway, which makes this rather easy to do

@@ -64,11 +64,17 @@ struct __atomic_storage
}
};

#if defined(_CCCL_CUDA_COMPILER)
extern "C" _CCCL_DEVICE void __atomic_is_not_supported_pre_sm_60();
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#endif
#endif // _CCCL_CUDA_COMPILER

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[FEA]: Improve usability of architecture specific features in libcudacxx
5 participants