
[FEA]: Improve usability of architecture specific features in libcudacxx #1083

jrhemstad opened this issue Nov 10, 2023 · 1 comment · May be fixed by #1736
Labels
feature request New feature or request.

jrhemstad (Collaborator) commented Nov 10, 2023

Is this a duplicate?

Area

libcu++

Is your feature request related to a problem? Please describe.

As a CUDA developer using libcu++, I want to be able to use architecture-dependent features of libcudacxx in my CUDA application. For any given libcudacxx header and feature, I need to be able to do the following:

```cuda
#include <cuda/header>

__global__ void kernel(...){
  NV_DISPATCH_TARGET(
    NV_IS_EXACTLY_SM_60, ( do_sm60_thing(); ),
    NV_PROVIDES_SM_70,   ( do_sm70_thing(); ),
    NV_PROVIDES_SM_90,   ( do_sm90_thing(); )
  )
}
```

I need to be able to compile this file with any set of architectures (`-gencode arch=compute_XX,code=sm_XX`) and have it compile and link successfully, so long as I am careful to use architecture-dependent features only in appropriately guarded code paths, whether guarded with NV_IF_TARGET or __CUDA_ARCH__.
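As a concrete illustration of the __CUDA_ARCH__-style guard mentioned above (a minimal sketch; the kernel name and the atomic usage are illustrative, not taken from the issue):

```cuda
#include <cuda/atomic>

__global__ void guarded_kernel(int* out) {
// The sm_70+ feature is only compiled in device passes that target
// sm_70 or newer; older passes see an empty kernel body.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  cuda::atomic<int, cuda::thread_scope_block> a(0);
  a.fetch_add(1, cuda::memory_order_relaxed);
  *out = a.load(cuda::memory_order_relaxed);
#endif
}
```

The expectation is that such a TU compiles cleanly for every requested architecture, since no unguarded use of the feature reaches an unsupported pass.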

However, this does not work universally today. For example, the following fails to compile with `-gencode arch=compute_52,code=sm_52 -gencode arch=compute_70,code=sm_70`:

```cuda
#include <cuda/atomic>
#include <nv/target>

__global__ void kernel(){
  NV_IF_TARGET(
    NV_PROVIDES_SM_70,
      cuda::atomic<int> i;
  )
}
```

https://godbolt.org/z/ddMaW65Ej

This is because the cuda/atomic header unconditionally errors whenever it is included in a TU that compiles for an architecture below sm_60, even if the feature is never used in code paths for the unsupported architecture.
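The failure mode is, in effect, a header-level preprocessor check of roughly this shape (my reconstruction, not the actual libcu++ source; the macro test and message are illustrative):

```cuda
// Hypothetical sketch of the failing pattern: a header-level check
// that fires for *any* device compilation pass below sm_60, regardless
// of whether the guarded feature is ever used on that architecture.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600)
#  error "cuda/atomic is not supported below sm_60"
#endif
```

Because the check runs at preprocessing time, NV_IF_TARGET guards in user code cannot prevent it from firing in the sm_52 pass.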

A similar problem exists with cuda/barrier: https://godbolt.org/z/aEjsMT5YK

Describe the solution you'd like

I should be able to do the following with all libcu++ headers and features:

```cuda
#include <cuda/header>

__global__ void kernel(...){
  NV_DISPATCH_TARGET(
    NV_IS_EXACTLY_SM_60, ( do_sm60_thing(); ),
    NV_PROVIDES_SM_70,   ( do_sm70_thing(); ),
    NV_PROVIDES_SM_90,   ( do_sm90_thing(); )
  )
}
```


Describe alternatives you've considered

If libcu++ doesn't support this, then I am forced to fall back to lower-level primitives like atomicAdd() or inline PTX.
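For example, the guarded increment above would have to be written against the raw intrinsic instead of cuda::atomic (a sketch; the kernel is hypothetical):

```cuda
#include <nv/target>

__global__ void fallback_kernel(int* counter) {
  // Lower-level fallback: use the atomicAdd intrinsic directly rather
  // than cuda::atomic, still dispatched per-architecture.
  NV_IF_TARGET(NV_PROVIDES_SM_70, (
    atomicAdd(counter, 1);
  ))
}
```

This compiles because atomicAdd is a builtin intrinsic rather than a header-gated library type, but it loses the memory-ordering and scope controls that cuda::atomic provides.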

Additional context

Related issues:
#997
#1082
#624

jrhemstad (Collaborator, Author) commented Nov 10, 2023

Note that the status quo has meant we can't even use <cuda/atomic> in CUB or Thrust (see #515 #516)

If we can't use it in our own libraries, how do we expect other people to use it?
