Kernel attributes and __launch_bounds__ (Feature #1328)
Comments
Related: #714
For question 1: This will be handled automatically in the generic SSCP compiler soonish, so there will be no need for user-provided hints.
I'm optimizing the register consumption of SYCL code to get higher performance on HIP devices. On these GPUs the maximum number of registers available to a block is usually 256*256 (65,536). I found that performance only approaches the theoretical peak on AMD devices when the number of threads per block is an integer multiple of 256, which means a kernel can use at most 256 registers per thread.
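The per-thread budget follows from dividing the block's register file by the block size. Writing R_block for registers per block, T_block for threads per block and R_thread for registers per thread, and taking the 256*256 figure from this comment as an assumption (actual limits vary by GPU), blocks of 256 and 512 threads give:

```latex
R_{\text{thread}} \le \frac{R_{\text{block}}}{T_{\text{block}}}, \qquad
\frac{256 \times 256}{256} = 256, \qquad
\frac{256 \times 256}{512} = 128
```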
Expanding on that, it could be a useful sanity check for SSCP. For known devices or multipass compilation, developers can already check the register pressure manually at compile time.
This is a bit difficult to do.
Describe the motivation for the feature request
As shown below:
1. How do I map __launch_bounds__ to AdaptiveCpp interfaces to optimize HIP code?
2. CUDA and HIP make it possible to check register usage and register spilling to guide code optimization. Is this feature accessible through AdaptiveCpp interfaces?