Enable building C++ code with GPU (CUDA) support #908
This PR adds support for CUDA-based GPU computing.

The following thoughts went into the PR:
- `NETWORKIT_CUDA` CMake option. If `CUDAToolkit` is found, all `.cu` files are automatically compiled with `nvcc`. Maybe we can also support `clang -x cuda` in the future? If the toolkit is not found, `.cu` files are marked as C++ language files so that they still get compiled. Another possibility would be to track the files that need to be compiled with either `nvcc` or `clang -x cuda` and to keep the normal `.cpp` file ending.
- `setup.py`
only uses a simple check to decide whether to pass `NETWORKIT_CUDA` to CMake. It searches for the `nvidia-smi` tool, which is installed on both Linux and Windows if the official drivers are used. The check is triggered when passing `enable-gpu` to `build_ext`. Maybe that is one check too many and we can just rely on autodetection of CUDA GPUs.
- `__CUDACC__`
macro. All CUDA-specific code should be wrapped in an `#ifdef` check so that it also compiles on systems without a GPU (similar to AVX2). Even though some CUDA-specific macros like `__global__`, `__host__` and so on could also be wrapped in another macro (similar to `NETWORKIT_EXPORT` for Windows builds), there are certain built-in variables (like `threadIdx`) that would make an `#ifdef` wrap necessary for kernel functions anyway.
- `Aux::GPUTools`
for wrapping CUDA calls and managing GPU devices. This detaches GPU handling from the algorithm implementations and removes the need for CUDA headers in the algorithms' header files (see the `ToyCentrality.hpp/cu` example). It can also help with runtime detection of GPUs and with detaching the Cython interface from the GPU code. Otherwise we would either have to compile every Cython module that contains at least one GPU class with `nvcc`/`clang -x cuda`, or add an additional module for all GPU-enabled classes.
- `ToyCentrality.hpp/cu`
show the usage of preprocessor and runtime checks for a simple GPU kernel. The kernel function has to be defined outside the class definition in order to work properly. Tested on both CPU and GPU hardware.

What is missing (but likely out of scope of this PR):