CUDA Toolkit 12.4.0 tuple incompatibility #3690
Comments
Just to confirm your suspicion that this affects cross-platform builds, getting the same errors on Linux with GCC 13:
^ one such error |
I have the same issue when building the latest OpenCV 4 from source with CUDA 12.4, cuDNN 9, and GCC 13 on Fedora 39.
|
Having the same issue when building the latest OpenCV 4 from source on Windows 11. |
I agree, this should be fixable the way you describe it. However:
Here one instance is compiled with
This seems to work for the case mentioned above. I am not sure, however, whether this will give correct results in all cases. Maybe someone can give some feedback? Or does anyone have ideas on how this could be solved more elegantly? |
Alternatively, Thrust's
// placed at the end of modules/cudev/include/opencv2/cudev/ptr2d/zip.hpp, in the global namespace
_LIBCUDACXX_BEGIN_NAMESPACE_STD
template<class Ptr0, class Ptr1>
struct tuple_size<cv::cudev::ZipPtr<tuple<Ptr0, Ptr1>>> : tuple_size<tuple<Ptr0, Ptr1>> {};
template<class Ptr0, class Ptr1, class Ptr2>
struct tuple_size<cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2>> {};
template<class Ptr0, class Ptr1, class Ptr2, class Ptr3>
struct tuple_size<cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2, Ptr3>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2, Ptr3>> {};
template<class Ptr0, class Ptr1>
struct tuple_size<cv::cudev::ZipPtrSz<tuple<Ptr0, Ptr1>>> : tuple_size<tuple<Ptr0, Ptr1>> {};
template<class Ptr0, class Ptr1, class Ptr2>
struct tuple_size<cv::cudev::ZipPtrSz<tuple<Ptr0, Ptr1, Ptr2>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2>> {};
template<class Ptr0, class Ptr1, class Ptr2, class Ptr3>
struct tuple_size<cv::cudev::ZipPtrSz<tuple<Ptr0, Ptr1, Ptr2, Ptr3>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2, Ptr3>> {};
template<size_t N, class Ptr0, class Ptr1>
struct tuple_element<N, cv::cudev::ZipPtr<tuple<Ptr0, Ptr1>>> : tuple_element<N, tuple<Ptr0, Ptr1>> {};
template<size_t N, class Ptr0, class Ptr1, class Ptr2>
struct tuple_element<N, cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2>>> : tuple_element<N, tuple<Ptr0, Ptr1, Ptr2>> {};
template<size_t N, class Ptr0, class Ptr1, class Ptr2, class Ptr3>
struct tuple_element<N, cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2, Ptr3>>> : tuple_element<N, tuple<Ptr0, Ptr1, Ptr2, Ptr3>> {};
_LIBCUDACXX_END_NAMESPACE_STD
Thrust does this for backwards compatibility with the old style of tuples as well. It also appears that In addition to the parameter packing changes mentioned above, I've successfully compiled OpenCV using this method. |
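The specializations above follow a general pattern: a wrapper type that derives from a tuple can forward `tuple_size` and `tuple_element` to the tuple it wraps. As a minimal standalone sketch of that pattern, the following uses `std::tuple` instead of Thrust's tuple and a hypothetical `ZipLike` wrapper instead of `cv::cudev::ZipPtr`; both substitutions are assumptions made so the example compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical wrapper (not an OpenCV type): derives from the tuple it wraps.
template <class PtrTuple>
struct ZipLike : PtrTuple {
    ZipLike() = default;
    explicit ZipLike(const PtrTuple& t) : PtrTuple(t) {}
};

namespace std {
// Forward the tuple protocol to the wrapped tuple.
// Parameter packs cover every arity in one specialization.
template <class... Ps>
struct tuple_size<ZipLike<tuple<Ps...>>> : tuple_size<tuple<Ps...>> {};

template <size_t N, class... Ps>
struct tuple_element<N, ZipLike<tuple<Ps...>>> : tuple_element<N, tuple<Ps...>> {};
}  // namespace std

using Zip2 = ZipLike<std::tuple<int, float>>;

static_assert(std::tuple_size<Zip2>::value == 2,
              "size is forwarded from the wrapped tuple");
static_assert(std::is_same<std::tuple_element<1, Zip2>::type, float>::value,
              "element types are forwarded from the wrapped tuple");

// std::get already works through the derived-to-base conversion.
inline int first_value(const Zip2& z) { return std::get<0>(z); }
```

With these in place, generic code that queries `tuple_size` or `tuple_element` on the wrapper sees the same answers as for the underlying tuple.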
Also limit cuda interaction to ABI_X86_64. Bug: opencv/opencv_contrib#3690 Signed-off-by: Paul Zander <negril.nx+gentoo@gmail.com> Closes: #36020 Signed-off-by: Joonas Niilola <juippis@gentoo.org>
I am one of the maintainers of the cccl libraries at NVIDIA. We recently updated our old This has been fixed after this issue was raised here. There are different potential ways of working around this issue in the near / mid term:
|
How do I replace it? Pull and run CMake? Which CMake parameters are needed? |
You could use CPM like:
|
Well... I still don't quite get it. Do we have a solution already? Has anyone built cccl and used it to replace the default one installed with CUDA Toolkit 12.4? Thanks |
I was able to build the library using CUDA Toolkit 12.3.2 in my environment (through vcpkg). This is one way to work around the problem. Also, the above cccl fixes seem to be going into v2.4.0. |
CUDA Toolkit 12.5 still has the bug. |
System information (version)
Detailed description
OpenCV with CUDA support cannot be built using CUDA Toolkit 12.4.0.
While CUDA Toolkit 12.3.2 uses thrust version 2.2.0 (https://docs.nvidia.com/cuda/archive/12.3.2/cuda-toolkit-release-notes/index.html), CUDA Toolkit 12.4.0 updates to thrust version 2.3.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html). In thrust version 2.3.0, the tuple implementation was replaced with a standard tuple implementation (NVIDIA/cccl#262). Notably, this changes the definition from a 10-parameter template to a variadic template. So instead of a tuple of n items being padded out with 10 - n null types to always have 10 template parameters, it now has only n template parameters. As a result, the function templates in cudev that are specified with 10 template parameters per tuple are no longer viable for tuples whose size is not 10.
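To make the deduction failure concrete, here is a minimal sketch that needs neither CUDA nor Thrust. `legacy::tuple` and `legacy::null_type` are hypothetical stand-ins for the old padded layout, `std::tuple` stands in for the new variadic one, and `takes_ten` is an illustrative function, not anything from cudev.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

namespace legacy {
struct null_type {};
// Old layout: always 10 parameters, unused slots padded with null_type.
template <class T0 = null_type, class T1 = null_type, class T2 = null_type,
          class T3 = null_type, class T4 = null_type, class T5 = null_type,
          class T6 = null_type, class T7 = null_type, class T8 = null_type,
          class T9 = null_type>
struct tuple {};
}  // namespace legacy

// Written against the old layout: exactly 10 parameters per tuple.
template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
constexpr bool takes_ten(
    const legacy::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&) {
    return true;
}

// The same 10-parameter shape, aimed at a variadic tuple.
template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
constexpr bool takes_ten(
    const std::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&) {
    return true;
}

// Detects whether a call takes_ten(t) would compile for a given tuple type.
template <class T>
auto viable_impl(int)
    -> decltype(takes_ten(std::declval<const T&>()), std::true_type{});
template <class T>
std::false_type viable_impl(...);
template <class T>
struct ten_param_viable : decltype(viable_impl<T>(0)) {};

static_assert(ten_param_viable<legacy::tuple<int, float>>::value,
              "padded tuple: all 10 parameters are deduced");
static_assert(!ten_param_viable<std::tuple<int, float>>::value,
              "variadic tuple of size 2: P2..P9 cannot be deduced");
```

Only a variadic tuple with exactly 10 elements would still match the second overload, which mirrors the "not of size 10" condition described above.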
An example of one such function template that's no longer viable, cv::cudev::blockReduce: opencv_contrib/modules/cudev/include/opencv2/cudev/block/reduce.hpp, lines 68 to 81 in 6b5142f
An example of an error I encounter:
The first candidate but nonviable function template shown in the error message is the one linked above, which was viable and selected in previous CUDA Toolkit versions.
I think that all templates specifying 10 template parameters per tuple can be updated to work with the new tuple definition by replacing each set of 10 template parameters with a parameter pack. I think this should still be compatible with the old tuple definition, as well. For example, I think this would be a viable implementation of cv::cudev::blockReduce
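The parameter-pack idea can be illustrated without CUDA. The sketch below is not the proposed `blockReduce` change; it only shows that a single pack-based signature deduces against both a padded 10-parameter tuple and a variadic one. `legacy::tuple`, `legacy::null_type`, `count_used`, and `arity` are hypothetical names introduced for the example, and `std::tuple` stands in for the new Thrust tuple.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace legacy {
struct null_type {};
// Old layout: always 10 parameters, unused slots padded with null_type.
template <class T0 = null_type, class T1 = null_type, class T2 = null_type,
          class T3 = null_type, class T4 = null_type, class T5 = null_type,
          class T6 = null_type, class T7 = null_type, class T8 = null_type,
          class T9 = null_type>
struct tuple {};
}  // namespace legacy

// Counts the types in a pack that are not the null_type padding.
template <class... Ts>
struct count_used : std::integral_constant<std::size_t, 0> {};
template <class T, class... Rest>
struct count_used<T, Rest...>
    : std::integral_constant<
          std::size_t,
          (std::is_same<T, legacy::null_type>::value ? 0 : 1) +
              count_used<Rest...>::value> {};

// One signature for both layouts: the pack deduces 10 types (padding
// included) for the padded tuple and exactly n types for the variadic one.
template <template <class...> class Tuple, class... Ps>
constexpr std::size_t arity(const Tuple<Ps...>&) {
    return count_used<Ps...>::value;
}

static_assert(arity(legacy::tuple<int, float>{}) == 2,
              "pack deduces against the padded layout");
static_assert(arity(std::tuple<int, float, char>{}) == 3,
              "pack deduces against the variadic layout");
```

Code that iterates over the pack may still need to skip the padding type when built against the old layout, which is what `count_used` does here.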
Steps to reproduce
Attempt to build cudev using CUDA Toolkit 12.4.0. I suspect that this error will be observed with any combination of OpenCV version, OS, platform, and compiler (that are modern enough to not encounter some other error first).