Create CPU grid patterns that behave exactly like the currently implemented GPU grid patterns in terms of thread indexes. #86

Open
morousg opened this issue Dec 28, 2023 · 0 comments
Labels
enhancement New feature or request

morousg commented Dec 28, 2023

The reasoning behind this is the following:

  1. In some cases, we need the functionality we have on the GPU to run on the CPU in order to reduce PCIe transfers.
  2. We are implementing some GPU functionality that is not available in OpenCV, so there is no OpenCV CPU equivalent.
  3. Having exactly the same thread indexes on the CPU as on the GPU can ease debugging in some situations, since GPU debugging does not keep the values of all variables.

So, we simply want to create CPU-based emulators of the GPU GridPatterns.
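As a rough illustration of what such an emulator could look like, here is a minimal sequential sketch. The names `Dim3`, `ThreadContext`, `launchCpuGrid` and `addOneOnCpu` are hypothetical and not part of the library; they only mimic the CUDA built-ins `gridDim`, `blockDim`, `blockIdx` and `threadIdx` so that the index arithmetic inside the kernel body stays the same as on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 (not the library's type).
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Hypothetical bundle of the values a GPU thread reads from the CUDA built-ins.
struct ThreadContext {
    Dim3 gridDim, blockDim, blockIdx, threadIdx;
};

// Calls `kernel` once per emulated thread, with x as the fastest-varying index.
// Execution is sequential here, so it is easy to step through in a CPU debugger.
template <typename Kernel>
void launchCpuGrid(const Dim3& grid, const Dim3& block, Kernel&& kernel) {
    ThreadContext ctx;
    ctx.gridDim = grid;
    ctx.blockDim = block;
    for (unsigned bz = 0; bz < grid.z; ++bz)
    for (unsigned by = 0; by < grid.y; ++by)
    for (unsigned bx = 0; bx < grid.x; ++bx)
    for (unsigned tz = 0; tz < block.z; ++tz)
    for (unsigned ty = 0; ty < block.y; ++ty)
    for (unsigned tx = 0; tx < block.x; ++tx) {
        ctx.blockIdx = Dim3(bx, by, bz);
        ctx.threadIdx = Dim3(tx, ty, tz);
        kernel(ctx);
    }
}

// Example kernel body using the usual 1D global index computation.
// Assumes threadsPerBlock > 0.
inline std::vector<int> addOneOnCpu(const std::vector<int>& input, unsigned threadsPerBlock) {
    std::vector<int> output(input.size());
    const unsigned blocks =
        static_cast<unsigned>((input.size() + threadsPerBlock - 1) / threadsPerBlock);
    launchCpuGrid(Dim3(blocks), Dim3(threadsPerBlock), [&](const ThreadContext& t) {
        const std::size_t i =
            static_cast<std::size_t>(t.blockIdx.x) * t.blockDim.x + t.threadIdx.x;
        if (i < input.size()) {  // same bounds check a GPU kernel would need
            output[i] = input[i] + 1;
        }
    });
    return output;
}
```

Because every emulated thread sees the same index values it would see on the GPU, a failing element can be traced back to a specific `blockIdx`/`threadIdx` pair.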

Evaluate the use of std::async to parallelize the for loops and get better performance. Since we are doing kernel fusion, we will not need to keep threads alive all the time: the idea is to fuse all the operations the user needs into a single operation, so that the thread creation overhead is justified, much as with OpenMP. Additionally, some implementations of std::async take the number of CPUs in the system into account and manage a thread pool properly.
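One possible way to evaluate this is sketched below, for a 1D grid only. `launchCpuGridAsync` and `scaleOnCpu` are hypothetical names, and the choice of one task per contiguous chunk of blocks is an assumption rather than a decided design. The sketch requests `std::launch::async` explicitly; whether the runtime backs this with a thread pool is implementation-specific.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical 1D launcher: calls kernel(blockIdx, threadIdx, blockDim) for
// every emulated thread, splitting the blocks into one task per hardware thread.
template <typename Kernel>
void launchCpuGridAsync(unsigned gridDim, unsigned blockDim, Kernel kernel) {
    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    workers = std::min(workers, std::max(1u, gridDim));
    const unsigned blocksPerTask = (gridDim + workers - 1) / workers;

    std::vector<std::future<void>> tasks;
    for (unsigned first = 0; first < gridDim; first += blocksPerTask) {
        const unsigned last = std::min(gridDim, first + blocksPerTask);
        tasks.push_back(std::async(std::launch::async, [=]() {
            for (unsigned b = first; b < last; ++b) {
                for (unsigned t = 0; t < blockDim; ++t) {
                    kernel(b, t, blockDim);
                }
            }
        }));
    }
    for (auto& task : tasks) {
        task.get();  // waits for completion and rethrows any exception
    }
}

// Example: a single (already fused) operation applied to every element.
// Each emulated thread writes a distinct element, so there is no data race.
// Assumes blockDim > 0.
inline std::vector<float> scaleOnCpu(const std::vector<float>& in, float factor,
                                     unsigned blockDim) {
    std::vector<float> out(in.size());
    const unsigned gridDim = static_cast<unsigned>((in.size() + blockDim - 1) / blockDim);
    launchCpuGridAsync(gridDim, blockDim,
        [&in, &out, factor](unsigned blockIdx, unsigned threadIdx, unsigned bDim) {
            const std::size_t i = static_cast<std::size_t>(blockIdx) * bDim + threadIdx;
            if (i < in.size()) {
                out[i] = in[i] * factor;
            }
        });
    return out;
}
```

The tasks are created once per launch, which matches the reasoning above: with all operations fused into one launch, the creation cost is paid a single time.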

This issue will define a rule for which functions will be __host__, which will be __device__, and which will be __host__ __device__.

__host__: It will not change much. Basically, any host code. There is no need to use it unless you want to explicitly distinguish a function from closely related code that is __host__ __device__ or __device__. For instance, it could be used to decorate CPU GridPatterns, but it is not necessary.
__device__: Mainly, only the GPU GridPatterns should be exclusively __device__.
__host__ __device__: This should be applied to every single Operation and every single Metaprogramming utility.
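The rule above could look roughly like this in code. This is only a sketch: the `HOST_DEVICE` macro and the types `AddConstant`, `MultiplyBy`, `Compose`, `GpuTransformPattern` and `cpuTransformPattern` are hypothetical names chosen for illustration. The macro expands to nothing when the file is not compiled by a CUDA compiler, so the same Operations also build with a plain C++ compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper macro: __host__ __device__ under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Operations: callable from both CPU and GPU GridPatterns.
struct AddConstant {
    int value;
    HOST_DEVICE int exec(int input) const { return input + value; }
};

struct MultiplyBy {
    int factor;
    HOST_DEVICE int exec(int input) const { return input * factor; }
};

// Metaprogramming utility: fuses two Operations into one.
template <typename First, typename Second>
struct Compose {
    First first;
    Second second;
    HOST_DEVICE int exec(int input) const { return second.exec(first.exec(input)); }
};

#ifdef __CUDACC__
// GPU GridPattern: the only part that is exclusively __device__.
struct GpuTransformPattern {
    template <typename Operation>
    static __device__ void exec(const int* in, int* out, std::size_t n, const Operation& op) {
        const std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
        if (i < n) {
            out[i] = op.exec(in[i]);
        }
    }
};
#endif

// CPU GridPattern: plain host code, so no decoration is required.
template <typename Operation>
void cpuTransformPattern(const std::vector<int>& in, std::vector<int>& out,
                         const Operation& op) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = op.exec(in[i]);
    }
}
```

With this split, the fused `Compose<AddConstant, MultiplyBy>` can be passed unchanged to either pattern.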

@morousg morousg added the enhancement New feature or request label Dec 28, 2023
@morousg morousg self-assigned this Dec 28, 2023