Bugs for GPU based ot.sinkhorn, and very slow speed for GPU based ot.sinkhorn2 #420

Open
HelloWorldLTY opened this issue Dec 18, 2022 · 1 comment

Comments

@HelloWorldLTY

Hi, I hit a bug when trying to use the GPU to compute Sinkhorn OT:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
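The hint in the message above can be applied from Python as well as from the shell. The following is a minimal sketch, assuming the variable is set before CUDA is initialized (in practice, before `torch` is imported); with synchronous launches the stack trace should point at the kernel that actually failed rather than at a later API call.

```python
import os

# Assumption: this runs at the very top of the script, before any CUDA work.
# Setting it after CUDA has been initialized has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch (and the rest of the training code) only after this point, e.g.:
# import torch
```

Running the same code once on CPU is another common way to get a readable error, since device-side asserts usually surface there as ordinary Python exceptions.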

The same error also appears after running ot.sinkhorn2 for a while.

The error is raised inside my network code and is caused by using ot.sinkhorn. If I use ot.emd instead, the error does not occur. Could you please help me? Thanks.

@rflamary
Collaborator

Hello @HelloWorldLTY, thanks for the issue.

GPU bugs are notoriously hard to debug, so we need a working (or more precisely, not working) example to reproduce it and understand what is happening. I know it is probably hidden inside a NN loss during training, but could you try to give us more information and an example of when it fails?

Note that we tested the function on GPU, and the computational gain from the GPU is indeed null or limited for small problems, as shown on this page of the documentation:
https://pythonot.github.io/gen_modules/ot.backend.html#module-ot.backend
