
Fixed CUDA randint generation for large ranges. #126066

Open
tringwald wants to merge 4 commits into main from cuda-randint-randomness-for-large-range

Conversation

tringwald (Collaborator)

Fixes #125224

For large ranges, CUDA randint uses a different unroll_factor when generating random ints. This unroll_factor was not accounted for correctly in the calculation of the Philox offsets, so some of the random states were reused, resulting in lower entropy (see #125224).
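A minimal sketch of the symptom, assuming an affected build and an available CUDA device (the range and sample size below are illustrative, not taken from the issue):

```python
import torch

# Illustration of the lower-entropy symptom described above (see #125224).
# If the Philox offsets ignore the unroll factor, counter states are reused
# and independent draws collide far more often than chance would allow.
n = 1_000_000
x = torch.randint(0, 2**62, (n,), device="cuda")
dupes = n - x.unique().numel()
# By the birthday bound, 1e6 draws from a ~4.6e18-sized range should all be
# distinct with overwhelming probability (collision chance around 1e-7), so
# any duplicates here point to reused generator state rather than bad luck.
print(f"{dupes} duplicate values out of {n} draws")
```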

tringwald requested a review from eqy as a code owner on May 13, 2024 13:38
pytorch-bot bot commented May 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126066

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ada1975 with merge base 4f1a56c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

test/test_cuda.py: outdated review thread, resolved
test/test_cuda.py: outdated review thread, resolved
aten/src/ATen/native/cuda/DistributionTemplates.h: outdated review thread, resolved
tringwald (Collaborator, Author)

@r-barnes Thanks for reviewing. I added some type annotations and changed the C++ parameters to const.

tringwald force-pushed the cuda-randint-randomness-for-large-range branch from 3ea6988 to 849bf9e on May 13, 2024 21:26
test/test_cuda.py: outdated review thread, resolved
tringwald (Collaborator, Author)

@pytorchbot rebase

pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased cuda-randint-randomness-for-large-range onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda-randint-randomness-for-large-range && git pull --rebase)

pytorchmergebot force-pushed the cuda-randint-randomness-for-large-range branch from b09c3f1 to cb7925c on May 14, 2024 07:36
tringwald force-pushed the cuda-randint-randomness-for-large-range branch 3 times, most recently from 9c3ec13 to 303b76e on May 17, 2024 15:43
tringwald force-pushed the cuda-randint-randomness-for-large-range branch from 303b76e to 0a7226b on May 18, 2024 20:05
eqy (Collaborator) commented May 19, 2024

CC @drisspg, who might know more about the SDPA tests.

tringwald (Collaborator, Author) commented May 19, 2024

Thanks @eqy. Those tests in test_transformers.py use torch._fill_mem_eff_dropout_mask_, which in turn calls a custom CUDA kernel that populates the dropout mask with uniform values before thresholding. I'm not sure why we don't use torch.rand there, but replacing the custom implementation with torch.rand yields some odd test failures.
I've rolled back the test changes for now so I can debug the other failures more easily, but we should probably reconsider whether we need a custom rand implementation for those tests.
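For context, here is a hypothetical torch.rand-based stand-in for the scheme described above (the function name and exact semantics are assumptions for illustration; as noted, swapping something like this into the actual tests produced failures, so it is not a drop-in replacement):

```python
import torch

# Hypothetical sketch of the rand-then-threshold scheme: fill a buffer with
# uniform values, then keep entries that clear the dropout probability.
# This is NOT the implementation behind torch._fill_mem_eff_dropout_mask_.
def fill_dropout_mask(mask: torch.Tensor, dropout_p: float, seed: int) -> torch.Tensor:
    gen = torch.Generator(device=mask.device)
    gen.manual_seed(seed)
    # Populate with uniform values, then threshold: True = kept, False = dropped.
    uniform = torch.rand(mask.shape, device=mask.device, generator=gen)
    mask.copy_(uniform >= dropout_p)
    return mask
```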


Successfully merging this pull request may close these issues:

Strange behavior of randint using device=cuda (#125224)
5 participants