
Fixed CUDA randint generation for large ranges. #126066

Open
tringwald wants to merge 4 commits into main from cuda-randint-randomness-for-large-range

Conversation

tringwald (Collaborator)

Fixes #125224

For large ranges, CUDA randint uses a different unroll_factor when generating random ints. This unroll_factor was not accounted for correctly in the calculation of the Philox offsets, so some of the random states were reused, resulting in lower entropy (see #125224).
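A minimal sketch of the symptom, assuming an affected build and an available CUDA device (the range and sample size below are illustrative, not taken from the issue):

```python
import torch

# Illustration of the lower-entropy symptom described above (see #125224).
# If the Philox offsets ignore the unroll factor, counter states are reused
# and independent draws collide far more often than chance would allow.
n = 1_000_000
x = torch.randint(0, 2**62, (n,), device="cuda")
dupes = n - x.unique().numel()
# By the birthday bound, 1e6 draws from a ~4.6e18-sized range should all be
# distinct with overwhelming probability (collision chance around 1e-7), so
# any duplicates here point to reused generator state rather than bad luck.
print(f"{dupes} duplicate values out of {n} draws")
```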

tringwald requested a review from eqy as a code owner on May 13, 2024 13:38
pytorch-bot bot commented May 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126066

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ada1975 with merge base 4f1a56c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

test/test_cuda.py: outdated review thread, resolved
test/test_cuda.py: outdated review thread, resolved
aten/src/ATen/native/cuda/DistributionTemplates.h: outdated review thread, resolved
tringwald (Collaborator, Author)

@r-barnes Thanks for reviewing. I added some type annotations and changed the C++ parameters to const.

tringwald force-pushed the cuda-randint-randomness-for-large-range branch from 3ea6988 to 849bf9e on May 13, 2024 21:26
test/test_cuda.py: outdated review thread, resolved
tringwald (Collaborator, Author)

@pytorchbot rebase

pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased cuda-randint-randomness-for-large-range onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda-randint-randomness-for-large-range && git pull --rebase)

pytorchmergebot force-pushed the cuda-randint-randomness-for-large-range branch from b09c3f1 to cb7925c on May 14, 2024 07:36
tringwald force-pushed the cuda-randint-randomness-for-large-range branch 3 times, most recently from 9c3ec13 to 303b76e on May 17, 2024 15:43
tringwald force-pushed the cuda-randint-randomness-for-large-range branch from 303b76e to 0a7226b on May 18, 2024 20:05
eqy (Collaborator) commented May 19, 2024

CC @drisspg, who might know more about the SDPA tests.

tringwald (Collaborator, Author) commented May 19, 2024

Thanks @eqy. Those tests in test_transformers.py use torch._fill_mem_eff_dropout_mask_, which in turn calls a custom CUDA kernel that populates the dropout mask with uniform values before thresholding. I'm not sure why we don't use torch.rand there, but replacing the custom implementation with torch.rand yields some odd test failures.
I've rolled back the test changes for now so I can debug the other failures more easily, but we should probably reconsider whether we need a custom rand implementation for those tests.
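For context, here is a hypothetical torch.rand-based stand-in for the scheme described above (the function name and exact semantics are assumptions for illustration; as noted, swapping something like this into the actual tests produced failures, so it is not a drop-in replacement):

```python
import torch

# Hypothetical sketch of the rand-then-threshold scheme: fill a buffer with
# uniform values, then keep entries that clear the dropout probability.
# This is NOT the implementation behind torch._fill_mem_eff_dropout_mask_.
def fill_dropout_mask(mask: torch.Tensor, dropout_p: float, seed: int) -> torch.Tensor:
    gen = torch.Generator(device=mask.device)
    gen.manual_seed(seed)
    # Populate with uniform values, then threshold: True = kept, False = dropped.
    uniform = torch.rand(mask.shape, device=mask.device, generator=gen)
    mask.copy_(uniform >= dropout_p)
    return mask
```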


Successfully merging this pull request may close these issues:

Strange behavior of randint using device=cuda (#125224)
5 participants