RAJA CUDA illegal memory access for the ConvDiff test #175

Open
ctian282 opened this issue Aug 25, 2021 · 0 comments

ctian282 commented Aug 25, 2021

After successfully compiling SAMRAI with RAJA and CUDA, many of the tests fail, including the ConvDiff test, which shows the following error message:

  1/953 Test   #1: blt_gtest_smoke .......................................................   Passed    0.00 sec
        Start   2: blt_fruit_smoke
  2/953 Test   #2: blt_fruit_smoke .......................................................   Passed    0.00 sec
        Start   3: blt_openmp_smoke
  3/953 Test   #3: blt_openmp_smoke ......................................................   Passed    0.00 sec
        Start   4: blt_mpi_smoke
  4/953 Test   #4: blt_mpi_smoke .........................................................   Passed    0.36 sec
        Start   5: blt_cuda_smoke
  5/953 Test   #5: blt_cuda_smoke ........................................................   Passed    0.21 sec
        Start   6: blt_cuda_runtime_smoke
  6/953 Test   #6: blt_cuda_runtime_smoke ................................................   Passed    0.04 sec
        Start   7: blt_cuda_openmp_smoke
  7/953 Test   #7: blt_cuda_openmp_smoke .................................................   Passed    0.24 sec
        Start   8: blt_cuda_mpi_smoke
  8/953 Test   #8: blt_cuda_mpi_smoke ....................................................   Passed    0.99 sec
        Start   9: convdiff_test_test.2d.input
9/953 Test   #9: convdiff_test_test.2d.input ...........................................***Failed    2.74 sec
CUDAassert: an illegal memory access was encountered /usr/include/RAJA/policy/cuda/synchronize.hpp 42
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDAassert
[compute1-exec-204:23351] *** Process received signal ***
[compute1-exec-204:23351] Signal: Aborted (6)
[compute1-exec-204:23351] Signal code:  (-6)
[compute1-exec-204:23351] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f76ad17a980]
[compute1-exec-204:23351] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f76abe68fb7]
[compute1-exec-204:23351] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f76abe6a921]
[compute1-exec-204:23351] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f76aca8c957]
[compute1-exec-204:23351] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ae6)[0x7f76aca92ae6]
[compute1-exec-204:23351] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b21)[0x7f76aca92b21]
[compute1-exec-204:23351] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d54)[0x7f76aca92d54]
[compute1-exec-204:23351] [ 7] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_Z19RAJA_ABORT_OR_THROWPKc+0x64)[0x55d43f912799]
[compute1-exec-204:23351] [ 8] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN4RAJA10cudaAssertE9cudaErrorPKcib+0x67)[0x55d43f91284e]
[compute1-exec-204:23351] [ 9] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN4RAJA11synchronizeINS_6policy4cuda16cuda_synchronizeEEEvv+0x34)[0x55d43ff913c9]
[compute1-exec-204:23351] [10] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4tbox11synchronizeINS0_6policy8parallelEEEvv+0x9)[0x55d43ff91307]
[compute1-exec-204:23351] [11] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4tbox20parallel_synchronizeEv+0x34)[0x55d43ff9133e]
[compute1-exec-204:23351] [12] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZNK6SAMRAI4mesh17GriddingAlgorithm8fillTagsEiRKSt10shared_ptrINS_4hier10PatchLevelEEi+0x192)[0x55d43ffa2910]
[compute1-exec-204:23351] [13] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4mesh17GriddingAlgorithm17makeCoarsestLevelEd+0x131a)[0x55d43ff9757c]
[compute1-exec-204:23351] [14] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(main+0x1ca0)[0x55d43f95ec8b]
[compute1-exec-204:23351] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f76abe4bbf7]
[compute1-exec-204:23351] [16] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_start+0x2a)[0x55d43f903dea]
[compute1-exec-204:23351] *** End of error message ***

The failure appears to be triggered by this code block

#if defined(HAVE_RAJA)
tbox::parallel_synchronize();
#endif

in GriddingAlgorithm::fillTags(), which is called from makeCoarsestLevel() during initialization. I am using RAJA v0.13.0, and the recommended version, v0.12.1, shows the same issue. All of RAJA's own tests pass in my environment.

ctian282 changed the title from "RAJA CUDA illegal memory access for the ConvDiff" to "RAJA CUDA illegal memory access for the ConvDiff test" on Aug 25, 2021