DCU cg single precision bug: Unexpected hipBLAS Error #4143

Closed
16 tasks
pxlxingliang opened this issue May 10, 2024 · 0 comments · Fixed by #4201
Labels: GPU & DCU & HPC (GPU and DCU and HPC related any issues)

@pxlxingliang
Collaborator

Describe the bug

I ran the job below with the DCU build of ABACUS and precision=single. It aborted with the following hipBLAS error:

WARNING: Total thread number on this node mismatches with hardware availability. This may cause poor performance.
Info: Local MPI proc number: 4,OpenMP thread number: 1,Total thread number: 4,Local thread limit: 32
Unexpected hipBLAS Error: Unknown /public/home/abacus/abacus-develop/source/module_hsolver/kernels/rocm/math_kernel_op.hip.cu 723
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[35024,1],0]
  Exit code:    10
--------------------------------------------------------------------------

single (2).zip

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).