Some miniapps fail with HIP 5.6 #1049

Open
msimberg opened this issue Nov 22, 2023 · 0 comments

While investigating the slowdowns from newer HIP versions, I found that the eigensolver miniapp consistently fails on the first iteration with:

terminate called after throwing an instance of 'whip::exception'
  what():  invalid argument
srun: error: nid006104: task 0: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=4970716.56
slurmstepd: error: *** STEP 4970716.56 ON nid006104 CANCELLED AT 2023-11-20T22:16:08 ***
srun: error: nid006104: tasks 1-7: Terminated
srun: Force Terminated StepId=4970716.56

I have not investigated this any further yet. Some miniapps are clearly fine (miniapp_bt_band_to_tridiag), while others fail either for some configurations or always (miniapp_bt_reduction_to_band, miniapp_band_to_tridiag). The remaining miniapps have not been tested yet. The core dumps contain nothing useful, so this will need a more thorough investigation.
