GROMACS 2024.1 test fails due to timeouts on AMD-ZEN2 #20323

Closed
satishskamath opened this issue Apr 9, 2024 · 8 comments

@satishskamath
Contributor

A link to the issue: https://gitlab.com/gromacs/gromacs/-/issues/5062

@ocaisa
Member

ocaisa commented Apr 9, 2024

Are you using OpenMPI with the patch that appeared in 4.9.1?

@ocaisa
Member

ocaisa commented Apr 9, 2024

And/or are you disabling libfabric? I'm not sure which, but one or both of these seem to have resolved similar issues for EESSI.

@satishskamath
Contributor Author

@ocaisa I am trying to build it with foss 2023a, and I am definitely using the patches because I adopted the easyconfigs from #20102. I will try disabling libfabric completely and check again.

@boegel
Member

boegel commented Apr 30, 2024

@satishskamath Any updates on this?

boegel added this to the 4.x milestone Apr 30, 2024

@satishskamath
Contributor Author

I have the run going with twice the timeout limit. I will update once it finishes here.
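
For readers wondering how a timeout limit like this might be scaled from an easyconfig, here is a minimal sketch. It assumes the GROMACS CMake cache variable `GMX_TEST_TIMEOUT_FACTOR` is used; the thread does not say which mechanism was actually used here, and the `local_timeout_factor` name is hypothetical. The GROMACS easyblock may also process `configopts` in its own way, so treat this as an illustration rather than a confirmed recipe.

```python
# Hypothetical easyconfig fragment (easyconfigs use Python syntax).
# Assumption: test timeouts are scaled through the GROMACS CMake option
# GMX_TEST_TIMEOUT_FACTOR; the thread does not confirm this was the method used.
local_timeout_factor = 2  # "twice the timeout limit"

# Passed to CMake at configure time.
configopts = '-DGMX_TEST_TIMEOUT_FACTOR=%d' % local_timeout_factor
```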

@satishskamath
Contributor Author

I ran it with 4 times the default tolerance, but 1 test still fails due to a timeout.

The following tests FAILED:
	 65 - MdrunIOTests (Timeout)

So I presume there is more going on. Next, I will try turning the OFI provider off completely.
@boegel Is there a way to pass an environment variable into the EasyBuild build and test environment via the easyconfig?
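
One common way to do this is to prepend `export` commands through the easyconfig's `pre*opts` parameters (`preconfigopts`, `prebuildopts`, `pretestopts`). The sketch below is an illustration, not the answer given in this thread. It assumes that excluding the OFI components through Open MPI's `OMPI_MCA_*` environment variables is what "turning the OFI provider off" means here, and that the GROMACS easyblock honours `pretestopts` for its test step. The `local_` variable names are hypothetical.

```python
# Hypothetical easyconfig fragment (easyconfigs use Python syntax).
# Assumption: Open MPI reads MCA parameters from OMPI_MCA_<name> environment
# variables, and '^ofi' excludes the OFI component from that framework.
local_env_vars = {
    'OMPI_MCA_btl': '^ofi',
    'OMPI_MCA_mtl': '^ofi',
}

# Build a shell prefix such as "export A='x' && export B='y' && ".
local_env_prefix = ''.join(
    "export %s='%s' && " % (name, value)
    for name, value in sorted(local_env_vars.items())
)

# Prepended to the build and test commands, respectively.
prebuildopts = local_env_prefix
pretestopts = local_env_prefix
```

Because the prefix ends in `&&`, the variables are exported in the same shell that runs the build or test command, so they do not leak into the generated module.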

@satishskamath
Contributor Author

So all these tests are passing right now. However, gmxapi is taking quite some time to build, which could be related to the file system race. I have included the patch from your earlier PR and am now building it again on Snellius. If that works out, this ticket can be closed.

The timeouts also seem to be a local system problem, which needs to be diagnosed.

@satishskamath
Contributor Author

Issue resolved. Thanks @ocaisa and @boegel.
