Add debugging output to spack and ramble installations #2568
Merged: cdunbar13 merged 2 commits into GoogleCloudPlatform:develop from cdunbar13:ramble_spack_update on May 15, 2024
cdunbar13 added the release-module-improvements label (Added to release notes under the "Module Improvements" heading.) on May 10, 2024
nick-stroud requested changes on May 10, 2024
community/modules/scripts/spack-setup/templates/spack_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
nick-stroud requested changes on May 14, 2024
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
community/modules/scripts/ramble-setup/templates/ramble_setup.yml.tftpl
Outdated
cdunbar13 force-pushed the ramble_spack_update branch from 51c01ed to e00f790 (May 14, 2024 12:10)
nick-stroud approved these changes on May 14, 2024
nick-stroud approved these changes on May 14, 2024
cdunbar13 force-pushed the ramble_spack_update branch 2 times, most recently from 76f1bb7 to 8f5e9c8 (May 15, 2024 15:34)
…node was running install and had the lock
cdunbar13 force-pushed the ramble_spack_update branch from 8f5e9c8 to 144f925 (May 15, 2024 15:37)
When the Spack and Ramble installations fail, it is difficult to know which node to check, because the node doing the install is whichever one acquired the lock first. This PR is the first step toward making debugging easier.
This update writes the hostname of the node that holds the lock into the lock directory. If the installation fails, the contents of the lock directory are printed, and they include the hostname of the node that failed.
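To illustrate the pattern described above outside of Ansible, here is a minimal Python sketch. It is not the code in the templates: the function names `acquire_lock` and `describe_lock`, the directory-based lock, and the timeout values are all assumptions made for illustration.

```python
import os
import socket
import tempfile
import time


def acquire_lock(lock_dir, timeout_s=5.0, poll_s=0.1):
    """Try to take the lock by creating lock_dir; return True on success.

    On success, a file named after this host is written into lock_dir so
    that other nodes can tell who holds the lock. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # mkdir fails if the directory already exists, so only one
            # caller can succeed in creating it.
            os.mkdir(lock_dir)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
            continue
        holder = os.path.join(lock_dir, socket.gethostname())
        with open(holder, "w") as f:
            f.write("lock acquired\n")
        return True


def describe_lock(lock_dir):
    """Return the names found in lock_dir (the holder's hostname, if recorded)."""
    try:
        return sorted(os.listdir(lock_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # Assumed scratch location for the demo; a real deployment would use a
    # directory visible to every node.
    lock_dir = os.path.join(tempfile.mkdtemp(), "install.lock")
    acquire_lock(lock_dir)
    # A second attempt stands in for another node waiting on the lock.
    if not acquire_lock(lock_dir, timeout_s=0.3):
        print("Timed out waiting for lock; lock directory contains:",
              describe_lock(lock_dir))
```

The waiting side only ever reads the directory listing, so it can report the holder even when the holder itself is unresponsive.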
After this PR is approved, the next step should be to capture the stderr of any command that fails, write it to a file in the lock directory, and then print that file's contents in the rescue block of the Ansible playbook.
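As a rough sketch of that proposed follow-up (not something this PR implements), the capture-and-record step might look like the following in Python. The helper name `run_and_record` and the file name `failed_command.stderr` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_and_record(cmd, lock_dir):
    """Run cmd; on failure, save its stderr under lock_dir.

    Returns the path of the saved stderr file, or None if cmd succeeded.
    (Hypothetical helper; the file name is an assumption.)
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    log_path = os.path.join(lock_dir, "failed_command.stderr")
    with open(log_path, "w") as f:
        f.write(result.stderr)
    return log_path


if __name__ == "__main__":
    lock_dir = tempfile.mkdtemp()  # stands in for the real lock directory
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('install failed'); sys.exit(1)"]
    path = run_and_record(failing, lock_dir)
    if path is not None:
        # Corresponds to printing the file from the playbook's rescue block.
        with open(path) as f:
            print(f.read())
```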
This was tested by deploying a blueprint that uses Spack and Ramble, waiting until a node had acquired the lock, and then suspending that node, which forced the other nodes to time out and print the new debug messages.
Failed output looks like: