[App]: hardware logging in multi-node setting #7470
Comments
Hi @BramVanroy, thank you for reporting this. May I ask for some more context: what is your current compute infrastructure, and which ML frameworks are you mostly using?
Hi @BramVanroy, just following up on this to see whether you could share some additional information on your current multi-node infrastructure, so that we can include it in a feature request for our engineers. Thank you!
Hi Thanos, I am running jobs on anything from 1 node with 1 GPU up to 10 nodes with 4 GPUs each. It seems to me that wandb does not correctly log hardware in multi-node settings.
Perfect, thank you @BramVanroy for the additional context. I was wondering what's reported in those runs, if you navigate in
Correct. It only reports the main node's hardware configuration, not the whole pool.
Great, thank you @BramVanroy for the clarification. I have logged this feature request with our engineers, and we will keep you updated here on any progress.
Current Behavior
Currently, the run overview gives an idea of the system hardware, specifically the GPU count and CPU count. However, as far as I can tell, this does not account for multi-node settings and only reports what the current node is equipped with. While I understand why that is the case, it can be confusing because the reported hardware is not "correct" for the run as a whole.
Expected Behavior
Correct hardware information for the whole pool of nodes. To be honest, I am not sure how feasible it is to collect this information without integrating with distributed communication frameworks or something else custom.
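Until something like this is built in, one possible user-side workaround is to have every process describe its own node, gather those descriptions, and log pool-wide totals yourself. The sketch below covers only the aggregation step and uses the standard library; `NodeHardware`, `local_hardware`, and `aggregate` are hypothetical names, and the GPU count is passed in rather than detected, so nothing here reflects how wandb itself collects system info.

```python
import os
import socket
from dataclasses import asdict, dataclass


@dataclass
class NodeHardware:
    """Hypothetical per-node hardware description (not a wandb type)."""
    hostname: str
    cpu_count: int
    gpu_count: int


def local_hardware(gpu_count: int) -> NodeHardware:
    """Describe the node this process runs on.

    gpu_count is passed in explicitly (for example from
    torch.cuda.device_count()) so the sketch does not depend on a GPU library.
    """
    return NodeHardware(
        hostname=socket.gethostname(),
        cpu_count=os.cpu_count() or 0,
        gpu_count=gpu_count,
    )


def aggregate(reports: list) -> dict:
    """Collapse per-process reports into pool-wide totals.

    Several processes usually run on the same node (one per GPU), so the
    reports are de-duplicated by hostname before summing.
    """
    per_node = {r.hostname: r for r in reports}
    return {
        "num_nodes": len(per_node),
        "total_cpu_count": sum(r.cpu_count for r in per_node.values()),
        "total_gpu_count": sum(r.gpu_count for r in per_node.values()),
        "nodes": [asdict(r) for r in per_node.values()],
    }


if __name__ == "__main__":
    # Simulated 10-node x 4-GPU pool: one report per rank, 4 ranks per node.
    reports = [NodeHardware(f"node{i}", 32, 4) for i in range(10) for _ in range(4)]
    summary = aggregate(reports)
    print(summary["num_nodes"], summary["total_gpu_count"])
```

In a real PyTorch job, the gathering step could use `torch.distributed.all_gather_object(gathered, local_hardware(torch.cuda.device_count()))` with `gathered = [None] * world_size`, after which rank 0 might pass `aggregate(gathered)` to `wandb.config.update(...)`. Deduplicating by hostname assumes that hostnames are unique per node in your cluster.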
Steps To Reproduce
No response
Screenshots
No response
Environment
OS: Linux
Browsers: Edge
Additional Context
No response