Validate node quotas with the gcloud compute API #335

Open
willgraf opened this issue Apr 28, 2020 · 1 comment
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@willgraf
Contributor

Is your feature request related to a problem? Please describe.
We don't do much validation on the GPU inputs. If users enter a maximum GPU count that is higher than their quota, it can be difficult to understand why the GPU nodes will not scale. If they select the wrong type of GPU (one for which they have no quota), the GPU nodes may not come up at all! Additionally, users may not have a large enough node quota, which could cause unexpected failures if the cluster cannot scale up.

Describe the solution you'd like
The GPU quota information can be found via:

gcloud compute regions describe $CLOUDSDK_COMPUTE_REGION

We should use its output to validate the user's menu inputs.
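A minimal sketch of that validation follows. It is written in Python and is not the project's existing code. It assumes that `gcloud compute regions describe` is called with the global `--format=json` flag, whose output contains a `quotas` list of entries with `metric`, `limit` and `usage` fields. The helper names `load_region_quotas`, `parse_quotas` and `validate_max` are hypothetical, and the metric names used in the example (`NVIDIA_T4_GPUS`) should be checked against the real output for the region.

```python
import json
import subprocess


def load_region_quotas(region):
    """Fetch a region's quotas via the gcloud CLI (hypothetical helper).

    Requires an installed, authenticated gcloud; raises CalledProcessError
    if the command fails.
    """
    result = subprocess.run(
        ["gcloud", "compute", "regions", "describe", region, "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_quotas(result.stdout)


def parse_quotas(describe_json):
    """Index the `quotas` list of a region description by metric name."""
    region = json.loads(describe_json)
    return {q["metric"]: q for q in region.get("quotas", [])}


def validate_max(quotas, metric, requested):
    """Return an error message if `requested` exceeds the quota limit, else None."""
    quota = quotas.get(metric)
    limit = quota["limit"] if quota else 0
    if limit == 0:
        # No quota at all for this metric, so no such resource can start.
        return f"No {metric} quota is available in this region."
    if requested > limit:
        return f"Requested {requested}, but the {metric} quota limit is {limit:g}."
    return None


# Example with an abbreviated, made-up describe output (no gcloud call):
sample = '{"quotas": [{"metric": "NVIDIA_T4_GPUS", "limit": 4.0, "usage": 0.0}]}'
print(validate_max(parse_quotas(sample), "NVIDIA_T4_GPUS", 8))
```

The same `validate_max` check could be applied to a CPU or instance metric to catch an insufficient node quota. Comparing against `limit - usage` instead of `limit` would be a stricter variant that also accounts for resources already in use elsewhere in the project.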

Additional context
There are separate quotas for PREEMPTIBLE and regular GPUs, which made me realize that we probably just use preemptible GPUs for all clusters.
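Because the regular and preemptible GPU quotas are tracked separately, a validator has to pick the right quota metric for the selected GPU. The sketch below assumes that the menu stores accelerator types such as `nvidia-tesla-t4` and that the quota metrics follow an `NVIDIA_<MODEL>_GPUS` / `PREEMPTIBLE_NVIDIA_<MODEL>_GPUS` naming pattern. Both the mapping and the helper name `gpu_quota_metric` are hypothetical and should be checked against the actual `quotas` list returned for the region.

```python
def gpu_quota_metric(accelerator_type, preemptible):
    """Guess the regional quota metric for a GPU accelerator type.

    Assumed naming: "nvidia-tesla-t4" -> "NVIDIA_T4_GPUS", with a
    "PREEMPTIBLE_" prefix for the separate preemptible quota.
    """
    name = accelerator_type.lower()
    if not name.startswith("nvidia-"):
        raise ValueError(f"Unrecognized accelerator type: {accelerator_type}")
    model = name[len("nvidia-"):]
    # Older accelerator type names carry a "tesla-" segment that the
    # assumed metric name omits.
    if model.startswith("tesla-"):
        model = model[len("tesla-"):]
    metric = "NVIDIA_" + model.replace("-", "_").upper() + "_GPUS"
    return "PREEMPTIBLE_" + metric if preemptible else metric


print(gpu_quota_metric("nvidia-tesla-t4", preemptible=True))
# -> PREEMPTIBLE_NVIDIA_T4_GPUS
```

If every cluster uses preemptible GPUs, only the `PREEMPTIBLE_` metric would need to be checked.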

willgraf added the enhancement label on Apr 28, 2020
@willgraf
Contributor Author

willgraf commented Oct 5, 2020

Additionally, as documented in #367, multi-zone clusters with GPU_NODE_MIN_SIZE of 1 must have a GPU quota of at least 2. Validating the GPU quota would be the best way to prevent this issue.
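A sketch of the arithmetic behind this check follows. It assumes that the GPU node pool's autoscaling minimum applies to each zone the cluster spans, so the minimum is multiplied by the zone count, and that each node has a fixed number of GPUs. The function names and parameters are hypothetical. The two-zone case in the example is chosen only because it reproduces the minimum quota of 2 described above.

```python
def min_gpu_quota(min_nodes_per_zone, num_zones, gpus_per_node=1):
    """Smallest GPU quota that lets the GPU node pool reach its minimum size.

    Assumes the minimum is enforced per zone, so a multi-zone pool needs
    min_nodes_per_zone nodes in every zone it spans.
    """
    return min_nodes_per_zone * num_zones * gpus_per_node


def check_min_quota(gpu_quota_limit, min_nodes_per_zone, num_zones, gpus_per_node=1):
    """Return an error message if the quota cannot cover the pool minimum."""
    needed = min_gpu_quota(min_nodes_per_zone, num_zones, gpus_per_node)
    if gpu_quota_limit < needed:
        return (f"GPU_NODE_MIN_SIZE={min_nodes_per_zone} across {num_zones} zones "
                f"needs a GPU quota of at least {needed}, "
                f"but the limit is {gpu_quota_limit:g}.")
    return None


print(min_gpu_quota(1, 2))
# -> 2
```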
