gpu: study and implement a way to limit Pod's i915 count #1408

Open
Tracked by #28
tkatila opened this issue May 5, 2023 · 25 comments

@tkatila
Contributor

tkatila commented May 5, 2023

In #1377 it was identified that some clusters need to limit a Pod's i915 resource count to 1 (or some other value). The idea is to allow setting shared-dev-num to >1 while preventing users from accessing more GPU resources than intended.

A webhook might be a good way to implement this, but it would be good to study other solutions as well.

@eero-t
Contributor

eero-t commented May 5, 2023

K8s supports limit ranges: https://kubernetes.io/docs/concepts/policy/limit-range/

@uniemimu thought that it should support extended resources in addition to core ones, but somebody needs to check whether it actually works when the default and min values are omitted (i.e. whether it still allows pods that do not request a GPU):

apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-count-constraint
spec:
  limits:
  - max:
      gpu.intel.com/i915: 1
    type: Container

@uMartinXu can you try whether that does what you wanted?

@vbedida79

Hi @eero-t, I tried LimitRange; it supports extended resources.
It does not accept pods with gpu.intel.com/i915 > 1:
Warning FailedCreate 2m55s job-controller Error creating: pods "intel-dgpu-clinfo-gphs2" is forbidden: maximum gpu.intel.com/i915 usage per Container is 1, but limit is 3
For pods that do not request the GPU resource, whether scheduled on a GPU node or not, i915 resource requests and limits are still added to the pod:

LimitRanger plugin set: gpu.intel.com/i915 request for container httpd;
      gpu.intel.com/i915 limit for container httpd
containers:
    - resources:
        limits:
          gpu.intel.com/i915: '1'
        requests:
          gpu.intel.com/i915: '1'

@eero-t
Contributor

eero-t commented May 9, 2023

Thanks for testing, good to hear that the (max) limiting part works!

Will it still try to add GPU resource requests for non-GPU pods if you add this to the LimitRange:

min:
      gpu.intel.com/i915: 0

?

@vbedida79

vbedida79 commented May 9, 2023

Yes, I added the min limit. It still adds i915 resource requests and limits to non-GPU pods.

I understand that is because of the default limit and default request, is that right? Whatever default value is set, it gets added to non-GPU pods.

    default: 
      gpu.intel.com/i915: 1
    defaultRequest:
      gpu.intel.com/i915: 1

@eero-t
Contributor

eero-t commented May 9, 2023

Do you mean that even if you specify gpu.intel.com/i915: 0 as the default, non-GPU pods get gpu.intel.com/i915: 1?

That sounds like a bug which should be filed against upstream Kubernetes. Either as "LimitRange does not respect the specified default value", or "LimitRange adds a resource request even when the default and min resource requests are specified as zero".

@vbedida79

If default values are added to the LimitRange, they are added to non-GPU pods. If no default values are added to the LimitRange, the values below are allocated to non-GPU pods. Is this expected?

- resources:
        limits:
          gpu.intel.com/i915: '1'
        requests:
          gpu.intel.com/i915: '1'

@eero-t
Contributor

eero-t commented May 9, 2023

If default values are added to the LimitRange, they are added to non-GPU pods.

So if you specify the default as gpu.intel.com/i915: 0, is the resource request zero for non-GPU pods?

@vbedida79

Yes, works as expected. If the default is gpu.intel.com/i915: 0, non-GPU pods show the same value.

@vbedida79

vbedida79 commented May 16, 2023

With max and min in the LimitRange, GPU pods requesting more than 1 resource get a forbidden error.
If default values are not added in the LimitRange, the max values end up being added as limit and request to non-GPU pods as well.
After setting both the default limit and default request to 0, non-GPU pods just show gpu.intel.com/i915: 0 in their spec.
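
For reference, a sketch of the combined LimitRange from the tests above (reusing the name from the earlier example; the pieces were tested separately, not necessarily as this exact whole):

apiVersion: v1
kind: LimitRange
metadata:
  name: gpu-count-constraint
spec:
  limits:
  - type: Container
    max:
      gpu.intel.com/i915: 1
    min:
      gpu.intel.com/i915: 0
    default:
      gpu.intel.com/i915: 0
    defaultRequest:
      gpu.intel.com/i915: 0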

@eero-t
Contributor

eero-t commented May 17, 2023

After setting both the default limit and default request to 0, non-GPU pods just show gpu.intel.com/i915: 0 in their spec.

@uniemimu You've been looking more into the scheduler. Do you see any practical problem with a zero extended resource request being added to non-GPU pods (by LimitRange)?

I.e. does it just look funny, but work fine in practice?

@vbedida79

Also, as LimitRange is namespace scoped, we only add it in the namespaces where workloads are deployed, right?

@eero-t
Contributor

eero-t commented May 22, 2023

Yes, to the namespaces where your cluster's RBAC rules allow the given k8s users access to GPU resources.

@vbedida79

Got it, thanks.

I.e. does it just look funny, but work fine in practice?

Apart from this, can we consider LimitRange an adequate solution? Then we could add the YAML to the project's 1.0.0 GA as a deployment step after the GPU device plugin is created, for publishing the certified operator on OCP 4.12.

@eero-t
Contributor

eero-t commented May 22, 2023

I'm also assuming this would be an operator configuration option for whether multi-GPU jobs are allowed.

But does the operator component know which namespaces in a given cluster are allowed to access GPU resources? Aren't such RBAC rules rather cluster-specific?

@vbedida79

For OpenShift, there might be rules against deploying workloads/objects in the "openshift-" namespaces created during cluster installation. Will check.

@vbedida79

On OpenShift, the namespaces default, kube-system, kube-public, openshift-node, openshift-infra, and openshift do not allow assigning an SCC, so it is recommended not to deploy pods in these namespaces.

@vbedida79

Any other possible solutions you would suggest trying apart from LimitRange?

@eero-t
Contributor

eero-t commented May 31, 2023

While writing a separate webhook for this is a possibility, webhooks are nasty, and LimitRange already seems to be explicitly designed for this. It just needs to support a zero minimum value better (i.e. not add a request for a zero resource).

Or do you think it should also have an option for the limit being cluster-wide instead of namespace-specific?

@vbedida79

vbedida79 commented Jun 1, 2023

Thanks, I agree with LimitRange. Workloads can be deployed to a specific namespace for individual GPU access. @uMartinXu any thoughts?
For supporting the 0 minimum value better, could ResourceQuota along with LimitRange be a good option? If it works, it wouldn't accept pods which don't have any gpu.intel.com/i915 request/limit in their pod spec, so the namespace would only accept GPU pods. Not sure if it supports extended resources though.

@uMartinXu

The i915 limit of 1 should be enforced at the whole-cluster scope, not only applied to a specific namespace.

@eero-t
Contributor

eero-t commented Jun 8, 2023

There are a few alternatives to achieve cluster-wide GPU count limits:

  • Allow GPU resources only for namespaces that have suitable LimitRange limits in place, using RBAC rules (see the sketch after this list): https://kubernetes.io/docs/reference/access-authn-authz/rbac/
  • Improve LimitRange to have option for applying limit to all namespaces
  • Add option to GAS for specifying allowed range for GPU resource requests
  • Add option to GPU plugin for rejecting GPU (count) requests outside of given range

@uniemimu, @tkatila Any comments on these?
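
As a rough sketch of the first option (the namespace, role, and group names below are made-up examples), pod creation could be allowed only in a namespace that already carries the GPU LimitRange:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gpu-workload-deployer
  namespace: gpu-workloads   # namespace that has the GPU LimitRange applied
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gpu-workload-deployer
  namespace: gpu-workloads
subjects:
- kind: Group
  name: gpu-users            # example group of users allowed to run GPU workloads
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: gpu-workload-deployer
  apiGroup: rbac.authorization.k8s.io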

@tkatila
Contributor Author

tkatila commented Jun 9, 2023

  • Add option to GAS for specifying allowed range for GPU resource requests

GAS would be a possible place for limiting the i915 resource requests, but that would then require using GAS in general.

  • Add option to GPU plugin for rejecting GPU (count) requests outside of given range

I doubt that this is an option as it's quite late in the Pod scheduling flow. I tried returning an error from Allocate() and the scheduler just kept retrying, leaving multiple UnexpectedAdmissionError pods behind. Though the documentation indicates that it should be possible to return an error more gracefully.

@eero-t
Contributor

eero-t commented Jun 9, 2023

  • Improve LimitRange to have option for applying limit to all namespaces

According to Stackoverflow, this can already be done by using Kyverno: https://stackoverflow.com/questions/73488971/how-can-i-apply-limit-range-to-all-namespaces-in-kubernetes

Which is "a policy engine to validate, mutate, generate, and cleanup Kubernetes resources, and verify image signatures and artifacts to help secure the software supply chain".

(LimitRange should really be first improved not to add zero resource requests to every pod though.)
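
A rough sketch of that approach, using Kyverno's "generate" rule to stamp the LimitRange into every namespace (policy name and generated object name are made up, untested):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-gpu-limitrange
spec:
  rules:
  - name: generate-gpu-limitrange
    match:
      any:
      - resources:
          kinds:
          - Namespace
    generate:
      apiVersion: v1
      kind: LimitRange
      name: gpu-count-constraint
      namespace: "{{request.object.metadata.name}}"
      synchronize: true
      data:
        spec:
          limits:
          - type: Container
            max:
              gpu.intel.com/i915: 1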

@eero-t
Contributor

eero-t commented Dec 11, 2023

I just noticed that ResourceQuota also supports extended resources: https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-for-extended-resources

That could also be experimented with, to see whether it works any better than LimitRange for limiting GPU usage to specific namespaces.
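
For example, something along these lines should cap the total i915 requests in a namespace; note that extended resources only support the requests.-prefixed quota form (name, namespace and value below are placeholders):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: gpu-workloads
spec:
  hard:
    requests.gpu.intel.com/i915: 4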

@mythi
Contributor

mythi commented Dec 11, 2023

Related are the old backlog items #598 and #486
