vgpu not restricting memory in the container #3384
Comments
/assign @archlitchi
Hey @archlitchi, can you suggest something for this?
Could you provide the following information:
NodeSelector and tolerations are private, so I can't show them here. Let me know if these properties can also affect the behavior of vgpu.
Could you provide the 'env' output from inside the container?
I won't be able to copy the complete output. If you are looking for a particular property, I should be able to get that for you.
Okay, please list the env variables that contain the keyword 'CUDA' or 'NVIDIA'.
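One way to collect just those variables without copying the whole `env` output is a small filter run inside the container. This is only a minimal sketch, assuming Python is available in the image; the helper name `filter_env` is made up for illustration, and it matches the keyword anywhere in the `NAME=value` line (case-sensitive), roughly like piping `env` through a text search.

```python
import os


def filter_env(environ, keywords=("CUDA", "NVIDIA")):
    """Return variables whose NAME=value line contains any keyword.

    `filter_env` is a hypothetical helper for this thread, not part of
    volcano or the vgpu device plugin.
    """
    return {
        name: value
        for name, value in environ.items()
        if any(keyword in f"{name}={value}" for keyword in keywords)
    }


if __name__ == "__main__":
    # Print matching variables in NAME=value form, sorted by name.
    for name, value in sorted(filter_env(os.environ).items()):
        print(f"{name}={value}")
```

Long values such as NVIDIA_REQUIRE_CUDA are printed in full, so the output can be redirected to a file and attached instead of being retyped.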
I did not print the output of NVIDIA_REQUIRE_CUDA because it's too long to type. Please bear with me.
Hmm... is this container the one in the yaml file? You allocated 3G in your yaml, but here it only gets 200M. Besides, this is probably a cuda image, not a typical ubuntu:18.04.
Sorry, I ran a different yaml. Everything else is the same except that the memory is 200M. I've updated the earlier comment as well.
Please check whether the following file exists inside the container, AND that the size of each file is NOT 0:
/usr/local/vgpu/libvgpu.so -> exists with non-zero size
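The check requested above (file exists and is not empty) can be scripted so the result is unambiguous. This is a rough sketch, assuming Python is available inside the container; `check_nonempty` is a hypothetical helper, and only `/usr/local/vgpu/libvgpu.so` is listed because it is the only path named in this thread.

```python
import os


def check_nonempty(paths):
    """Map each path to True if it is an existing file with size > 0."""
    results = {}
    for path in paths:
        # isfile() is False for missing paths, so getsize() is only
        # evaluated for files that actually exist.
        results[path] = os.path.isfile(path) and os.path.getsize(path) > 0
    return results


if __name__ == "__main__":
    # Only libvgpu.so is named in this thread; add other files as needed.
    for path, ok in check_nonempty(["/usr/local/vgpu/libvgpu.so"]).items():
        print(f"{path}: {'OK' if ok else 'missing or empty'}")
```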
Okay, I got it. Please use the image volcanosh/volcano-vgpu-device-plugin:dev-vgpu-1219 instead in volcano-vgpu-device-plugin.yml.
Okay, let me try this!
Hey @archlitchi, the error mentioned occurs on that same image (volcanosh/volcano-vgpu-device-plugin:dev-vgpu-1219). It was deployed a month ago. Has anything changed since then?
Hey @archlitchi, any other suggestions to fix this?
Hi @archlitchi, I am also facing the same issue with the volcano vGPU feature. Could you guide me in enabling this feature? Thanks in advance.
OK, I'm looking into it now. Sorry, I didn't see your replies over the last two weeks.
@archlitchi is the usage the same for the vgpu-memory and vgpu-number configurations?
Is this device plugin compatible with the volcano 1.8.2 release package? I deployed the device plugin
Yes. Can you run your task now?
The vgpu-device-plugin mounts your hostPath "/tmp/vgpu/containers/{containerUID}_{ctrName}" into containerPath "/tmp/vgpu". Please check whether the corresponding hostPath exists.
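To make the hostPath check concrete, the path can be built from the template quoted above and tested on the node where the pod runs. This is a hedged sketch: `host_cache_path` and `host_cache_exists` are hypothetical helpers, the two arguments are placeholders you must fill in yourself (this thread does not say how to look up `containerUID`), and the script only tests for existence without assuming anything about the contents.

```python
import os
import sys


def host_cache_path(container_uid, ctr_name, base="/tmp/vgpu/containers"):
    """Build the hostPath using the {containerUID}_{ctrName} template."""
    return os.path.join(base, f"{container_uid}_{ctr_name}")


def host_cache_exists(container_uid, ctr_name, base="/tmp/vgpu/containers"):
    """Return True if the expected hostPath exists on this node."""
    return os.path.exists(host_cache_path(container_uid, ctr_name, base))


if __name__ == "__main__":
    # Usage (on the node): python check_hostpath.py <containerUID> <ctrName>
    if len(sys.argv) != 3:
        sys.exit("usage: check_hostpath.py <containerUID> <ctrName>")
    uid, name = sys.argv[1], sys.argv[2]
    path = host_cache_path(uid, name)
    print(f"{path}: {'exists' if host_cache_exists(uid, name) else 'missing'}")
```

If the uid is unknown, listing the entries under /tmp/vgpu/containers on the node shows which directories the plugin has created.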
VolumeMounts: the volumes above are the ones configured in the device-plugin daemon.
@EswarS No, I mean, after you submit a vgpu task into volcano, please check
What happened:
When running the vgpu example provided in the docs with a vgpu memory limit set, the container does not respect this limit, as shown by the nvidia-smi command (32GB of memory is shown in the output for a V100).
What you expected to happen:
The memory inside the container should be limited to the vgpu-memory configuration.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Nvidia-smi version: 545.23.08
MIG M: NA
Environment:
kubectl version: v1.28.x
uname -a: