Support for nvidia-container-toolkit and docker 19.03 #3143

Closed
haggalin opened this issue Sep 9, 2019 · 43 comments · May be fixed by #4791 or #6872

@haggalin

haggalin commented Sep 9, 2019

From nvidia-container-toolkit github page:

Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime.

I can't upgrade to docker 19.03 with the new nvidia-container-toolkit because adding gpus to the container is not supported from the portainer UI with this new toolkit.

With the new nvidia-container-toolkit the way to run containers with gpu access is:
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

With nvidia-docker2 it used to be:
docker run --runtime=nvidia nvidia/cuda:9.0-base nvidia-smi
which was easy to do from the portainer UI under the "Runtime & Resources" tab.

So the feature request is to support --gpus options from the portainer UI with the new nvidia-container-toolkit and docker 19.03.

An idea is to have it available under "Runtime & Resources" -> "Resources" tab where you can already configure other resource options.

Here are some examples of using the --gpus option taken from the nvidia-container-toolkit github page.

# Test nvidia-smi with the latest official CUDA image
$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
$ docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
$ docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
$ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
$ docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi

Edit: There is a workaround listed at the bottom of the nvidia-container-toolkit GitHub page so that the --runtime option still works. They note that this won't be supported in the future.
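
For anyone scripting this in the meantime, the --gpus forms listed above can be generated from structured inputs, which is roughly what a UI form would have to do. The following is a minimal Python sketch, not Docker or Portainer code. It assumes that the --gpus value is parsed as CSV by the Docker CLI, which would explain why a device list containing a comma needs the inner double quotes. The helper names `gpus_flag_value` and `docker_run_argv` are hypothetical.

```python
def gpus_flag_value(count=None, device_ids=None, capabilities=None):
    """Build the value for `docker run --gpus` (hypothetical helper)."""
    fields = []
    if device_ids:
        # Specific GPUs by index or UUID, e.g. device=1,2
        fields.append("device=" + ",".join(str(d) for d in device_ids))
    elif count is None:
        # No count and no device list: request every GPU
        fields.append("all")
    else:
        fields.append(str(count))
    if capabilities:
        fields.append("capabilities=" + ",".join(capabilities))
    # Assumption: the value is read as CSV, so a field that itself
    # contains a comma must be wrapped in double quotes.
    return ",".join(f'"{f}"' if "," in f else f for f in fields)


def docker_run_argv(image, command, **gpu_options):
    """Return an argv list for subprocess.run (no shell quoting needed)."""
    return ["docker", "run", "--gpus", gpus_flag_value(**gpu_options), image, *command]


if __name__ == "__main__":
    print(docker_run_argv("nvidia/cuda:9.0-base", ["nvidia-smi"], device_ids=[1, 2]))
```

The single quotes in the shell examples only protect the inner double quotes from the shell; when passing an argv list they are not needed.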

@ghost ghost added area/containers area/docker kind/enhancement Applied to Feature Requests labels Sep 9, 2019
@ghost ghost self-assigned this Sep 20, 2019
@zocker-160

zocker-160 commented Oct 3, 2019

The --runtime nvidia option only works if you have nvidia-docker2 installed, but that package is deprecated and won't be updated in the future.

So I second this request; otherwise GPUs won't be usable with Portainer.

@deviantony deviantony added this to the next milestone Oct 14, 2019
@deviantony deviantony unassigned ghost Oct 17, 2019
@deviantony deviantony removed this from the next milestone Oct 17, 2019
@romansavrulin

Hello! Any news on this?

@estimadarocha

Hello! Any news on this?

any update?

@mimxrt

mimxrt commented Feb 18, 2020

Are there any plans on implementing this at all? Please let us know - it is a crucial feature and the deprecated nvidia-docker2 is not going to work forever. We might have to look into alternatives. Quote:

Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime.

(Source: https://github.com/NVIDIA/nvidia-docker)

@ghost ghost self-assigned this Feb 19, 2020
@ghost ghost added the priority/normal Core team priority label Feb 19, 2020
@ghost

ghost commented Feb 19, 2020

Hey there everyone, in response to the interest on this I have this assigned to me as a normal priority task. Once I finish work on my current high priority task this week I will focus on this 👍

@piwi3910

any news on this?

@RonB

This comment has been minimized.

@georgechang

@RonB I believe the nvidia-container-runtime is still based on the nvidia-docker2 packages, which are what was mentioned as being deprecated. For the new native integration, what's needed is to add flags for --gpus instead of setting nvidia as the runtime.

@RonB

RonB commented Mar 29, 2020

@georgechang ok sorry, I was just happy that I got it working. And I am definitely not an expert.
Despite it being deprecated, is there a reason not to use this? It seems to work like a charm.

@KeenJelly

Any update?

@ncresswell
Member

ncresswell commented Apr 14, 2020 via email

@ne0ark

ne0ark commented Jun 29, 2020

How are you guys handling this outside of portainer?

@zocker-160

zocker-160 commented Jun 29, 2020

@ne0ark actually pretty simple, all you have to do is to add the --gpus all argument to your docker run command

@ne0ark

ne0ark commented Jul 1, 2020

@ne0ark actually pretty simple, all you have to do is add the --gpus all argument to your docker run command

Thanks, I just got it installed with the Nvidia runtime. It’s a much better route.

@mimxrt

mimxrt commented Jul 2, 2020

@ne0ark Do you mean nvidia-docker2? How is it a better route? It works well but it will not be supported in the future.

Frankly, I don't understand why there is no option to manually add command line parameters to docker run commands in Portainer - it would be the most trivial solution. Anyway, just make sure you have an alternative to Portainer lined up for when nvidia-docker2 stops working!

@ncresswell
Member

ncresswell commented Jul 2, 2020 via email

@deviantony deviantony added this to the backlog milestone Nov 24, 2020
@deviantony deviantony removed this from the backlog milestone Nov 25, 2020
@deviantony
Member

Required changes

Container-creation view

Introduce a new input to support the GPUs option in the Runtime & Resources tab

[Screenshot: Portainer - 2020-12-03T112836 719]

Container-edition view

Support the same input as the one in the container creation view.

Container-details view

Introduce a new entry in the Container details showing the value associated with GPUs, only if it was set when the container was created.

[Screenshot: Portainer - 2020-12-03T113210 342]
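
A minimal sketch of the details-view rule ("show the GPUs value only if it was set"), written in Python for brevity rather than as Portainer's actual frontend code. It assumes the GPU settings come back from container inspect under HostConfig.DeviceRequests, and that this field may be null or missing when no GPUs were requested. The helper name `gpu_summary` is hypothetical.

```python
def gpu_summary(inspect):
    """Return a short GPU description for a container, or None if unset."""
    host_config = inspect.get("HostConfig") or {}
    # Treat a missing or null DeviceRequests field as "no GPUs requested"
    requests = host_config.get("DeviceRequests") or []
    for req in requests:
        caps = {c for group in (req.get("Capabilities") or []) for c in group}
        if "gpu" not in caps:
            continue  # some other kind of device request
        device_ids = req.get("DeviceIDs") or []
        if device_ids:
            return "device=" + ",".join(device_ids)
        count = req.get("Count", 0)
        # Assumption: a negative count means "all GPUs"
        return "all" if count < 0 else str(count)
    return None


if __name__ == "__main__":
    sample = {"HostConfig": {"DeviceRequests": [
        {"Driver": "", "Count": -1, "DeviceIDs": None,
         "Capabilities": [["gpu"]], "Options": {}}
    ]}}
    print(gpu_summary(sample))
```

Returning None for the unset case lets the view skip the entry entirely instead of rendering an empty value.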

@chiptus
Contributor

chiptus commented Dec 21, 2020

Implementation plan

  1. Add new fields to “create container form” and send the info to docker
  2. Parse and add new fields to “edit container form” and send the info to docker
  3. Parse and show it in “container info”

The problem will be translating from portainer to docker API.

Docker API v1.40+ expects a field called “DeviceRequests” which is an array of objects. Each object has the shape of:

{
    Driver: "nvidia",
    Count: -1,
    DeviceIDs: ["0", "1", "GPU-fef8089b-4820-abfc-e83e-94318197576e"],
    Capabilities: [["gpu", "nvidia", "compute"]],
    Options: { property1: "string", property2: "string" },
}

I’m not 100% sure how it needs to be filled. I expect that Options can be an empty object (or nil), caps should stay the same and DeviceIDs can be a list of ids or an array with one slot which is set to “all” or “none”.

If that’s the case, then we can have a switch that enables this feature (otherwise we send DeviceRequests as an empty array). If enabled, the user can switch between a list of device ids or just all.
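
The switch described above could translate into a payload along these lines. This is a hedged Python sketch of the mapping only, not the Portainer implementation. Assumptions not confirmed in this thread: Count = -1 requests all GPUs (which is what --gpus all appears to send), an explicit DeviceIDs list is sent with Count = 0 because, as far as I know, the daemon rejects a request that sets both, and DeviceRequests sits under HostConfig in the container-create body. The helper name `device_requests` is hypothetical.

```python
import json


def device_requests(enabled, device_ids=None, capabilities=None):
    """Build the DeviceRequests array for a container-create payload."""
    if not enabled:
        return []  # switch off: send an empty array
    request = {
        "Driver": "nvidia",
        # "gpu" is always present; extra capabilities are appended to it
        "Capabilities": [["gpu", *(capabilities or [])]],
        "Options": {},
    }
    if device_ids:
        # Assumption: Count and DeviceIDs are mutually exclusive
        request["Count"] = 0
        request["DeviceIDs"] = [str(d) for d in device_ids]
    else:
        # Assumption: -1 means "all GPUs"
        request["Count"] = -1
        request["DeviceIDs"] = []
    return [request]


if __name__ == "__main__":
    body = {"HostConfig": {"DeviceRequests": device_requests(True, device_ids=[0, 1])}}
    print(json.dumps(body, indent=2))
```

Under these assumptions there is no need for a literal "all" or "none" entry in DeviceIDs: "all" is expressed through Count, and "none" through the empty array.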

This is only supported from v1.40, so we should hide it for earlier versions
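
One way to gate the feature, sketched in Python: compare the API version reported by the Docker engine (the ApiVersion field of the /version endpoint) against 1.40 numerically. The helper name `supports_device_requests` is hypothetical. Comparing the raw strings would be wrong, since "1.9" sorts after "1.40".

```python
def supports_device_requests(api_version):
    """Return True if the given Docker API version is 1.40 or newer."""
    # Split "1.40" into (1, 40) so the comparison is numeric, not lexical
    major, minor = (int(part) for part in api_version.split(".")[:2])
    return (major, minor) >= (1, 40)


if __name__ == "__main__":
    for version in ("1.39", "1.40", "1.41", "1.9"):
        print(version, supports_device_requests(version))
```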

How can we test this? @deviantony @yi-portainer

@ncresswell
Member

ncresswell commented Dec 21, 2020 via email

@deviantony
Member

Feature request for the equivalent in Kubernetes: #4640

@estimadarocha

is there any trick to see gpus on Container-edition view or Container-creation view?

@deviantony
Member

@estimadarocha we have a work in progress on this via #4791

We're currently reviewing it but if you want to have a look at it you can use the following development/preview image: portainerci/portainer:pr4791

@estimadarocha

I will try it and see what it looks like... Thanks @deviantony

@deviantony
Member

To everyone following this item, we have a first piece of work available for this via #4791

We'd be happy to get some feedback on it for those of you that can give it a go (this is a development build and should not be used in production environments): portainerci/portainer:pr4791

@bobarune

bobarune commented Feb 22, 2021

Moved my plex install over to docker and decided to use Portainer to manage it. Once I got everything working with just CPU transcoding, I decided to install the nvidia drivers and the nvidia docker container toolkit to re-enable HW transcoding.

To support that, I moved my portainer docker instance to portainerci/portainer:pr4791 and wanted to report my findings. I was originally on the latest of 2.1.1. I only found a few issues, but it appears to be working. Creating the plex container on 2.1.1 may be the cause of one of the issues. Playing a 10bit H265 MKV in chrome triggers a transcode and it shows it is using hardware transcoding which is good.

The first issue happened when I attempted to use my pre-existing plex container. I turned on the "Enable GPU" under "Runtime & Resources". I had capabilities set to compute, utility, and video. Upon clicking "Deploy the container", I got an error that said "Cannot read property 'push' of null". I assume this error might be from re-using the container as once I re-created it, I was able to deploy without error. Realized this could be Chrome vs Firefox, but will be able to check later.

The second issue is just a display issue. The drop down for capabilities is visually under the Resources Memory bars. This appears to only be a Chrome issue as Firefox displays without issue. Chrome: 89.0.4389.58
image

OS details:
OS: Ubuntu 20.04.2 LTS x86_64
Host: MS-7A34 1.0
Kernel: 5.4.0-65-generic
Uptime: 44 mins
Packages: 1189 (dpkg), 6 (snap)
Shell: bash 5.0.17
CPU: AMD Ryzen 7 1700 (16) @ 3.000GHz
GPU: NVIDIA GeForce GTX 1060 3GB
Memory: 685MiB / 32128MiB

@deviantony
Member

Hi @bobarune, thanks for the feedback ! I've passed it to our development team via #4791 (comment) and we'll continue working on this.

@Skoal262

Any updates?? I have been following this for quite some time, and my Plex container has stopped working. I had to disable the old nvidia runtime, and without GPU support I have to rely on CPU transcoding.

@oramirite

oramirite commented Feb 11, 2022

Hey everyone. Happy to find this work-in-progress version of GPU compatibility. I've recently pivoted the role of my systems into some very render-heavy workloads and GPU support for my container ecosystem is now a must.

The portainerci/portainer:pr4791 image seems to work great for me. I also had the display issues @bobarune mentioned with the dropdowns but other than that, all good. I am using this to spin up Unreal Engine containers that render to specific displays and being able to do that exclusively from Portainer is fun :)

It would be really awesome for this to merge soon so I could have dark mode AND GPU capabilities!

@oramirite

oramirite commented Feb 15, 2022

After a little more testing, I found that the following features don't work at all on this image, even on a totally fresh install of Docker + Portainer:

  • entering the App Templates section shows the error "Failure: Unable to retrieve templates". Nothing in the logs.
  • entering the Registries section shows the error: "Failure: Unable to retrieve DockerHub details". Nothing in the logs.

Hopefully I'm not just pointing out something obvious.

@dfldylan

dfldylan commented May 2, 2022

Hello, everyone
I found that PR #4791 was based on version 2.1 and was too old to merge into the main branch at version 2.11, and it is still missing some features. I refactored it on top of version 2.11.1 and added some extra features.
[Screenshot 2022-05-02 11 55 08]

@huib-portainer
Contributor

You can give it a try by using the image portainerci/portainer:pr6872.
Note that this is a development build and should not be used in a production environment.

This works when using a new Portainer instance on a Docker standalone environment:
image

image

Using all GPUs
image

Results in
image
image
image

To use specific GPUs, configure it in the Environments
image

Shows as
image

And can be selected when deploying the container
image

@huib-portainer huib-portainer added this to the CE-2.15.0 milestone May 19, 2022
@joshua-portainer

Closing this issue as this work has been included in our 2.15 release, which will be released next week.
