
Enable the proprietary NVIDIA driver #116

Open
tfmoraes opened this issue Apr 16, 2019 · 23 comments · May be fixed by #1407
Labels: 1. Bug (Something isn't working), 5. Help Wanted (Extra attention is needed)

Comments

@tfmoraes

First, great project!

If I'm using the NVIDIA proprietary driver, OpenGL software (like Blender) doesn't work inside the toolbox container. I tried installing the proprietary driver inside the container; it installs, but the OpenGL software still doesn't work. Is it necessary to install anything else, or to set some environment variable?

Thanks!

@Findarato

Findarato commented May 6, 2019

Toolbox is a container; you would have to map your graphics card inside, or do things the way nvidia-docker does.

The reply further down #116 (comment) works perfectly.

@tfmoraes
Author

tfmoraes commented May 6, 2019

@Findarato You mean add something like --volume /dev/nvidia0:/dev/nvidia0 and other /dev files?
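For illustration, this is roughly what that would look like when creating a container by hand (a sketch only; the exact set of /dev/nvidia* nodes and the image name are just examples):

#!/bin/sh
# Sketch: expose the NVIDIA device nodes to a manually created container.
# The list of /dev/nvidia* files depends on the driver and the number of GPUs.
podman create --name nvidia-test \
  --volume /dev/nvidia0:/dev/nvidia0 \
  --volume /dev/nvidiactl:/dev/nvidiactl \
  --volume /dev/nvidia-modeset:/dev/nvidia-modeset \
  registry.fedoraproject.org/fedora-toolbox:35 sleep infinity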

@tpopela
Collaborator

tpopela commented May 21, 2019

So to have the NVIDIA stuff working inside the Toolbox I had to do this (inspired by https://github.com/thewtex/docker-opengl-nvidia):

  1. You have to patch the Toolbox to bind mount /dev/nvidia0 and /dev/nvidiactl into the Toolbox and set up the X11 things - see tpopela@40231e8

  2. Download the NVIDIA proprietary drivers on the host:

#!/bin/sh

# Get your current host nvidia driver version, e.g. 340.24
nvidia_version=$(cat /proc/driver/nvidia/version | head -n 1 | awk '{ print $8 }')

# We must use the same driver in the image as on the host
if test ! -f nvidia-driver.run; then
  nvidia_driver_uri=http://us.download.nvidia.com/XFree86/Linux-x86_64/${nvidia_version}/NVIDIA-Linux-x86_64-${nvidia_version}.run
  wget -O ~/nvidia-driver.run $nvidia_driver_uri
fi
  3. Install the drivers while inside the Toolbox:
#!/bin/sh

sudo dnf install -y glx-utils kmod libglvnd-devel || exit 1
sudo sh ~/nvidia-driver.run -a -N --ui=none --no-kernel-module || exit 1
glxinfo | grep "OpenGL version"

@tfmoraes
Author

@tpopela it worked. Thanks!

@tpopela
Collaborator

tpopela commented May 23, 2019

I'm glad it worked! But there was a mistake that could lead to malfunctions after the host is restarted - you will need to apply tpopela@3db450a on top of the previous patch.

debarshiray referenced this issue in zerotri/toolbox May 23, 2019
Things like the proprietary NVIDIA driver need access to devices
directly inside the /dev directory (e.g., /dev/nvidia0 and
/dev/nvidiactl), and since such devices can come and go at runtime they
cannot be bind mounted individually. Instead, the entire directory
needs to be made available.

https://github.com/debarshiray/toolbox/issues/116
@debarshiray
Member

debarshiray commented May 23, 2019

@tpopela We might be able to get away without bind mounting /tmp/.X11-unix. These days the X.org server listens on an abstract UNIX socket and a UNIX socket on the file system. The former doesn't work if you have a network namespace, but the Toolbox container doesn't have one (because of podman create --net host), and that's why X applications work. The latter is located at /tmp/.X11-unix and is used by Flatpak containers because those have network namespaces.
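For illustration (not part of the original comment), one way to see both sockets on a typical host:

#!/bin/sh
# List the X11 UNIX sockets: the abstract one shows up with a leading '@',
# while the filesystem one lives under /tmp/.X11-unix.
ss -xl | grep -i x11
ls -l /tmp/.X11-unix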


@tpopela
Collaborator

tpopela commented May 24, 2019

Ah OK @debarshiray! Thank you for the clarification. I can confirm that not bind mounting /tmp/.X11-unix doesn't change anything and the integration still works (I tried running Blender here).

There is maybe one small change now that we are bind mounting the whole /dev: Blender now looks for nvcc (the CUDA compiler) in PATH and can't find it.

@tfmoraes
Author

With the merge of https://github.com/debarshiray/toolbox/pull/119 this issue may be closed, since NVIDIA's proprietary driver is working now. It's just necessary to install the NVIDIA driver once inside the toolbox container; @tpopela's scripts help with the driver installation. @tpopela, you also have to install the CUDA Toolkit. To make it install, I passed the parameters --override and --toolkit. After installing the CUDA Toolkit, Blender shows me the option to render using CUDA. But unfortunately CUDA doesn't work with GCC 9 :(
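For reference, a sketch of that CUDA Toolkit installation step (the runfile name is illustrative; use whichever installer was downloaded from NVIDIA):

#!/bin/sh
# Install only the CUDA toolkit (no driver) inside the toolbox container,
# overriding the installer's compiler/OS checks.
sudo sh ~/cuda-toolkit.run --override --toolkit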

@tpopela
Collaborator

tpopela commented May 27, 2019

Actually I would leave this open (but I will leave that up to Rishi), as we were thinking with @debarshiray about leaking the NVIDIA host drivers to the container, so there will be no need to manually install the drivers in the container. We have a working WIP solution for it.

@tfmoraes
Author

That would be great!

@debarshiray
Member

debarshiray commented Jun 6, 2019

we were thinking with @debarshiray about leaking the NVIDIA host drivers to the
container, so there will be no need to manually install the drivers in the container.

Yes, I agree that this will be the right thing to do. OpenGL drivers have a kernel module and some user-space components (e.g., shared libraries) that talk to each other. In NVIDIA's case the interface between these two components isn't stable, and hence the user-space bits inside the container must match the kernel module on the host. The two can go out of sync if your host is lagging behind the container or vice versa.

The problem with leaking the files into the container is maintaining a list of those files somewhere because they vary from version to version. This would be vastly simpler if there was a well known nvidia directory somewhere on the host that could be bind mounted because then we wouldn't have to worry about the names and locations of the individual files themselves. Unfortunately that's not the case.

Looking around, I found Flatpak's solution to be a reasonable compromise. In short, it invents and enforces this well known nvidia directory. It expects distributors of the host OS to put all the user-space files in /var/lib/flatpak/extension/org.freedesktop.Platform.GL.host/x86_64/1.4 and that's implemented by modifying the package shipping the NVIDIA driver.

With that done, we'd need to figure out where to place these files inside the container and how to point the container's runtime environment at them.
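To make the idea concrete, a rough sketch (only the Flatpak extension path on the host is the well-known directory mentioned above; the in-container mount point and the ld.so.conf entry are made-up names):

#!/bin/sh
# On the host, when creating the container:
podman create --name gl-test \
  --volume /var/lib/flatpak/extension/org.freedesktop.Platform.GL.host/x86_64/1.4:/usr/lib64/host-nvidia:ro \
  registry.fedoraproject.org/fedora-toolbox:35 sleep infinity

# Later, inside the container, point the dynamic linker at those libraries:
#   echo /usr/lib64/host-nvidia | sudo tee /etc/ld.so.conf.d/host-nvidia.conf
#   sudo ldconfig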

@debarshiray reopened this Jun 6, 2019
@garyedwards

NVIDIA have their own solution for this, nvidia-container-runtime-hook, which works very well with podman, triggered by an OCI prestart hook. I just run into an issue at the moment when using --uidmap, resulting in losing permission to run ldconfig:

could not start /sbin/ldconfig: mount operation failed: /proc: operation not permitted

It may be better for toolbox to try and integrate with this existing tool rather than maintaining another implementation.
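For anyone trying this, the setup is roughly the following (a sketch; the hook directory is the one the toolkit conventionally installs into, and may differ by distribution):

#!/bin/sh
# With nvidia-container-toolkit installed, its OCI prestart hook lets podman
# expose the GPU based on these environment variables.
podman run --rm \
  --hooks-dir /usr/share/containers/oci/hooks.d \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  registry.fedoraproject.org/fedora-toolbox:35 nvidia-smi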

@garyedwards

Issue relating to the uidmap permission problem:

NVIDIA/libnvidia-container#49

@andreldmonteiro

andreldmonteiro commented Nov 28, 2019

I was trying to run Steam in the toolbox (bug #343). I didn't patch the toolbox; Steam runs and OpenGL works, but Vulkan doesn't seem to work. I tried vkmark and Rise of the Tomb Raider on Steam.

Any ideas how to get it to work?

@HarryMichal added this to Needs triage in Priority Board Jul 28, 2020
@tfmoraes
Author

tfmoraes commented Aug 1, 2020

I saw that Singularity containers fix this problem without libnvidia-container. They use a list of needed files.

@HarryMichal added the 1. Bug (Something isn't working) and 5. Help Wanted (Extra attention is needed) labels Sep 10, 2020
@HarryMichal moved this from Needs triage to Low priority in Priority Board Sep 10, 2020
@Ayush1325

So what is the status of using NVIDIA GPU drivers in a container in 2021?
I can see that /dev/nvidia0 and /dev/nvidiactl are mounted.
However, I cannot install the NVIDIA drivers successfully. The install proceeds normally, but checking with modinfo -F version nvidia gives an error:
modinfo: ERROR: Module alias nvidia not found.
And the NVIDIA Container Toolkit is not officially supported on Fedora, so it doesn't seem like a good idea to use it with Fedora Silverblue.

@loganmc10

The latest version of toolbox (0.0.99.3) exposes the host filesystem at /run/host. I believe it should be possible to create a Containerfile something like this:

FROM registry.fedoraproject.org/fedora-toolbox:35

RUN ln -s /run/host/usr/share/vulkan/icd.d/nvidia_icd.json /usr/share/vulkan/icd.d/nvidia_icd.json && \
    ln -s /run/host/usr/lib64/libGLX_nvidia.so.0 /usr/lib64/libGLX_nvidia.so.0

This would expose the host userspace driver to the container. I don't have an NVIDIA machine to test on at the moment, but I assume that would do it? The above example should hopefully work for Vulkan; I'm not exactly sure whether some extra file would need to be linked for OpenGL.
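In case it helps, a purely speculative sketch of what the extra OpenGL/EGL pieces might look like, run inside the container (libGLX_nvidia.so.0 above already covers GLX; EGL would presumably need its vendor library and ICD file, and the versioned libnvidia-* libraries they depend on may need the same treatment):

#!/bin/sh
# Speculative: link the NVIDIA EGL vendor library and its ICD file from the
# host, mirroring the Vulkan example above.
sudo mkdir -p /usr/share/glvnd/egl_vendor.d
sudo ln -s /run/host/usr/lib64/libEGL_nvidia.so.0 /usr/lib64/libEGL_nvidia.so.0
sudo ln -s /run/host/usr/share/glvnd/egl_vendor.d/10_nvidia.json \
    /usr/share/glvnd/egl_vendor.d/10_nvidia.json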

@Ayush1325

OK, so with the latest toolbox, I can install the NVIDIA drivers fine. On running nvidia-smi I get the correct output as well. However, the modinfo -F version nvidia command doesn't seem to work, so I am not sure whether the drivers are actually working.

@whs-dot-hk

So do you mean that reinstalling the NVIDIA driver inside the container is to fix ldconfig? I remember there is a step to rerun ldconfig.

Reference: https://docs.01.org/clearlinux/latest/zh_CN/tutorials/nvidia.html#configure-alternative-software-paths
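The ldconfig step referenced there amounts to refreshing the linker cache inside the container once the driver libraries are in place, e.g.:

#!/bin/sh
sudo ldconfig
ldconfig -p | grep -i nvidia   # check that the NVIDIA libraries are now visible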

@Findarato

So to have the NVIDIA stuff working inside the Toolbox I had to do this (inspired by https://github.com/thewtex/docker-opengl-nvidia):

1. You have to patch the Toolbox to bind mount the /dev/nvidia0 and /dev/nvidiactl to the Toolbox and setup the X11 things - see [tpopela@40231e8](https://github.com/tpopela/toolbox/commit/40231e8591d70065199c0df9b6811c2f9e9d7269)

2. Download the NVIDIA proprietary drivers on the host:
#!/bin/sh

# Get your current host nvidia driver version, e.g. 340.24
nvidia_version=$(cat /proc/driver/nvidia/version | head -n 1 | awk '{ print $8 }')

# We must use the same driver in the image as on the host
if test ! -f nvidia-driver.run; then
  nvidia_driver_uri=http://us.download.nvidia.com/XFree86/Linux-x86_64/${nvidia_version}/NVIDIA-Linux-x86_64-${nvidia_version}.run
  wget -O ~/nvidia-driver.run $nvidia_driver_uri
fi
3. Install the drivers while being inside the Toolbox:
#!/bin/sh

sudo dnf install -y glx-utils kmod libglvnd-devel || exit 1
sudo sh ~/nvidia-driver.run -a -N --ui=none --no-kernel-module || exit 1
glxinfo | grep "OpenGL version"

Just adding this worked for me too. I hope that with the OSS version of their driver it will just work out of the box, like all the AMD cards do.

@debarshiray changed the title from "Nvidia proprietary driver" to "Enable the proprietary NVIDIA driver" Sep 10, 2022
@3dsf

3dsf commented Nov 27, 2022

OK, so with the latest toolbox, I can install the NVIDIA drivers fine. On running nvidia-smi I get the correct output as well. However, the modinfo -F version nvidia command doesn't seem to work, so I am not sure whether the drivers are actually working.

@Ayush1325
Yes, the drivers are working, as I can compile with nvcc.
And yes, modinfo -F version nvidia does not work within the container.

I used the NVIDIA Fedora 35 repo (nvidia-driver and cuda) for both the host (F37) and the container (F35, for a matching gcc version). Beyond that, I added the NVIDIA bin folder to PATH and set LD_LIBRARY_PATH for each install.
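Concretely, that amounts to something like the following (the /usr/local/cuda prefix is an assumption; use whatever prefix the repo packages actually installed to):

#!/bin/sh
# Add the CUDA toolchain to the environment inside the container.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
nvcc --version   # quick sanity check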

@mjlbach
Contributor

mjlbach commented Mar 25, 2023

What needs to be done for this?

  1. If you don't mind requiring users to install nvidia-container-toolkit:
 podman run --rm -it --privileged --security-opt=label=disable -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu
  2. If you want something entirely independent, you can mount the relevant NVIDIA driver files into the container in a manner similar to:
  3. I don't think installing the NVIDIA driver inside the container is a sustainable solution, because the host and container drivers have to match.

I personally feel option 1 is more sustainable, but it's pretty simple (two appended environment variables and a host executable check for nvidia-container-toolkit). Would a PR for one of these options be accepted, @debarshiray, or should this just be documented?

@debarshiray
Member

What needs to be done for this?

[...]

would a PR for one of these options be accepted @debarshiray or should this be documented?

Did you see my comment above? Unless there's a problem with it, I still prefer the unmanaged Flatpak extension option.

I finally got myself some NVIDIA hardware to play with this.

I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.

As far as I can make out, the nvidia-container-toolkit or nvidia-container-toolkit-base packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and even though they are supposed to be free software, I see them neither in Fedora proper nor RPMFusion, but RPMFusion does have NVIDIA's proprietary driver.

Is there anything else other than NVIDIA that uses the Container Device Interface?

I would like to understand the situation a bit better. Ultimately I want to make it as smooth as possible for the user to enable the NVIDIA proprietary driver. That becomes a problem if one needs to enable multiple different unofficial repositories, at least on Fedora.

I will start by reviving the pull request from @TingPing against negativo17's RPM for the proprietary NVIDIA driver, but against RPMFusion, because that's the implementation Fedora Workstation promotes these days. If nothing else, it will immediately help Flatpak because those containers will always have access to the driver. We can add the same plumbing to Toolbx and benefit similarly.
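For context, the CDI flow that the NVIDIA Container Toolkit enables looks roughly like this (commands as documented by NVIDIA; the spec path is the conventional default):

#!/bin/sh
# Generate a CDI specification for the installed driver, then ask podman to
# expose all GPUs described by it.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --rm --device nvidia.com/gpu=all \
  registry.fedoraproject.org/fedora-toolbox:39 nvidia-smi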
