
CDI: Add mount-nvidia-binaries and mount-nvidia-docker-1-directories options #290979

Merged

Conversation

@ereslibre (Member) commented Feb 23, 2024

Add three options to hardware.nvidia-container-toolkit:

  • mounts: a list of mounts, allowing arbitrary host paths to be mounted in CDI-enabled containers.

  • mount-nvidia-binaries: this option allows users to avoid mounting NVIDIA binaries in the container.

  • mount-nvidia-docker-1-directories: this option allows users to avoid mounting /usr/local/nvidia/lib{,64} in containers.
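
For illustration, a configuration using these options might look like the sketch below; the enable flag and the mounts submodule field names (hostPath/containerPath) are assumptions for the sake of the example, not copied from the module source:

    { ... }:
    {
      hardware.nvidia-container-toolkit = {
        enable = true;                              # assumption: an enable flag exists alongside these options
        mount-nvidia-binaries = false;              # keep host NVIDIA binaries (nvidia-smi & co.) out of containers
        mount-nvidia-docker-1-directories = false;  # skip the legacy /usr/local/nvidia/lib{,64} mounts
        mounts = [
          # assumed field names; bind-mount an arbitrary host path into CDI-enabled containers
          { hostPath = "/path/on/host"; containerPath = "/path/in/container"; }
        ];
      };
    }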

Related: #290609

Description of changes

Things done

I have tested that podman run --device=nvidia.com/gpu=all -d -p 11434:11434 --name ollama ollama/ollama works as expected with this change and is able to use all the GPUs in my system without any further changes.

My NixOS configuration sets virtualisation.containers.cdi.dynamic.nvidia.enable = true;.

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch from 5dbec5a to 3e0dc82 on February 24, 2024
@ereslibre changed the title from "CDI: Add nvidia-docker 1 directories" to "CDI: Add nvidia-docker 1.0 directories" on Feb 24, 2024
@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch 2 times, most recently from e1c958f to e30ae50 on February 24, 2024
@ereslibre (Member Author) commented Feb 24, 2024

If upstream produces a broken image

They are not producing a broken image. This is common practice when using CUDA in container images. With nvidia-docker, you would:

  1. Set LD_LIBRARY_PATH to /usr/local/nvidia/lib{,64}
  2. Something (likely a runtime wrapper) mounts the required drivers and libraries from the host at /usr/local/nvidia/lib{,64}
  3. Your container hopefully runs

Later on (I don't know the exact timeline), they changed the story a little bit:

  1. You generate a CDI spec, which gives you the ability to inject OCI hooks and mount additional directories from the host.
  2. With CDI, you mount the relevant drivers and binaries from the host into the container.
  3. An OCI hook injected by CDI updates /etc/ld.so.cache inside the container, adding the discovered libcuda.so.* and friends.
  4. Your container hopefully runs
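
(For context, a heavily trimmed sketch of what such a generated CDI spec can look like; the paths, spec version and hook binary location are illustrative only, not taken from the actual generator:)

    {
      "cdiVersion": "0.5.0",
      "kind": "nvidia.com/gpu",
      "devices": [
        { "name": "all",
          "containerEdits": { "deviceNodes": [ { "path": "/dev/nvidia0" } ] } }
      ],
      "containerEdits": {
        "mounts": [
          { "hostPath": "/run/opengl-driver/lib/libcuda.so.545.29.06",
            "containerPath": "/run/opengl-driver/lib/libcuda.so.545.29.06",
            "options": [ "ro", "nosuid", "nodev", "bind" ] }
        ],
        "hooks": [
          { "hookName": "createContainer",
            "path": "/usr/bin/nvidia-ctk",
            "args": [ "nvidia-ctk", "hook", "update-ldcache",
                      "--folder", "/run/opengl-driver/lib" ] }
        ]
      }
    }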

It's not ideal, and it's probably not the most stable thing, but it's what we have. Many, many container images rely on this as of today. Our interest as NixOS developers is to find the balance between a perfect implementation and letting people run their stuff while getting in their way as little as possible.

@SomeoneSerge (Contributor)

If they actually enforce that these paths exist, they are broken:) If they were not broken, their LD_LIBRARY_PATH would be a no-op and you'd not get an error because the ld.so would still look up the ld.so.cache

@ereslibre (Member Author) commented Feb 24, 2024

If they actually enforce that these paths exist, they are broken:) If they were not broken, their LD_LIBRARY_PATH would be a no-op and you'd not get an error because the ld.so would still look up the ld.so.cache

They are not broken, again. They just decided that the container images would point LD_LIBRARY_PATH at the place where the runtime wrapper will mount the host libraries (or augment ld.so.conf.d with the same paths). See the official example of building a CUDA image from nvidia:

https://gitlab.com/nvidia/container-images/cuda/-/blob/e3ff10eab3a1424fe394899df0e0f8ca5a410f0f/dist/12.3.1/ubi9/base/Dockerfile#L40-41

https://gitlab.com/nvidia/container-images/cuda/-/blob/e3ff10eab3a1424fe394899df0e0f8ca5a410f0f/dist/12.3.1/ubi9/base/Dockerfile#L44

The reality is that these images exist in the wild, a lot of them. We cannot fix them; they are already built and published. We can adapt to this reality and make our users' lives a bit easier, that's all.

@SomeoneSerge (Contributor) commented Feb 24, 2024

They are not broken, again. They just decided that the container images would point LD_LIBRARY_PATH at the place where the runtime wrapper will mount the host libraries (or augment ld.so.conf.d with the same paths). See the official example of building a CUDA image from nvidia:

The LD_LIBRARY_PATH set up by the container shouldn't matter at all, we have already informed the dynamic loader about the drivers' location through other means

EDIT: I don't disagree we should provide for a smooth experience, I just think we're mistaken about why and how the failure occurs (if it does), and thus about the method of addressing it

@ereslibre (Member Author) commented Feb 24, 2024

The LD_LIBRARY_PATH set up by the container shouldn't matter at all, we have already informed the dynamic loader about the drivers' location through other means

For the context of this PR, it doesn't matter whether it's LD_LIBRARY_PATH or ld.so.conf.d. Side note: I think applications depending on CUDA tend to do a dlopen, which is affected by LD_LIBRARY_PATH. I am not so sure that it should not matter at all; I tend to think otherwise.

In any case, I did show both:

  1. How a CUDA container image should be built according to nvidia:
    1. Setting LD_LIBRARY_PATH
    2. Modifying /etc/ld.so.conf.d
  2. nvidia-docker doing this very same thing

@SomeoneSerge (Contributor)

For the context of this PR, it doesn't matter whether it's LD_LIBRARY_PATH or ld.so.conf.d. Side note: I think applications depending on CUDA tend to do a dlopen, which is affected by LD_LIBRARY_PATH. I am not so sure that it should not matter at all; I tend to think otherwise.

The dlopen should work out of the box regardless of LD_LIBRARY_PATH, because our CDI hook already takes care of this.

Does your example (ollama) actually fail without these extra mounts?
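
(Aside: assuming the image ships a glibc ldconfig, one quick way to see what the loader can already resolve inside a CDI-enabled container is to dump the linker cache:)

    podman run --rm --device=nvidia.com/gpu=all ubuntu:22.04 \
      sh -c 'ldconfig -p | grep -E "libcuda|libnvidia-ml"'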

@ereslibre (Member Author)

The dlopen should work out of the box regardless of LD_LIBRARY_PATH, because our CDI hook already takes care of this.

No, if the code in the container does a dlopen it's already on its own. Only LD_LIBRARY_PATH will matter.

@SomeoneSerge (Contributor)

 No, if the code in the container does a dlopen it's already on its own. Only LD_LIBRARY_PATH will matter.

No, dlopen is handled by the dynamic loader, which follows exactly the same logic as DT_NEEDED: it'll see if the library is already loaded (e.g. by LD_PRELOAD), it'll consult LD_LIBRARY_PATH, it'll consult DT_RUNPATH of the calling process, it'll consult the ld.so.caches
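
(That search order can be observed directly with glibc's LD_DEBUG facility, which traces both DT_NEEDED resolution and dlopen calls; some-cuda-app below is a placeholder for whatever binary is being debugged:)

    # Show which directories the loader searches when the process asks for libnvidia-ml
    LD_DEBUG=libs some-cuda-app 2>&1 | grep -A 3 'find library=libnvidia-ml'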

@ereslibre (Member Author)

No, dlopen is handled by the dynamic loader, which follows exactly the same logic as DT_NEEDED: it'll see if the library is already loaded (e.g. by LD_PRELOAD), it'll consult LD_LIBRARY_PATH, it'll consult DT_RUNPATH of the calling process, it'll consult the ld.so.caches

Yes, you are right, my bad. In any case, we still need to mount this directory.

@SomeoneSerge (Contributor)

Yes, you are right, my bad. In any case, we still need to mount this directory.

Why, have we seen any failures?

@ereslibre (Member Author) commented Feb 24, 2024

Why, have we seen any failures?

As per PR description: podman run --device=nvidia.com/gpu=all -d -p 11434:11434 --name ollama ollama/ollama does not work with GPU acceleration without this PR.

@SomeoneSerge (Contributor) commented Feb 24, 2024

Very good, and sorry I missed this. Could you publish the logs so we can see what the error was?

@ereslibre (Member Author) commented Feb 24, 2024

Very good, and sorry I missed this. Could you publish the logs so we can see what the error was?

❯ podman run --rm --device=nvidia.com/gpu=all -e OLLAMA_DEBUG="1" -p 11434:11434 --name ollama ollama/ollama serve
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAaCxFGYXlp+0XG39EwoWKRXrCqlhbn4rqYwO7d9OdY8

time=2024-02-24T10:50:18.777Z level=INFO source=images.go:710 msg="total blobs: 0"
time=2024-02-24T10:50:18.777Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-02-24T10:50:18.777Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.27)"
time=2024-02-24T10:50:18.777Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-02-24T10:50:20.600Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cuda_v11 cpu cpu_avx cpu_avx2 rocm_v6 rocm_v5]"
time=2024-02-24T10:50:20.601Z level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-02-24T10:50:20.601Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-24T10:50:20.601Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-24T10:50:20.601Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/local/nvidia/lib/libnvidia-ml.so* /usr/local/nvidia/lib64/libnvidia-ml.so*]"
time=2024-02-24T10:50:20.601Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-24T10:50:20.601Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-24T10:50:20.601Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/opt/rocm*/lib*/librocm_smi64.so* /usr/local/nvidia/lib/librocm_smi64.so* /usr/local/nvidia/lib64/librocm_smi64.so*]"
time=2024-02-24T10:50:20.601Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-24T10:50:20.601Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-24T10:50:20.601Z level=DEBUG source=amd.go:32 msg="amd driver not detected /sys/module/amdgpu"
time=2024-02-24T10:50:20.601Z level=INFO source=routes.go:1042 msg="no GPU detected"

Note the last line: "no GPU detected".

@ereslibre (Member Author)

So yeah, these search paths are pretty much hardcoded in ollama, which we could fix upstream to make it a better citizen, to be honest: https://github.com/ollama/ollama/blob/2a4b128ae3e3a18b10e8701aca2434d401eaa7ba/gpu/gpu.go#L37-L51

But honestly, I fear this is a very common practice.

@SomeoneSerge (Contributor)

Aha. I'd need to dig deeper, but looking at https://github.com/ollama/ollama/blob/2a4b128ae3e3a18b10e8701aca2434d401eaa7ba/gpu/gpu.go#L38-L50, it's more than likely they do not use dlopen with library names (as they would if they actually targeted nvidia-docker), but hard-code the absolute paths. So there's a possibility we're not even talking about institutionalizing nvidia's broken designs, but about accommodating a single project that fails to integrate with those designs.

@ereslibre (Member Author) commented Feb 24, 2024

@SomeoneSerge we basically saw the same thing at the same time and wrote the same thing hehe.

I would settle it like this: it makes some broken published software work, and the old wrapper was doing it as well; also, we don't lose much in the process.

Addendum: I also fear this is not the only instance, but I don't have more evidence at hand to back that statement.

@ereslibre (Member Author)

I have to say: thanks for pushing to find the real root cause :)

@SomeoneSerge (Contributor) commented Feb 24, 2024

Ah, you were faster:)

But honestly, I fear this is a very common practice.

What I'm thinking is (and this concerns the /usr/bin and /usr/lib paths from the original PR) that we could add options like cdi.dynamic.nvidia.mount-usr-local, mount-usr-bin, etc. I'd set them to false by default; see the sketch below.
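
(For concreteness, a sketch of the kind of per-path interface being suggested here; these option names are only a proposal and do not exist in this PR:)

    # Hypothetical interface, names taken from the suggestion above
    virtualisation.containers.cdi.dynamic.nvidia = {
      enable = true;
      mount-usr-local = false;  # opt in to the /usr/local/nvidia/lib{,64} mounts explicitly
      mount-usr-bin = false;    # opt in to the host binaries under /usr/bin explicitly
    };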

we don't lose much in the process.

Fairly hard to say. This increases pollution/complexity of the environment, i.e. makes errors and successes less deterministic

@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch 3 times, most recently from 59209f1 to 9c73be8 on March 23, 2024
@ereslibre marked this pull request as ready for review on March 31, 2024
@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch from 9c73be8 to 6b80785 on April 16, 2024
@ereslibre (Member Author) commented Apr 16, 2024

Rebased this. I would like to include this in NixOS 24.05 (#303285). I think removing the cdi.static and cdi.dynamic attributes that were introduced in fd464f0 and 8ba61eb strikes a better balance than keeping the strict cdi.static/cdi.dynamic separation in NixOS.

@SomeoneSerge (Contributor)

  • I think removing cdi.static and cdi.dynamic attributes that were included in fd464f0 and 8ba61eb is a better balance

    👍🏻

  • I think a typed option for CDI specs is still warranted. Note that any custom specs in etc."cdi/..." that specify "hostPaths" will need to handle exportReferencesGraph anyway.

  • Needs an mkRenamedOptionModule from ...cdi.dynamic.nvidia to the new service, for unstable users
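
(For reference, a sketch of what such a rename shim could look like, assuming the new options land under hardware.nvidia-container-toolkit:)

    { lib, ... }:
    {
      imports = [
        # Map the old unstable option path to the new one, warning users about the rename
        (lib.mkRenamedOptionModule
          [ "virtualisation" "containers" "cdi" "dynamic" "nvidia" "enable" ]
          [ "hardware" "nvidia-container-toolkit" "enable" ])
      ];
    }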

@ereslibre (Member Author)

I think a typed option for CDI specs is still warranted. Note that any custom specs in etc."cdi/..." that specify "hostPaths" will need to handle exportReferencesGraph anyway.

I think we can implement the typed option for CDI specs. I would suggest doing so as a follow-up after this has been merged, as a way for users to statically specify their own CDI configurations with proper type checking.
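
(Until such a typed option exists, a user-supplied static spec would presumably be dropped into etc."cdi/..." by hand, roughly like this; the spec contents here are illustrative only:)

    { ... }:
    {
      # Hand-written CDI spec serialized to JSON under /etc/cdi
      environment.etc."cdi/my-vendor.json".text = builtins.toJSON {
        cdiVersion = "0.5.0";
        kind = "example.com/device";
        devices = [
          { name = "all";
            containerEdits.deviceNodes = [ { path = "/dev/example0"; } ];
          }
        ];
      };
    }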

Needs an mkRenamedOptionModule from ...cdi.dynamic.nvidia to the new service, for unstable users

Right, thanks for the reminder. Adding it later today.

@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch from 26c83b0 to 7ad83fc on April 18, 2024
@ereslibre (Member Author)

Right, thanks for the reminder. Adding it later today.

Just for the record: this was already pushed that very same day.

cc/ @SomeoneSerge

@SomeoneSerge (Contributor)

The updated PR removes the previously introduced "generic" options (virtualisation.containers.cdi.{static,dynamic}) and introduces a more specific hardware.nvidia-container-toolkit. This is more modest than the previous interface, will be easier to maintain, and should be compatible with implementing a new generic interface later after the release. I think we should merge this. We should also post a warning in https://discourse.nixos.org/t/breaking-changes-announcement-for-unstable/17574/.

I updated #290609 to reflect this new direction, and I updated the message in the "feature freeze" issue to refer to other issues with the current CDI implementation. Notably, we still need to follow up with a fix to the incomplete deployment issue (dependencies of the host drivers and tools not being mounted).

Before this PR is merged I'd like the commit history to be updated. Ideally, the PR would be split into two commits: one for the deprecation of the previous interface (the file moves, attribute renamings, etc.), and one for the new feature (the FHS bin/lib paths).

@ereslibre thanks again for driving this

@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch from 7ad83fc to cf5fdb7 on April 22, 2024
@ereslibre (Member Author)

@SomeoneSerge: thank you. Split into two commits as you suggested.

I will post in https://discourse.nixos.org/t/breaking-changes-announcement-for-unstable/17574/ when it's merged.

Notably, we still need to follow up with a fix to the incomplete deployment issue (dependencies of the host drivers and tools not being mounted).

Absolutely, we have to follow up on this. Thanks for your feedback during this process.

@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch from cf5fdb7 to 6edbd3e on April 22, 2024
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/breaking-changes-announcement-for-unstable/17574/47

…s.cdi.dynamic.nvidia.enable`

Add the NixOS option `hardware.nvidia-container-toolkit-cdi-generator.enable`.

This enables the ability to expose GPUs in containers for container runtimes that support the Container Device Interface (CDI).

Remove `cdi.static` and `cdi.dynamic.nvidia.enable` attributes.
…ount-nvidia-docker-1-directories` options

- `mount-nvidia-binaries`: this option allows users to avoid mounting
nvidia binaries on the container.

- `mount-nvidia-docker-1-directories`: this option allows users to
avoid mounting `/usr/local/nvidia/lib{,64}` on containers.
@ereslibre force-pushed the cdi-add-nvidia-docker-1-directories branch from 6edbd3e to de3ce5f on April 23, 2024
@SomeoneSerge (Contributor) left a comment

If you've tested the runtime, I'm happy to merge now (I can test later)

@ereslibre (Member Author)

I just re-ran it, but let me do a more thorough check before finally merging; everything should be fine, but I will comment here.

I will also double-check the Docker issue, and if it is what I presume, I will open another PR to fix it so that it also makes it into 24.05.

@ereslibre (Member Author)

If you've tested the runtime, I'm happy to merge now (I can test later)

Confirmed that it's working fine here in all the cases I tested.

@SomeoneSerge merged commit 7035968 into NixOS:master on Apr 23, 2024
21 checks passed
@ereslibre deleted the cdi-add-nvidia-docker-1-directories branch on April 23, 2024