This repository has been archived by the owner on Jun 13, 2021. It is now read-only.

Support Compose file version 2 (for runtime: nvidia) #241

Open
thomas-riccardi opened this issue Jun 21, 2018 · 10 comments


@thomas-riccardi

Description

It would be useful to support Compose file version 2. The use-case would be to use runtime: nvidia to access the local GPUs from docker-compose service containers.
(runtime option is not supported on Compose file version 3, see docker/compose#5360 (comment))
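For illustration, a minimal Compose v2 file of the kind this request targets might look like the following sketch (the service name is hypothetical; nvidia/cuda is only an example image):

version: '2.4'
services:
  gpu-worker:
    image: nvidia/cuda:9.0-base
    # runtime selects a container runtime registered with the daemon;
    # the key exists in the 2.x schema but not in the 3.x schema
    runtime: nvidia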

Steps to reproduce the issue:

  1. docker-compose.yaml with version: '2.4'
  2. docker-app init foo
  3. docker-app render

Describe the results you received:
Error: failed to load Compose file: unsupported Compose file version: 2.4

Describe the results you expected:
docker-app render works with Compose file version 2.

Output of docker-app version:

Version:      v0.2.0
Git commit:   854872f
Build time:   2018-06-11T15:06:17.093522032+00:00
OS/Arch:      linux/amd64
Experimental: off
Renderers:    none
@thomasjo

@thomas-riccardi On our systems (researcher machines and GPU cluster nodes) we've worked around the removal of the runtime setting by changing the default runtime to nvidia:

// Snippet from "/etc/docker/daemon.json" on my machine
{
  // ...
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

We use nvidia-docker2 on all systems.
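With the default runtime switched like this, the Compose file itself needs no runtime key at all, so it can stay on the version 3 format. A rough sketch (service and image names are just examples):

version: '3.5'
services:
  gpu-worker:
    image: nvidia/cuda:9.0-base
    # no runtime key needed: every container on this host already
    # starts under the nvidia runtime configured in daemon.json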

@thaJeztah
Member

I'm a bit hesitant to add support for the V2 compose file format; the V2 format has various features that are targeted at local development, and can be non-portable, thus problematic when deploying an application.

Adding runtime support to the V3 compose-file format could be possible, but we'd have to look carefully at this option, because there's some work to be done from an orchestrator perspective;

  • "runtimes" (as visible in the daemon.json example above) are a custom option that's set on the daemon; in other words, I can add any runtime in the daemon configuration and call it nvidia (the name is a custom name). This means that there's no guarantee that the nvidia runtime actually is the nvidia runtime. Could be ok if documented as such, but at least something to take into account.
  • Orchestrators need access to the list of available runtimes on each daemon in the cluster so that containers will only be scheduled on a daemon that has the specified runtime

Orthogonal, but specific to the use-case here: there's still discussion around "how" to expose GPUs to containers; the nvidia "runtime" is currently a thin wrapper around the default runc runtime. In future this wrapper may no longer be needed (and GPU access may work out of the box). That should not be a blocker for implementing --runtime for orchestrated services though, as there are other use-cases for this as well (think of gVisor or Clear/Kata containers).

@bhack

bhack commented Nov 9, 2018

/cc @3XX0 @flx42 The lack of runtime selection is going to create portability issues when you build GPU images on the same machine with docker-compose, because it requires the NVIDIA runtime to be the mandatory default.
See NVIDIA/nvidia-docker#856

@bhack

bhack commented Nov 9, 2018

I also referenced this issue at docker/compose#6239

@xkortex

xkortex commented Mar 2, 2019

@thaJeztah Would it be possible to have a "runtimes" section like there are "volumes" and "networks" sections? I imagine something like this:

services:
  gpu-analytic:
    image: foobar:latest
    runtime: nvidia

runtimes:
  nvidia: 
    path: /usr/bin/nvidia-container-runtime
    args:
      - FOO:bar

Modifying /etc/docker/daemon.json on every machine seems decidedly un-portable, in my opinion.

@thaJeztah
Member

Modifying /etc/docker/daemon.json on every machine seems decidedly un-portable, in my opinion.

Taking that approach would assume that every machine in the cluster would be able to have /usr/bin/nvidia-container-runtime installed, and should take the same arguments; that assumption cannot be made, as a cluster may include nodes with, and without the nvidia runtime installed.

Can you elaborate why configuring the engine/daemon as part of the installation process for the nvidia runtime is not a suitable approach?

@flixr

flixr commented Mar 9, 2019

@thaJeztah E.g. what about when you want to run only some of the containers with the nvidia runtime?
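A sketch of that scenario, written in the Compose v2.x syntax this issue asks docker-app to accept (service names are hypothetical):

version: '2.4'
services:
  gpu-inference:
    image: nvidia/cuda:9.0-base
    runtime: nvidia   # only this service needs the GPU runtime
  web:
    image: nginx:alpine
    # no runtime key: runs under the default runc runtime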

@xkortex

xkortex commented Mar 9, 2019

Can you elaborate why configuring the engine/daemon as part of the installation process for the nvidia runtime is not a suitable approach?

What if you do not want the nvidia runtime as the default runtime? Unfortunately I do not know enough about the inner workings to know what sort of tradeoffs are involved in setting the default runtime to nvidia when you are not running GPU jobs (just because a machine has a GPU and Docker doesn't mean it only runs GPU-job containers). But I do know that more software engaged means more things that can break, so setting all containers to default to nvidia does not seem optimal to me. It would be nice to define the runtime parameters in daemon.json but keep the regular runtime as the default, and specify in dc.yml which runtime to use.

Taking that approach would assume that every machine in the cluster would be able to have /usr/bin/nvidia-container-runtime installed, and should take the same arguments; that assumption cannot be made, as a cluster may include nodes with, and without the nvidia runtime installed.

I see your point about portability; putting the hard path in compose could be limiting. But... that's still no guarantee of portability. E.g. I have containers which can't run on nvidia >400, and 3 machines with 1080's on 3XX drivers and a 2080ti that's on 410.

I think a possible solution might be the generic resource field.

edit: I had my syntax wrong, I think I need to play around with the example file.

...

Okay, this might be a workable alternative to runtime: nvidia in docker-compose (along with appropriate additions to daemon.json).

version: "3.5"
services:
  foo-derp-learning:
    image: "nvidia/cuda:9.0-base"
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 1
            - discrete_resource_spec:
                kind: 'nvidia-version'
                value: 384

Kinda clunky in comparison to a single k:v, but I think this could solve another pain of mine, so it might be worth it.

@hholst80

hholst80 commented Mar 31, 2019

If docker-app chooses to limit itself to version 3.x, that is a design choice to drive development towards Swarm. CNAB itself has no such goals.

CNAB is not a platform-specific tool. While it uses containers for encapsulating installation logic, it remains un-opinionated about what cloud environment it runs in. CNAB developers can bundle applications targeting environments spanning IaaS (like OpenStack or Azure), container orchestrators (like Kubernetes or Nomad), container runtimes (like local Docker or ACI), and cloud platform services (like object storage or Database as a Service).

https://github.com/deislabs/cnab-spec/blob/master/100-CNAB.md

@lmeyerov

RE: "What if you do not want the nvidia runtime as the default runtime?", that's what we're hitting for a couple of reasons:

  • Portable install: Users repeatedly lose hours or days before realizing (and understanding) that they need to modify their system's default Docker runtime.

  • Portable builds: For reasons I don't myself understand, when building/exporting images with runtime=nvidia, drivers that shouldn't be there get baked into the images, while building under runc does the right thing by leaving that as a late binding for later consumers. Rather than constantly toggling some system-level setting, we'd rather keep it in declarative config. (Think dev box, build bots, test bots, ...).
