This repository has been archived by the owner on Oct 27, 2023. It is now read-only.

nvidia-container-runtime-3.0.0-1.x86_64.rpm not compatible #68

Closed

arjanlemmers opened this issue Jul 19, 2019 · 9 comments

Comments

@arjanlemmers

In commit e6f01ff a new package named nvidia-container-runtime-3.0.0-1.x86_64.rpm was uploaded.

This package is causing problems: it is not compatible with earlier versions of Docker.
We could not find a changelog for this commit. Is this an intended release?

It is causing problems when auto-updating with yum.
Full path example:
https://github.com/NVIDIA/nvidia-container-runtime/blob/gh-pages/amzn1/x86_64/nvidia-container-runtime-3.0.0-1.x86_64.rpm
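
Until the packaging question is settled, one way to keep yum from pulling in the new packages automatically is yum's standard exclude mechanism (a sketch; nothing here is specific to the NVIDIA repos):

# hold back the NVIDIA packages during updates
sudo yum update --exclude='nvidia-container-runtime*' --exclude='nvidia-docker2*'

# or make the exclusion persistent by adding this line to /etc/yum.conf:
#   exclude=nvidia-container-runtime* nvidia-docker2*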

@arjanlemmers
Author

@RenaudWasTaken

@jraby

jraby commented Jul 19, 2019

Trying to figure out what's in this new release for the .deb packages too.
Is this only a dependency change (for both nvidia-docker2 and nvidia-container-runtime)?

@RenaudWasTaken
Contributor

Taking a look at this now

@RenaudWasTaken
Contributor

@arjanlemmers can you detail:

  • which version of Docker you are using
  • which version of nvidia-container-runtime you were updating from
  • the error you are seeing

The nvidia-container-runtime package doesn't specify any requirements for docker.

Trying to figure out what's in this new release for .deb packages too.
Is this only a dependency change (for both nvidia-docker2 and nvidia-container-runtime)?

It's mainly a dependency change on Docker: with this new version we are able to depend only on docker >= 18.06.0.
This means that we won't break anymore when Docker releases new versions.
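
For anyone who wants to check what the packages actually declare, the dependency change can be inspected with standard rpm queries (a sketch; adjust the file name to whatever you downloaded):

# requirements of the installed packages
rpm -q --requires nvidia-docker2 nvidia-container-runtime

# or of a downloaded .rpm before installing it
rpm -qp --requires nvidia-container-runtime-3.0.0-1.x86_64.rpm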

@arjanlemmers
Author

arjanlemmers commented Jul 22, 2019

@RenaudWasTaken: below is a dump of the full scenario describing the configuration; the last line shows the error: find runc path: exec: "runc": executable file not found in $PATH

AWS p2.xlarge instance using custom ami based on AWS Deep Learning Base AMI v17.0 (https://aws.amazon.com/machine-learning/amis/)

=============================================================================
       __|  __|_  )
       _|  (     /   Deep Learning Base AMI (Amazon Linux) Version 17.0
      ___|\___|___|
=============================================================================

Nvidia driver version: 410.104
CUDA versions available: cuda-10.0 cuda-8.0 cuda-9.0 cuda-9.2 
Default CUDA version is 9.0 
Libraries: cuDNN, NCCL, Intel MKL-DNN


[ec2-user@ip-10-0-4-210 ~]$ docker run --rm -it --runtime=nvidia nvidia/cuda nvidia-smi
Mon Jul 22 07:09:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    31W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

[ec2-user@ip-10-0-4-210 ~]$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a215d7133c34aa18e3b72b4a21fd0c6136
 Built:             Mon Mar  4 21:25:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a/18.06.1-ce
  Built:            Mon Mar  4 21:26:49 2019
  OS/Arch:          linux/amd64
  Experimental:     false

[ec2-user@ip-10-0-4-210 ~]$ rpm -q docker nvidia-docker2 nvidia-container-runtime
docker-18.06.1ce-8.28.amzn1.x86_64
nvidia-docker2-2.0.3-9.docker18.06.1ce.amzn1.noarch
nvidia-container-runtime-2.0.0-8.docker18.06.1ce.amzn1.x86_64

[ec2-user@ip-10-0-4-210 ~]$ uname -a
Linux ip-10-0-4-210 4.14.104-78.84.amzn1.x86_64 #1 SMP Mon Mar 4 19:19:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[ec2-user@ip-10-0-4-210 ~]$ sudo yum update
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
amzn-main                                                                                                                                                                            | 2.1 kB  00:00:00     
amzn-updates                                                                                                                                                                         | 2.5 kB  00:00:00     
Resolving Dependencies
--> Running transaction check
---> Package nvidia-container-runtime.x86_64 0:2.0.0-8.docker18.06.1ce.amzn1 will be obsoleted
---> Package nvidia-container-runtime.x86_64 0:3.0.0-1 will be obsoleting
---> Package nvidia-docker2.noarch 0:2.0.3-9.docker18.06.1ce.amzn1 will be updated
---> Package nvidia-docker2.noarch 0:2.1.0-1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================================================
 Package                                                   Arch                                    Version                                  Repository                                                 Size
============================================================================================================================================================================================================
Installing:
 nvidia-container-runtime                                  x86_64                                  3.0.0-1                                  nvidia-container-runtime                                  804 k
     replacing  nvidia-container-runtime.x86_64 2.0.0-8.docker18.06.1ce.amzn1
Updating:
 nvidia-docker2                                            noarch                                  2.1.0-1                                  nvidia-docker                                             4.3 k

Transaction Summary
============================================================================================================================================================================================================
Install  1 Package
Upgrade  1 Package

Total download size: 808 k


[ec2-user@ip-10-0-4-210 ~]$ rpm -q docker nvidia-docker2 nvidia-container-runtime
docker-18.06.1ce-8.28.amzn1.x86_64
nvidia-docker2-2.1.0-1.noarch
nvidia-container-runtime-3.0.0-1.x86_64

[ec2-user@ip-10-0-4-210 ~]$ docker run --rm -it --runtime=nvidia nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/98aa1f090eee08e1b0cdc6506e743495878d9f3223604665e3ed30d0a5c9653f/log.json: no such file or directory): nvidia-container-runtime did not terminate sucessfully: 2019/07/22 07:13:25 ERROR: nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
: unknown.
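
A possible stopgap, assuming the Amazon Linux docker package ships its runc binary only as /usr/bin/docker-runc (an assumption worth verifying, but it would explain the error above), is to make runc resolvable on $PATH:

# check which binaries are actually present
which runc docker-runc

# if only docker-runc exists, expose it under the name
# nvidia-container-runtime 3.0.0 looks for
sudo ln -s /usr/bin/docker-runc /usr/bin/runc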

@sanjams2

sanjams2 commented Jul 23, 2019

Experiencing the exact same issue. Can also add necessary docker/nvidia-container-runtime information if it aids in the investigation.

For anyone else hitting this issue, a quick workaround could be:

# this is needed because 'nvidia-container-runtime-hook' was made 'obsolete' by 'nvidia-container-toolkit'
sudo yum install -y nvidia-container-runtime-hook-1.4.0-1.amzn1.x86_64 --exclude nvidia-container-toolkit

# pin nvidia-docker2 back to the previous known-good build
NVIDIA_DOCKER_2_PACKAGE_VERSION=2.0.3
NVIDIA_DOCKER_2_PACKAGE_RELEASE=9.docker18.06.1ce.amzn1
NVIDIA_DOCKER_2_PACKAGE_ARCH= # leave empty for no arch
NVIDIA_DOCKER_2_YUM_PACKAGE_NAME=nvidia-docker2-${NVIDIA_DOCKER_2_PACKAGE_VERSION}-${NVIDIA_DOCKER_2_PACKAGE_RELEASE}${NVIDIA_DOCKER_2_PACKAGE_ARCH:+.$NVIDIA_DOCKER_2_PACKAGE_ARCH}

sudo yum install -y ${NVIDIA_DOCKER_2_YUM_PACKAGE_NAME}
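
After rolling back, a quick sanity check that the expected versions ended up installed and that the runtime still works (these are the same commands used earlier in this thread):

rpm -q docker nvidia-docker2 nvidia-container-runtime-hook
docker run --rm -it --runtime=nvidia nvidia/cuda nvidia-smi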

@qhaas

qhaas commented Jul 23, 2019

Similar issue on CentOS 7 as @arjanlemmers since the last update to nvidia-docker (details of my system, docker, nvidia stack, etc. are attached). Attempting to start a container with nvidia-docker results in the following:

docker run --runtime=nvidia --rm -it nvidia/opengl:1.0-glvnd-runtime-ubuntu16.04
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/af425d9814b0843b3eb4d17d6b9fdd47837728148961e9832e09e47cf4cb61e3/log.json: no such file or directory): nvidia-container-runtime did not terminate sucessfully: 2019/07/23 17:28:30 ERROR: nvidia-container-runtime: inject NVIDIA hook: stat /usr/bin/nvidia-container-runtime-hook: no such file or directory
: unknown.

dockerinfo.txt

UPDATE: removing everything from the '@libnvidia-container' and '@nvidia-container-runtime' repos, then installing nvidia-container-toolkit and docker-ce did the trick (see the sketch at the end of this comment). Note that with this new version (and per the nvidia documentation), the arguments to docker have changed:

# from https://github.com/NVIDIA/nvidia-docker
$ docker run --gpus all nvidia/cuda:9.2-runtime-centos7 nvidia-smi
Wed Jul 24 13:58:01 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+

New nvidia software stack:

rpm -qa '*nvidia*' | sort
kmod-nvidia-430.34-1.el7_6.elrepo.x86_64
libnvidia-container1-1.0.2-1.x86_64
libnvidia-container-tools-1.0.2-1.x86_64
nvidia-container-toolkit-1.0.1-2.x86_64
nvidia-x11-drv-430.34-1.el7_6.elrepo.x86_64
nvidia-x11-drv-libs-430.34-1.el7_6.elrepo.x86_64
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch

Back in business, but docker-compose is broken: docker/compose#6691
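
A rough sketch of the clean-up and reinstall described above (the package globs are an assumption about what the two old repos installed; list them with rpm -qa '*nvidia-container*' before removing):

# remove the old runtime stack
sudo yum remove 'nvidia-container-runtime*' 'nvidia-docker*' 'libnvidia-container*'

# install the new toolkit alongside upstream docker-ce, then restart docker
sudo yum install -y nvidia-container-toolkit docker-ce
sudo systemctl restart docker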

@RenaudWasTaken
Contributor

Hello everyone!

I'm sorry we've caused all of you so much trouble.
We've released new packages yesterday and you should be good to upgrade now.

Thanks for reporting the issue, closing for now!

@xml94

xml94 commented Feb 11, 2020

It seems that this issue has appeared again. I am using:
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:29:52 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:22 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
