
k8s v1.3.10: how to use the GPU / run a container on a GPU? #42146

Closed
tbchj opened this issue Feb 27, 2017 · 3 comments

Comments


tbchj commented Feb 27, 2017

I have been working on this problem for almost a week, but I have failed so far.
Environment: Red Hat 7.2
k8s: v1.3.10, CUDA: 7.5, NVIDIA driver: 367.44, TensorFlow: 0.11, GPU: 1080
Our platform is based on TensorFlow and k8s; it is used for ML training.
It works fine on the CPU, but I can't get it to work on the GPU, and I want to know why.
I tested many of the examples you mentioned, but they still failed.
My cluster: 1 master and 2 nodes. Every node has one GPU card; only the master has none.
First I tested what @Hui-Zhi said:

test.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-gpu-test
spec:
  containers:
  - name: nvidia-gpu
    image: nginx
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1

Yes, I tested that, and it works. If I change nvidia-gpu: 1 to 2, it fails: the pod stays pending, and kubectl describe shows that no node can satisfy the request. Since every node has only one GPU card, I think that is expected.
But now the real question: how do I actually run on the GPU? This example only proves that k8s can detect and schedule the GPU; it doesn't show how to use it. How can I write a yaml file that creates a pod which really runs on GPU resources?
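
By the way, to double-check what the scheduler sees, something like this should work (just a sketch, assuming kubectl is pointed at the cluster and the nodes advertise the alpha.kubernetes.io/nvidia-gpu resource):

# why is the pod pending? the scheduler events are at the end of the output
kubectl describe pod nvidia-gpu-test

# how many GPUs does each node advertise in its capacity?
kubectl describe nodes | grep -i nvidia-gpu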

Then I found another way: nvidia-docker.
I pulled the GPU image gcr.io/tensorflow/tensorflow:0.11-gpu and ran the mnist.py demo with plain docker:

docker run -it ${image} /bin/bash

but it failed with errors like "can't open CUDA library libcuda.so, can't find libcuda.so".
Has anyone else run into the same problem?
Then I found that someone said the GPU needs nvidia-docker.
Luckily I had already installed it the way the TensorFlow docs describe (https://www.tensorflow.org/install/install_linux#gpu_support). With nvidia-docker my training does run on the GPU, and GPU memory usage is almost 7 GB, about 70%.
I run it like this:

nvidia-docker run -it ${image} /bin/bash
python mnist.py

Yes, it works. But that raises a new question: am I supposed to use docker to run on the CPU and nvidia-docker to run on the GPU? I can only reach the GPU with nvidia-docker, not with plain docker, so how do I run on the GPU from k8s?
k8s starts containers with docker, not nvidia-docker, so how can I do the same thing there? Can you help me? I want to know how to run on the GPU with k8s, not just a demo or a test yaml that proves k8s supports GPUs.
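
For what it's worth, my understanding is that nvidia-docker is only a thin wrapper: it adds the NVIDIA device files and the driver volume to an ordinary docker run. Something like this should be roughly equivalent (only a sketch: the device names and the volume name are assumptions and depend on the driver install):

# roughly what nvidia-docker adds on top of plain docker (names are assumptions)
docker run -it \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia0 \
  -v nvidia_driver_367.44:/usr/local/nvidia:ro \
  gcr.io/tensorflow/tensorflow:0.11-gpu /bin/bash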
Hopefully you can answer me; I'm waiting ....
Thanks.

@cmluciano

@tbchj #42116 is now merged and should be released with 1.6


tbchj commented Mar 3, 2017

@cmluciano Yes, thank you, you may well be right. I just read through #42116 completely; it seems to have what I need.


tbchj commented Mar 3, 2017

I just tested it, and it does work. The volume I mounted before was wrong. The new yaml I used is below:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  volumes:
  - name: nvidia-driver
    hostPath:
      # the driver volume that nvidia-docker created on the host (driver 367.44)
      path: /var/lib/nvidia-docker/volumes/nvidia_driver/367.44
  containers:
  - name: tensorflow
    image: tensorflow:0.11.0-gpu
    ports:
    - containerPort: 8000
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - name: nvidia-driver
      # makes libcuda.so and the rest of the driver libraries visible inside the container
      mountPath: /usr/local/nvidia/
      readOnly: true
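
To double-check that the container really sees the driver through this mount, something like the following should work (only a sketch: it assumes the driver directory on the host also contains bin/nvidia-smi, which the nvidia-docker volume normally does):

# run nvidia-smi from the mounted driver directory inside the pod
kubectl exec -it gpu-test -- /usr/local/nvidia/bin/nvidia-smi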

That solves my problem, thank you.

tbchj closed this as completed Mar 3, 2017