
GCP example missing. #102

Open
mcartagenah opened this issue Aug 21, 2021 · 12 comments

Comments
mcartagenah commented Aug 21, 2021

Hi, I couldn't find a GCP example besides the one in #68, but I get the following error:

Preparing the deployment template...
  Error:
  ------
  <HttpError 404 when requesting https://compute.googleapis.com/compute/v1/projects/ml-images/global/images/family/common-gce-gpu-image?alt=json returned "The resource 'projects/ml-images/global/images/family/common-gce-gpu-image' was not found". Details: "[{'message': "The resource 'projects/ml-images/global/images/family/common-gce-gpu-image' was not found", 'domain': 'global', 'reason': 'notFound'}]">

What am I doing wrong?

turian commented Aug 24, 2021

@apls777 I have the same issue and it's urgent :(

This is my spotty.yaml. Could you share a simple working GCP spotty.yaml?

project:
  name: spotty-heareval
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*
        - '*/__pycache__/*'
        - _workdir
        - embeddings

containers:
  - projectDir: /workspace/project
    file: Dockerfile
#    ports:
#      # TensorBoard
#      - containerPort: 6006
#        hostPort: 6006
#      # Jupyter
#      - containerPort: 8888
#        hostPort: 8888
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '20G']

instances:
  - name: spotty-heareval-i1
    provider: gcp
    parameters:
      zone: europe-west4-a
      # A100 TODO: Try others
      machineType: a2-highgpu-1g
#      spotInstance: True
#      ports: [6006, 8888]
      dockerDataRoot: /docker
      volumes:
        - name: workspace
          parameters:
            size: 250
# Not implemented for GCP, all volumes will be retained
#            deletionPolicy: retain
            mountDir: /workspace
        - name: docker
          parameters:
            size: 20
            mountDir: /docker

scripts:
  setup: |
    bash setup.sh
  train: |
    bash train.sh
#  tensorboard: |
#    tensorboard --bind_all --port 6006 --logdir /workspace/project/logs
#  jupyter: |
#    jupyter notebook --allow-root --ip 0.0.0.0 --notebook-dir=/workspace/project

apls777 (Collaborator) commented Aug 24, 2021

@turian I'll have a look at it later today. I think GCP just renamed their GPU images.

turian commented Aug 24, 2021

thank you!

apls777 (Collaborator) commented Aug 24, 2021

@turian @mcartagenah I’ll update the code later, but for now, you can just add this line to the instance parameters:

imageUri: projects/ml-images/global/images/family/common-dl-gpu-debian-10

JFYI: when I was working with GCP, I couldn’t use preemptible (spot) GPU instances as they were immediately shut down after launch. But I had a good experience using on-demand CPU instances with preemptible TPUs. If you still want to give it a try, use the preemptibleInstance: true parameter instead of spotInstance: true.
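Putting the two suggestions together, the instance section might look like this (a sketch based on turian's config above; the zone and machine type come from that example and are not requirements):

```yaml
instances:
  - name: spotty-heareval-i1
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: a2-highgpu-1g
      # Workaround until Spotty's default image family is updated:
      imageUri: projects/ml-images/global/images/family/common-dl-gpu-debian-10
      # GCP uses preemptibleInstance rather than AWS's spotInstance:
      preemptibleInstance: true
```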

Also, keep in mind that I tested GCP a lot less than AWS, and it looks like not many people are actually using it, so you might run into some bugs.

turian commented Aug 25, 2021

@apls777 great! I am happy to file bugs and share things that are successful to help other spotty users. We have a grant from GCP to run leaderboard evaluations for our NeurIPS competition: https://neuralaudio.ai/hear2021-holistic-evaluation-of-audio-representations.html

I would be very excited to use spotty because it will radically simplify the evaluation workflow.

Regarding preemptible instances, "Note: If you are requesting a Preemptible GPU quota for NVIDIA® V100® GPUs, in the justification for the request, specify that the request is for preemptible GPUs." (https://cloud.google.com/compute/docs/gpus)

A few more questions trying to get GCP GPUs running through Spotty:

  1. How do I figure out what version of CUDA is running? From the full image URL? "projects/ml-images/global/images/c0-deeplearning-common-cu110-v20210818-debian-10"
  2. Where is common-dl-gpu-debian-10 documented? I googled it but can't find it anywhere. Is there a way to pick Ubuntu? Is there a way to change the CUDA version? I would love to look these questions up. [edit: Ah weird, I found it here: https://console.cloud.google.com/compute/images?project=hear2021-evaluation]
  3. Is it imageUri now? The docs (https://spotty.cloud/docs/providers/gcp/instance-parameters.html) say imageUrl.

apls777 (Collaborator) commented Aug 25, 2021

How do I figure out what version of CUDA is running? From the full image URL?

You can find the image in the GCP console by its name and check the CUDA version in the description: https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu110-v20210818-debian-10.
It's using CUDA 11.0, so if you need to know it from the image URL, I guess it's the cu110 part.
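If you already have an instance running, you can also check directly from inside it with the standard NVIDIA tooling (exact paths may vary by image; nvcc may live under /usr/local/cuda/bin):

```shell
# Driver version and the highest CUDA version the driver supports:
nvidia-smi

# Version of the CUDA toolkit actually installed on the image:
nvcc --version
```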

common-dl-gpu-debian-10 where did you find this documented? ... [edit: Ah weird, I find it here: https://console.cloud.google.com/compute/images?project=hear2021-evaluation]

Yes, you can find it in the list of available images in the GCP console. But common-dl-gpu-debian-10 is the image family, not the image itself, so you need to look at the "Family" column. If you're not familiar with this concept: you can use a "family" image URL instead of a direct image URL to make sure you're always running the latest version of an image. At the moment, the latest version is c0-deeplearning-common-cu110-v20210818-debian-10.
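One way to see which concrete image a family currently resolves to is the gcloud CLI (this assumes the Cloud SDK is installed and authenticated):

```shell
# Resolve the family to its latest image in the public ml-images project:
gcloud compute images describe-from-family common-dl-gpu-debian-10 \
    --project ml-images
```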

Is there a way to pick Ubuntu? Is there a way to change CUDA version? I would love to look these questions up.

GCP doesn't provide Ubuntu-based images with pre-installed Docker and CUDA, but you can always create your own Ubuntu image with any CUDA version and use it with Spotty via the imageUri parameter.
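For example, a custom image could be referenced like this (the project and image names below are placeholders for illustration, not real resources):

```yaml
instances:
  - name: spotty-heareval-i1
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: a2-highgpu-1g
      # Placeholder: a custom Ubuntu image you built yourself with
      # Docker and your preferred CUDA version preinstalled
      imageUri: projects/my-gcp-project/global/images/my-ubuntu-cuda-image
```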

Is it imageUri now? The docs say imageUrl.

Yeah, I noticed it, it's a typo in the docs. Will fix it later.

turian commented Aug 25, 2021

@mcartagenah can we close this issue?

turian commented Aug 26, 2021

@apls777 just curious if you know how to use CUDA 11.1 with GCP images: https://console.cloud.google.com/compute/images?project=hear2021-evaluation

They all appear to be cu110, but PyTorch 1.9.0 builds are only against CUDA 11.1; CUDA 11.0 is supported only through PyTorch 1.7.1.

turian commented Aug 26, 2021

Related #104

apls777 (Collaborator) commented Aug 26, 2021

Replied in #104

mcartagenah (Author)

@mcartagenah can we close this issue?

Yes, now it's working with the imageUri you pointed out.

Thank you :)

turian commented Aug 28, 2021

@apls777 Confirming that if you ask Google for preemptible GPUs in your quota requests, they work with Spotty.
