Ch10 - can't run model.py - error with hypertune #166

Open
jgammerman opened this issue Feb 14, 2023 · 11 comments

Comments

@jgammerman

Hello,

So on p.338 of the book it says:

[screenshot from the book]

But when I run this I get the following error:

Traceback (most recent call last):
  File "/home/jgammerman/data-science-on-gcp/10_mlops/model.py", line 331, in <module>
    train_and_evaluate(TRAIN_DATA_PATTERN, EVAL_DATA_PATTERN, TEST_DATA_PATTERN, OUTPUT_MODEL_DIR, OUTPUT_DIR)
  File "/home/jgammerman/data-science-on-gcp/10_mlops/model.py", line 180, in train_and_evaluate
    hpt = hypertune.HyperTune()
AttributeError: module 'hypertune' has no attribute 'HyperTune'

I have pip-installed hypertune on my VM, so I know it's there:

jgammerman@cloudshell:~/data-science-on-gcp/.......$ pip show hypertune
Name: hypertune
Version: 1.0.3
Summary: A library for performing hyperparameter optimization with Polyaxon.
Home-page: https://github.com/polyaxon/hypertune
Author: Polyaxon, Inc.
Author-email: contact@polyaxon.com
License: Apache 2.0
Location: /home/jgammerman/.local/lib/python3.9/site-packages
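Note that the `pip show` output above reports the Polyaxon `hypertune` distribution. The helper that Vertex AI hyperparameter-tuning code normally uses to report metrics (`hypertune.HyperTune()`) is shipped in a different distribution, `cloudml-hypertune`, which installs a top-level module with the same name, so the two can shadow each other. The thread does not record the exact fix that was suggested by email. A minimal diagnostic sketch, assuming Python 3.8+ for `importlib.metadata`; the function names are hypothetical:

```python
import importlib
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def diagnose_hypertune() -> str:
    """Report which `hypertune` module is importable and whether it has HyperTune."""
    if importlib.util.find_spec("hypertune") is None:
        return "No importable module named 'hypertune'."
    module = importlib.import_module("hypertune")
    if hasattr(module, "HyperTune"):
        return "hypertune.HyperTune is available."
    # Both of these distributions install a top-level module called `hypertune`.
    return (
        "The importable 'hypertune' module has no HyperTune class. "
        f"hypertune={installed_version('hypertune')}, "
        f"cloudml-hypertune={installed_version('cloudml-hypertune')}"
    )


if __name__ == "__main__":
    print(diagnose_hypertune())
```

If the report shows the Polyaxon package without `cloudml-hypertune`, that mismatch would explain the `AttributeError` in the traceback.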
@lakshmanok
Contributor

lakshmanok commented Feb 14, 2023 via email

@jgammerman
Author

Tried that; same error as before.

@lakshmanok
Contributor

lakshmanok commented Feb 14, 2023 via email

@jgammerman
Author

That made it work, thanks Lak.

(By the way, for anyone else at this stage, you might need to run `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` to get it to work.)
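If you would rather not export the variable in every new shell, the same setting can be applied from Python. This is a sketch, assuming it is placed at the very top of `model.py` (that placement is an assumption). It must execute before anything imports `google.protobuf`, because protobuf reads this variable once, when it is first imported:

```python
import os

# Must run before TensorFlow, the Vertex AI SDK, or anything else imports
# google.protobuf; protobuf picks its implementation at first import.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")

print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])
```

The pure-Python protobuf implementation is slower than the compiled one, so this trades some speed for compatibility.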

I then tried running the pipeline train_on_vertexai.py, and it spent about 10 mins training before failing with a quota error:

RuntimeError: Training failed with: code: 8 message: "The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus"

I'm currently running the same pipeline using AutoML and it's been training for 2 hours so far - should it take that long?

@lakshmanok
Contributor

lakshmanok commented Feb 15, 2023 via email

@jgammerman
Author

  1. Done, submitted a pull request.

  2. I do have an NVIDIA T4 GPU attached, but it's still failing with the same error:

     [screenshot]

  3. The AutoML pipeline ended up completing after 3 hours 40 mins.

@jgammerman
Author

I'm getting the same error when trying to run hyperparameter tuning:

google.api_core.exceptions.ResourceExhausted: 429 The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus
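This error and the earlier `RuntimeError` name the same quota metric, `aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus`, which is the name to search for on the Quotas page. A small, hypothetical helper that pulls such metric names out of an error message, assuming they always follow the `<service>.googleapis.com/<metric>` pattern seen in both messages here:

```python
import re
from typing import List

# Assumed shape of quota metric names: "<service>.googleapis.com/<metric>".
_QUOTA_METRIC = re.compile(r"\b[a-z0-9-]+\.googleapis\.com/[A-Za-z0-9_]+")


def quota_metrics(error_message: str) -> List[str]:
    """Return the distinct quota metric names mentioned in an error message, in order."""
    found: List[str] = []
    for metric in _QUOTA_METRIC.findall(error_message):
        if metric not in found:
            found.append(metric)
    return found


msg = ("429 The following quota metrics exceed quota limits: "
       "aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus")
print(quota_metrics(msg))
```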

@lakshmanok
Contributor

lakshmanok commented Feb 16, 2023 via email

@lakshmanok
Contributor

lakshmanok commented Feb 16, 2023 via email

@jgammerman
Author

When I navigate to console.cloud.google.com/quotas, it says that I need to upgrade to a paid account:

[screenshot]

I'm still using a managed notebook. I tried creating a user-managed one a few days ago, but it failed with a message about not enough GPUs currently being available; I put it down to bad timing and decided to try again later. I guess that's the root of the problem. Will try again.

@jgammerman
Author

So I still can't create a user-managed notebook with a GPU:

[screenshot]

I have tried us-west, us-east, and us-central. Sometimes I also get this error:

[screenshot]
