Feat: Add debugging terminal support for CustomJob, HyperparameterTun… #699


morgandu
Contributor

  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes b/195449603 🦕

product-auto-label bot added the api: aiplatform label (Issues related to the AI Platform API.) Sep 10, 2021
google-cla bot added the cla: yes label (This human has signed the Contributor License Agreement.) Sep 10, 2021
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch 2 times, most recently from b71b3fb to 84304ed, September 30, 2021 18:50
google/cloud/aiplatform/jobs.py (resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/training_jobs.py (outdated, resolved)
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch from 6518fb3 to 41eca99, October 15, 2021 06:08
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
```python
(Dict[str, str]) - web access uris of the custom job
"""
self._sync_gca_resource()
return self._gca_resource.web_access_uris
```
Member

Does this field need to be cast to a dict?

Contributor Author

It works the same way as labels: we return labels as-is, and that is a dict.

Member

That looks like a bug:

```python
isinstance(ds.labels, dict)
# False
type(ds.labels)
# google.protobuf.pyext._message.ScalarMapContainer
```

I created an issue to track that here: b/203653647

I'd prefer not to carry that issue over here.

Contributor Author

ahhh, noted
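The cast the reviewer asks for amounts to copying the proto map into a plain dict before returning it. Below is a minimal sketch, assuming only what the isinstance check above shows: the protobuf map container behaves like a mapping but is not a dict subclass. FakeScalarMap and as_plain_dict are hypothetical names, used so the example runs without protobuf installed; in the SDK property the equivalent change would presumably be returning dict(self._gca_resource.web_access_uris).

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, Mapping


class FakeScalarMap(MutableMapping):
    """Hypothetical stand-in for protobuf's ScalarMapContainer.

    It behaves like a mapping but is not a dict subclass, so
    isinstance(..., dict) is False, mirroring the check in the thread.
    """

    def __init__(self, data: Mapping[str, str]) -> None:
        self._data = dict(data)

    def __getitem__(self, key: str) -> str:
        return self._data[key]

    def __setitem__(self, key: str, value: str) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def as_plain_dict(field: Mapping[str, str]) -> Dict[str, str]:
    """Copy a map-like proto field into a plain dict so the return
    value actually matches a Dict[str, str] type hint."""
    return dict(field)


# Hypothetical worker key and URL, for illustration only.
raw = FakeScalarMap({"workerpool0-0": "https://example.com/terminal"})
print(isinstance(raw, dict))                 # False
print(isinstance(as_plain_dict(raw), dict))  # True
```

Copying also decouples the caller's dict from the underlying resource, so mutating the returned value cannot affect the cached proto.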

```python
self._sync_gca_resource()

if self._gca_resource.trials:
    return self._gca_resource.trials[-1].web_access_uris
```
Member

Can trials execute in parallel?

Contributor Author

Trials can execute in parallel when parallel_trial_count is set. I updated HyperparameterTuningJob to check the web_access_uris of all trials running in parallel.
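A minimal sketch of the change described above: rather than reading only the last trial (trials[-1]), gather URIs from every trial so that parallel trials are all surfaced. Trial and collect_trial_web_access_uris are hypothetical names; the real trial objects, and the exact shape the SDK returns, may differ. Keying by trial id is an assumption made so that URIs from different trials cannot overwrite each other.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trial:
    """Hypothetical stand-in for a tuning trial; only the fields used here."""

    id: str
    web_access_uris: Dict[str, str] = field(default_factory=dict)


def collect_trial_web_access_uris(trials: List[Trial]) -> Dict[str, Dict[str, str]]:
    """Return the web access URIs of every trial, keyed by trial id.

    Reading only trials[-1] would miss trials running in parallel, so
    every trial that currently exposes URIs is included.
    """
    uris: Dict[str, Dict[str, str]] = {}
    for trial in trials:
        # Assumption: trials without URIs (e.g. not yet running) are skipped.
        if trial.web_access_uris:
            uris[trial.id] = dict(trial.web_access_uris)
    return uris


# Hypothetical worker keys and URLs, for illustration only.
trials = [
    Trial("1", {"workerpool0-0": "https://example.com/trial-1"}),
    Trial("2", {"workerpool0-0": "https://example.com/trial-2"}),
    Trial("3"),  # no URIs yet
]
print(collect_trial_web_access_uris(trials))
```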

google/cloud/aiplatform/training_jobs.py (outdated, resolved)
google/cloud/aiplatform/training_jobs.py (outdated, resolved)
```python
self._gca_resource.training_task_metadata
and self._gca_resource.training_task_metadata.get("backingCustomJob")
and self._gca_resource.training_task_inputs.get("enable_web_access")
and not self._has_logged_web_access_uris
```
Member

Is it possible for the web_access_uris to change during the run, for example if one of the workers fails?

Contributor Author

With the current service setup, if a worker fails and restarts, the web_access_uris redirect to the new worker but do not themselves change.
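Because the URIs are described as stable for the life of the job, logging them once is enough, which is what the _has_logged_web_access_uris flag in the condition above guards. A minimal sketch of that log-once pattern follows; WebAccessUriLogger and maybe_log are hypothetical names, not SDK APIs, and enable_web_access and uris stand in for the values the real code reads from the training pipeline resource.

```python
import logging
from typing import Dict, Optional

_LOGGER = logging.getLogger(__name__)


class WebAccessUriLogger:
    """Hypothetical helper that logs web access URIs once per job.

    Assumes, per the thread, that the URIs stay stable for the job's
    lifetime (a restarted worker is reached through the same URI).
    """

    def __init__(self) -> None:
        self._has_logged_web_access_uris = False

    def maybe_log(self, enable_web_access: bool, uris: Optional[Dict[str, str]]) -> bool:
        """Log the URIs if web access is enabled, URIs are available, and
        they have not been logged yet. Returns True only on the call
        that actually logs."""
        if enable_web_access and uris and not self._has_logged_web_access_uris:
            for worker, uri in uris.items():
                _LOGGER.info("%s: %s", worker, uri)
            self._has_logged_web_access_uris = True
            return True
        return False


logging.basicConfig(level=logging.INFO)
job_logger = WebAccessUriLogger()
for _ in range(3):  # e.g. repeated polling of the job state
    job_logger.maybe_log(True, {"workerpool0-0": "https://example.com/terminal"})
```

Polling may see the URIs many times, but only the first call logs them. If the service ever started rotating URIs, a single flag would no longer be enough.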

google/cloud/aiplatform/training_jobs.py (outdated, resolved)
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch 3 times, most recently from 58c656d to 0fbb148, October 19, 2021 19:02
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/training_jobs.py (outdated, resolved)
google/cloud/aiplatform/training_jobs.py (outdated, resolved)
google/cloud/aiplatform/training_jobs.py (outdated, resolved)
google/cloud/aiplatform/training_jobs.py (resolved)
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch 2 times, most recently from 81a4ae4 to 5e44f24, October 20, 2021 22:10
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch from 71f5813 to 706423f, October 21, 2021 00:11
google/cloud/aiplatform/jobs.py (outdated, resolved)
google/cloud/aiplatform/jobs.py (outdated, resolved)
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch from 706423f to 7214e21, October 21, 2021 22:33
morgandu force-pushed the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch from 7214e21 to f8b67ea, October 21, 2021 22:51
morgandu merged commit 2deb505 into googleapis:main Oct 22, 2021
morgandu deleted the mor--debugging-terminal-integration-customjob-hp-customtrainingjob branch October 22, 2021 00:13
2 participants