feat: Add cloud profiler to training_utils #828
… support for local file uploads
@mkovalski: I am assigning you as owner of this PR; feel free to ping reviewers as needed to make sure the review process progresses in a timely fashion, or provide guidance on who might better own the process of getting the PR reviewed, passing continuous testing, and merged. Reach out if you have questions.
if not environment_variables.http_handler_port:
    raise MissingEnvironmentVariableException(
        "'AIP_HTTP_HANDLER_PORT' must be set."
Should the user set this via an environment variable, or is it set by the service?
This is set by the service.
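For context, a minimal sketch of the kind of check the diff above performs, assuming the port is read from the process environment. The exception class here is a local stand-in for the one named in the diff, and `get_http_handler_port` is a hypothetical helper, not a function from this PR.

```python
import os
from typing import Mapping, Optional


class MissingEnvironmentVariableException(Exception):
    """Local stand-in for the exception named in the diff (definition assumed)."""


def get_http_handler_port(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the port from AIP_HTTP_HANDLER_PORT, failing loudly if it is unset.

    Hypothetical helper: the service is expected to set the variable,
    so a missing value is treated as a configuration error.
    """
    environ = os.environ if environ is None else environ
    port = environ.get("AIP_HTTP_HANDLER_PORT")
    if not port:
        raise MissingEnvironmentVariableException(
            "'AIP_HTTP_HANDLER_PORT' must be set."
        )
    return int(port)
```

Passing the mapping explicitly keeps the check easy to exercise in tests without mutating `os.environ`.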
from google.cloud.aiplatform.training_utils.cloud_profiler.plugins import base_plugin
from typing import List
from werkzeug import wrappers
Wrap this import in a try/except that raises an informative ImportError.
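One way to do this, as a rough sketch: catch the `ImportError` and re-raise it with a message that tells the user what to install. The helper name `import_optional` and the `install_hint` wording are illustrative assumptions, not part of this PR.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, raising an informative ImportError if missing.

    Hypothetical helper: `install_hint` is free-form text supplied by the caller.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"Could not import '{module_name}', which the cloud profiler needs. "
            f"{install_hint}"
        ) from err
```

At the import site this would replace the bare import, for example `wrappers = import_optional("werkzeug.wrappers", "Install the profiler dependencies to use this feature.")`. Chaining with `from err` keeps the original traceback available.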
google/cloud/aiplatform/training_utils/cloud_profiler/webserver.py
setup.py
full_extra_require = list(
    set(tensorboard_extra_require + metadata_extra_require + xai_extra_require)
    set(
The TensorFlow version should be handled explicitly, since the TensorBoard, XAI, and Profiler extras have different version bounds.
Done.
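To illustrate the concern, here is a sketch of why a plain set union is not enough: each extra may carry its own `tensorflow` requirement string, so the union keeps several differently bounded entries. The helper `merge_extras` and every version bound below are hypothetical placeholders, not the values used in this PR's setup.py.

```python
import re
from typing import Dict, List


def merge_extras(*extras: List[str], explicit: Dict[str, str]) -> List[str]:
    """Union requirement lists, replacing packages in `explicit` with one pinned entry.

    Hypothetical helper: `explicit` maps a lowercase package name to the single
    requirement string that should appear in the merged list.
    """
    merged = set()
    for extra in extras:
        for req in extra:
            # Take the package name as everything before the first specifier character.
            name = re.split(r"[\s<>=!~\[;]", req.strip(), maxsplit=1)[0].lower()
            if name not in explicit:
                merged.add(req)
    merged.update(explicit.values())
    return sorted(merged)


# Placeholder bounds for illustration only.
tensorboard_extra = ["tensorflow >=2.3.0", "six"]
xai_extra = ["tensorflow <2.8.0"]
profiler_extra = ["tensorflow >=2.4.0", "werkzeug"]

full_extra = merge_extras(
    tensorboard_extra,
    xai_extra,
    profiler_extra,
    explicit={"tensorflow": "tensorflow >=2.4.0, <2.8.0"},
)
```

With these placeholder inputs, `full_extra` contains a single `tensorflow` entry rather than three conflicting ones.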
Adds the ability to profile Vertex AI training jobs using the TensorBoard profiler.
Fixes #519