feat: Add profiling initialization code to training_utils #732

Closed
wants to merge 41 commits into from


mkovalski
Contributor

Add ability to profile Vertex Training jobs on demand.

  • Merge training_utils and its tests from the dev branch; the module provides the environment variables to use during training
  • Add a base web server that runs alongside the user's job
  • Add a TensorFlow profiler plugin that registers with the web server to allow remote profiling through Vertex TensorBoard
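
For readers unfamiliar with the pattern, the web server and plugin registration described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in this PR: `WebServer`, `ProfilerPlugin`, `get_routes`, and the `/capture_profile` route are all hypothetical names, and the stand-in plugin only records that a capture was requested instead of calling the TensorFlow profiler. The sketch assumes the log directory comes from the `AIP_TENSORBOARD_LOG_DIR` environment variable that Vertex AI sets for training jobs.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


class WebServer:
    """Small HTTP server that dispatches GET requests to plugin-registered routes."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0):
        self._routes: Dict[str, Callable[[], str]] = {}
        routes = self._routes  # captured by the request handler below

        class _RequestHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                handler = routes.get(self.path)
                if handler is None:
                    self.send_error(404)
                    return
                body = handler().encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; charset=utf-8")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                # Keep the training job's stderr free of access logs.
                pass

        # Port 0 asks the OS for any free port.
        self._httpd = HTTPServer((host, port), _RequestHandler)

    @property
    def port(self) -> int:
        return self._httpd.server_address[1]

    def register(self, plugin) -> None:
        """Add every route the plugin exposes to the dispatch table."""
        self._routes.update(plugin.get_routes())

    def start(self) -> threading.Thread:
        """Serve in a daemon thread so the user's training code keeps running."""
        thread = threading.Thread(target=self._httpd.serve_forever, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._httpd.shutdown()
        self._httpd.server_close()


class ProfilerPlugin:
    """Hypothetical stand-in: a real plugin would start a TensorFlow trace here."""

    def __init__(self, logdir: str):
        self.logdir = logdir
        self.requests = 0

    def get_routes(self) -> Dict[str, Callable[[], str]]:
        return {"/capture_profile": self._capture}

    def _capture(self) -> str:
        self.requests += 1
        return f"profile requested, logdir={self.logdir}"


def default_logdir() -> str:
    # Assumption: fall back to a local path when not running on Vertex AI.
    return os.environ.get("AIP_TENSORBOARD_LOG_DIR", "/tmp/logs")
```

In a training script this would be wired up as `server = WebServer(); server.register(ProfilerPlugin(default_logdir())); server.start()` before the training loop begins. Running the server in a daemon thread is one way to keep it from blocking or outliving the job.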

This should be merged after #704, since it includes that PR's changes; I'm opening it separately for clarity.

Fixes #519

mkovalski and others added 30 commits August 23, 2021 15:10
@mkovalski mkovalski requested a review from a team as a code owner September 29, 2021 19:46
@product-auto-label product-auto-label bot added the api: aiplatform Issues related to the AI Platform API. label Sep 29, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Sep 29, 2021
@mkovalski mkovalski closed this Oct 12, 2021
Development

Successfully merging this pull request may close these issues.

Add remote tensorflow profiling to training jobs.
2 participants