Skip to content
This repository has been archived by the owner on Nov 29, 2023. It is now read-only.

docs: update cluster sample #218

Merged
merged 19 commits into from Aug 12, 2021
Merged

docs: update cluster sample #218

merged 19 commits into from Aug 12, 2021

Conversation

loferris
Copy link
Contributor

@loferris loferris commented Jul 2, 2021

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • [x ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • [x ] Code coverage does not decrease (if any source code was changed)
  • [] Appropriate docs were updated: New doc needs to be generated

Fixes #111 🦕

@loferris loferris requested a review from a team as a code owner July 2, 2021 18:38
@loferris loferris requested review from busunkim96 and removed request for a team July 2, 2021 18:38
@snippet-bot
Copy link

snippet-bot bot commented Jul 2, 2021

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added api: dataproc Issues related to the googleapis/python-dataproc API. samples Issues that are directly related to samples. labels Jul 2, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 2, 2021
Copy link
Contributor

@bradmiro bradmiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks for doing this! Added some feedback.


def test_update_cluster(capsys):
# Wrapper function for client library function
create_cluster.create_cluster(PROJECT_ID, REGION, CLUSTER_NAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend implementing the logic for creating the cluster as a part of the setup, similar to here.

At least for Python, I typically tried not to make samples co-dependent on each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done, just waiting on unblocking get_cluster to commit

client = dataproc.ClusterControllerClient(
client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
# Get cluster you wish to update.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a stylistic thing but I would recommend a line break before each new comment.

cluster = client.get_cluster(
project_id=project_id, region=region, cluster_name=cluster_name
)
# Update number of clusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line break, see above.

)
# Update number of clusters
mask = {"paths": {"config.worker_config.num_instances": str(new_num_instances)}}
# Update cluster config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line break, see above.

mask = {"paths": {"config.worker_config.num_instances": str(new_num_instances)}}
# Update cluster config
cluster.config.worker_config.num_instances = new_num_instances
# Update cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line break, see above.

updated_cluster = operation.result()
print(f"Cluster was updated successfully: {updated_cluster.cluster_name}")


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you're missing the END region tag here.

@@ -0,0 +1,56 @@
# Copyright 2019 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to 2021.

update_cluster.update_cluster(PROJECT_ID, REGION, CLUSTER_NAME, NEW_NUM_INSTANCES)

out, _ = capsys.readouterr()
assert CLUSTER_NAME in out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the test, I would instead suggest a get_cluster call and confirm that the number of instances matches NEW_NUM_INSTANCES.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been calling dataproc.ClusterControllerClient.get_cluster(PROJECT_ID).config to try and get the worker_config, but it doesn't let me get any deeper into the cluster object than config. I'm not sure how to get in there currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, the following ended up working for me when pointed to a cluster that I have with two workers. Did you also provide the region / cluster_name to get_cluster?

from google.cloud import dataproc

PROJECT_ID = "my-project"
REGION = "my-region"
CLUSTER_NAME = "my-cluster"

client = dataproc.ClusterControllerClient(
                  client_options={
                      "api_endpoint": "{}-dataproc.googleapis.com:443".format(REGION)
                   }
               )

cluster = client.get_cluster(project_id=PROJECT_ID, region=REGION, cluster_name=CLUSTER_NAME)

print(cluster.config.worker_config.num_instances) # should print an integer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradmiro I haven't been able to get it working in the context of a test. My recent commit has three options I've tried: creating a fixture, creating a static cluster, and creating the cluster in the test. I still can't get in deeper than config. (This may be a Pythonic learning curve thing.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually does seem to be working in the current commit in CI!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final commit ready for review!

@@ -0,0 +1,67 @@
#!/usr/bin/env python
# Licensed under the Apache License, Version 2.0 (the "License");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check the license is correct here. You can take it from here and replace the year with 2021.

samples/snippets/submit_job.py Show resolved Hide resolved
@@ -0,0 +1,56 @@
# Copyright 2019 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Copyright 2019 Google LLC
# Copyright 2021 Google LLC

samples/snippets/submit_job.py Show resolved Hide resolved
samples/snippets/update_cluster.py Show resolved Hide resolved
samples/snippets/update_cluster.py Outdated Show resolved Hide resolved
Lo Ferris and others added 4 commits July 23, 2021 15:43
Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>
Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>
@loferris
Copy link
Contributor Author

@busunkim96 @bradmiro
So sorry for the delay - I didn't realize I wasn't getting my GH notifications! I've committed some of the formatting changes and have a commit ready to go with the rest of the changes as soon as I can get clarification on the get_cluster method from Brad!

@bradmiro
Copy link
Contributor

@busunkim96 @bradmiro
So sorry for the delay - I didn't realize I wasn't getting my GH notifications! I've committed some of the formatting changes and have a commit ready to go with the rest of the changes as soon as I can get clarification on the get_cluster method from Brad!

Added a comment!

@parthea parthea changed the title Update cluster sample docs: update cluster sample Jul 26, 2021
@loferris
Copy link
Contributor Author

requested changes made!

Copy link
Contributor

@bradmiro bradmiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! I left one comment and it looks like there's an outstanding lint error, but otherwise LGTM. Thanks for doing this!


out, _ = capsys.readouterr()
assert CLUSTER_NAME in out
assert new_num_cluster.config.worker_config.num_instances == 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: I would replace 5 with the NEW_NUM_INSTANCES variable

Lo Ferris and others added 2 commits August 2, 2021 14:03
@loferris
Copy link
Contributor Author

loferris commented Aug 3, 2021

ready for review/approval!

@loferris
Copy link
Contributor Author

loferris commented Aug 3, 2021

@busunkim96 ready for final review/approval! :)

Copy link
Contributor

@busunkim96 busunkim96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! I left one comment regarding api_endpoint, otherwise everything looks good. :)

client_transport = (
cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
address="{}-dataproc.googleapis.com:443".format(region)
)
)
dataproc_cluster_client = dataproc_v1.ClusterControllerClient(client_transport)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to block this PR on this, but there's an easier way to set regional endpoints now through google.api_core.client_options. See this speech sample: https://cloud.google.com/speech-to-text/docs/endpoints

@loferris loferris merged commit 80706f9 into master Aug 12, 2021
@loferris loferris deleted the update_cluster_sample branch August 12, 2021 21:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api: dataproc Issues related to the googleapis/python-dataproc API. cla: yes This human has signed the Contributor License Agreement. samples Issues that are directly related to samples.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Documentation for passing the cluster configuration to update_cluster
3 participants