Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to use external Dataflow Service Account #26

Conversation

ilakhtenkov
Copy link
Contributor

This PR is adding create_service_account variable.
If it set to false than module will not create dataflow service account and will use account email provided in dataflow_worker_service_account variable.
Check updated Readme for more details.

@ilakhtenkov
Copy link
Contributor Author

@rarsan Could you please review the PR?

@ilakhtenkov
Copy link
Contributor Author

ilakhtenkov commented Dec 28, 2022

  • Merge after version v1.0.0 released (tag is set).

@ilakhtenkov
Copy link
Contributor Author

Hi @rarsan, Could you please review merge request?

@rarsan rarsan self-assigned this Jan 17, 2023
Copy link
Member

@rarsan rarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is a useful addition.
Suggesting changing variable name and behavior for clarity.

README.md Outdated
**Note about Dataflow permissions**: You must also have permission to impersonate Dataflow worker service account in order to attach that service account to Compute Engine VMs which will execute pipeline operations. In case of default worker service account (i.e. your project's Compute Engine default service account), ensure you have `iam.serviceAccounts.actAs` permission over Compute Engine default service account in your project. For security purposes, this Terraform does not modify access to your existing Compute Engine default service account due to risk of granting broad permissions. On the other hand, if you choose to use a user-managed worker service account (by setting `dataflow_worker_service_account` template parameter), this Terraform will add necessary permission over the new service account. The former approach, i.e. user-managed service account, is recommended in order to use a minimally-scoped service account dedicated for this pipeline, and to have impersonation permission managed for you by this Terraform.
#### Dataflow permissions
There are 3 options to manage Dataflow permissions:
1. Usage of the default `Compute Engine` account. `roles/dataflow.worker` and other required permissions (example: secrets and kms access) should be granted outside of the module. Permissions to resources created as part of that module `log sync` and `PubSub` topics are would be granted to Default Compute Engine service accountby the module. To use this option don't provide `dataflow_worker_service_account` variable and set `create_service_account` to `false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In first option (using default Compute Engine SA): should be granted outside of the module is not correct. All necessary permissions are in fact granted inside the module. The only exception is the permission to grant impersonation (iam.serviceAccounts.actAs permission via role roles/iam.serviceAccountUser) over the default Compute Engine SA. More details in "Note" below in README.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue. roles/dataflow.worker access to the KMS, secrets and account impersonation now should be done outside of the module. Here is an example permissions.tf#96. Only for option 2 role name not empty string and use_externally_managed_dataflow_sa will create the binding.
As you mentioned in another comment we can just remove option #1 and in case user want to use default Compute Engine SA he can pass its name into as in option #3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify:

  • The only permission this module has stopped at providing is the permission to impersonate the default Compute Engine SA as I mentioned above. This gives broad permission to the user/account running this Terraform deployment. That was deemed too risky and obscure from security point of view.
  • Granting other permissions (dataflow worker, access to HEC token) to default Compute Engine SA has always been the case and we don't want to break that with this change. It's necessary for a simple first-time run. If risk is not acceptable, or for production purposes, then user is recommended to specify a custom service account.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right Roy, it works as you described. Default behavior didn't change. I just fixed the description in Readme.md.
But there is an invalid combination of variables:

  • use_externally_managed_dataflow_sa = true
    dataflow_worker_service_account = ""

Which will lead to unpredictable behavior when default GCE SA will be used, but permissions will not set. So, I changed the code in such way that in case of dataflow_worker_service_account = "" it is using default GCE SA and setting all required permissions despite of the other variable (use_externally_managed_dataflow_sa).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per other thread, it would be safer to throw an error, rather than inferring what the user wants. Might lead to unintended consequences: maybe the user doesn't want to use or even modify permissions for GCE SA.

We can resolve this as soon as other conversation is resolved.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@ilakhtenkov ilakhtenkov force-pushed the feature/add-option-to-manage-some-permissions-externally branch from 896ab08 to 6dc41c3 Compare January 17, 2023 22:37
@ilakhtenkov ilakhtenkov force-pushed the feature/add-option-to-manage-some-permissions-externally branch from 6dc41c3 to ca77984 Compare January 18, 2023 22:45
@ilakhtenkov ilakhtenkov force-pushed the feature/add-option-to-manage-some-permissions-externally branch from ca77984 to 08fd742 Compare January 18, 2023 23:00
Copy link
Contributor Author

@ilakhtenkov ilakhtenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rarsan LGTM!
This make it more clear.

@ilakhtenkov ilakhtenkov force-pushed the feature/add-option-to-manage-some-permissions-externally branch from 1511623 to 421a114 Compare January 20, 2023 15:45
main.tf Outdated Show resolved Hide resolved
@ilakhtenkov ilakhtenkov force-pushed the feature/add-option-to-manage-some-permissions-externally branch from 421a114 to 15d6b78 Compare January 20, 2023 17:31
@ilakhtenkov ilakhtenkov force-pushed the feature/add-option-to-manage-some-permissions-externally branch from 15d6b78 to 06380e0 Compare January 23, 2023 23:22
@rarsan
Copy link
Member

rarsan commented Jan 25, 2023

I've updated the variable names and descriptions in changelog and variables.tf. The extra details were needed since dataflow_worker_service_account can now take the form of either SA name or SA email address.
Please review the regex condition change as well - I wasn't able to test it out just yet. Please do re-generate the README inputs section as well. I believe that's the last bit remaining.

@ilakhtenkov
Copy link
Contributor Author

I've updated the variable names and descriptions in changelog and variables.tf. The extra details were needed since dataflow_worker_service_account can now take the form of either SA name or SA email address. Please review the regex condition change as well - I wasn't able to test it out just yet. Please do re-generate the README inputs section as well. I believe that's the last bit remaining.

Fixed small bug in regexp and tested with 3 different options.
It works good. Also re-generated README.md file.

@rarsan rarsan merged commit ad42e39 into GoogleCloudPlatform:main Jan 25, 2023
@ilakhtenkov ilakhtenkov deleted the feature/add-option-to-manage-some-permissions-externally branch February 15, 2023 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants