-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to use external Dataflow Service Account #26
Option to use external Dataflow Service Account #26
Conversation
@rarsan Could you please review the PR? |
|
Hi @rarsan, Could you please review merge request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This is a useful addition.
Suggesting changing variable name and behavior for clarity.
README.md
Outdated
**Note about Dataflow permissions**: You must also have permission to impersonate Dataflow worker service account in order to attach that service account to Compute Engine VMs which will execute pipeline operations. In case of default worker service account (i.e. your project's Compute Engine default service account), ensure you have `iam.serviceAccounts.actAs` permission over Compute Engine default service account in your project. For security purposes, this Terraform does not modify access to your existing Compute Engine default service account due to risk of granting broad permissions. On the other hand, if you choose to use a user-managed worker service account (by setting `dataflow_worker_service_account` template parameter), this Terraform will add necessary permission over the new service account. The former approach, i.e. user-managed service account, is recommended in order to use a minimally-scoped service account dedicated for this pipeline, and to have impersonation permission managed for you by this Terraform. | ||
#### Dataflow permissions | ||
There are 3 options to manage Dataflow permissions: | ||
1. Usage of the default `Compute Engine` account. `roles/dataflow.worker` and other required permissions (example: secrets and kms access) should be granted outside of the module. Permissions to resources created as part of that module `log sync` and `PubSub` topics are would be granted to Default Compute Engine service accountby the module. To use this option don't provide `dataflow_worker_service_account` variable and set `create_service_account` to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In first option (using default Compute Engine SA): should be granted outside of the module
is not correct. All necessary permissions are in fact granted inside the module. The only exception is the permission to grant impersonation (iam.serviceAccounts.actAs
permission via role roles/iam.serviceAccountUser
) over the default Compute Engine SA. More details in "Note" below in README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue. roles/dataflow.worker
access to the KMS, secrets and account impersonation now should be done outside of the module. Here is an example permissions.tf#96. Only for option 2
role name not empty string and use_externally_managed_dataflow_sa
will create the binding.
As you mentioned in another comment we can just remove option #1
and in case user want to use default Compute Engine SA he can pass its name into as in option #3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify:
- The only permission this module has stopped at providing is the permission to
impersonate
the default Compute Engine SA as I mentioned above. This gives broad permission to the user/account running this Terraform deployment. That was deemed too risky and obscure from security point of view. - Granting other permissions (dataflow worker, access to HEC token) to default Compute Engine SA has always been the case and we don't want to break that with this change. It's necessary for a simple first-time run. If risk is not acceptable, or for production purposes, then user is recommended to specify a custom service account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right Roy, it works as you described. Default behavior didn't change. I just fixed the description in Readme.md
.
But there is an invalid combination of variables:
use_externally_managed_dataflow_sa = true
dataflow_worker_service_account = ""
Which will lead to unpredictable behavior when default GCE SA will be used, but permissions will not set. So, I changed the code in such way that in case of dataflow_worker_service_account = ""
it is using default GCE SA and setting all required permissions despite of the other variable (use_externally_managed_dataflow_sa
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per other thread, it would be safer to throw an error, rather than inferring what the user wants. Might lead to unintended consequences: maybe the user doesn't want to use or even modify permissions for GCE SA.
We can resolve this as soon as other conversation is resolved.
896ab08
to
6dc41c3
Compare
6dc41c3
to
ca77984
Compare
ca77984
to
08fd742
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rarsan LGTM!
This make it more clear.
1511623
to
421a114
Compare
421a114
to
15d6b78
Compare
15d6b78
to
06380e0
Compare
I've updated the variable names and descriptions in changelog and variables.tf. The extra details were needed since |
Fixed small bug in regexp and tested with 3 different options. |
This PR is adding
create_service_account
variable.If it set to
false
than module will not create dataflow service account and will use account email provided indataflow_worker_service_account
variable.Check updated Readme for more details.