Unable to access Glue tables with data.all dataset role due to KMS:decrypt permission not found #1210

TejasRGitHub · 2024-04-22T16:31:18Z

Describe the bug

When importing a dataset, if the KMS key used for the S3 bucket doesn't not have the :root access then the importing dataset fails. Solution for this is to explicitly specify the dataallPivotRole-cdk and give necessary permissions for importing the dataset.

After this the dataset is imported successfully but when trying to query glue tables with Athena using the dataset role, following error is thrown
com.amazonaws.services.s3.model.AmazonS3Exception: User: arn:aws:sts::<AWS_ACCOUNT>:assumed-role/dataall-<dataset_name>-<URI>/<> is not authorized to perform: kms:Decrypt on resource: arn:aws:kms:us-east-1:<KMS_KEY> because no resource-based policy allows the kms:Decrypt action

This is due to the following,
When a dataset is imported , it is registered with dataset role in Lake formation. Whenever querying through lake formation, Lake formation provides temporary S3 credentials through this role. Since this role doesn't have permission to the KMS key the following error is thrown.

How to Reproduce

Create a S3 bucket and add some structured data to it
Create a KMS Key and attach it to the S3 bucket
Remove the :root permission from the KMS key
Create a glue db in the same account for S3 bucket data
Add this section in the KMS key to give the dataPivotRole Access to the KMS key { "Sid": "KMSPivotRolePermissions", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::AWS_ACCOUNT_ID:role/dataallPivotRole-cdk" }, "Action": [ "kms:DescribeKey", "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*", "kms:PutKeyPolicy", "kms:GetKeyPolicy", "kms:ReEncrypt*", "kms:TagResource", "kms:UntagResource" ], "Resource": "*" }
Import this S3 bucket as a dataset in data.all
After importing, goto Athena and assume the dataset role
Try to query

Expected behavior

Querying with Athena should work even when dataset is imported with a KMS key which doesn't have the :root access.
In such cases, explicitly add permissions to the KMS keys for the dataset role which is registered with lake formation.

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.9

AWS data.all version

2.3

Additional context

No response

The text was updated successfully, but these errors were encountered:

dlpzx · 2024-04-24T07:21:00Z

Hi @TejasRGitHub I understand the issue and it is not surprising for more strict KMS keys that remove the root permissions. What do you think is the best course of action?
As a first quick measurement, we can update the docs; but, I think that for the long term, we should look for automatic ways of ensuring access.

For example we could add the statement upon import and re-check during shares. Similar to generate_enable_pivot_role_permissions_policy_statement. What do you think?

TejasRGitHub · 2024-04-24T15:00:26Z

Hi @dlpzx , Thanks for taking a look at this issue. As a quick measure we have updated our docs.

For a long term solution, if we add the dataset admin role explicitly to the KMS policy ( like how it is done for created dataset where the KMS policy is created during the stack creation phase and then a SID "EnableDatasetIAMRoleKeyUsage" with permissions is added allowing dataset admin role to access KMS key ) then this should solve the problem.

As this cannot be done via CDK for imported KMS keys during the dataset_stack creation phase ( https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_kms/README.html#:~:text=alias/foo%22)-,Note,-that%20a%20call )

Another way would be to do the update by assuming the pivot Role and then adding a statement in KMS policy for dataset admin role to have access. This could be done during the post_deployment() step mentioned in the cdk_cli_wrapper.py file. Thus the dataset module will have a cli_wrapper.post_deployment implementation which will perform this after the dataset_stack creation is completed

Please let me knw your thoughts on this one

dlpzx · 2024-05-02T14:36:30Z

Hi @TejasRGitHub I am picking up this issue, that as you pointed out, is trickier than it looks.

We should add an statement that grants permissions to the dataset_role and I imagine other roles environment_team_roles to the imported KMS key policy when the dataset is created. Because we do not control the IaC of the KMS key we need to perform these grants using SDK calls.

If we want to use the dataallPivotRole with SDK calls:
When we create/import a dataset 2 stacks get deployed: dataset and environment. The permission that grants access to this KMS key to the dataallPivotRole is added when the environment stack is updated. So what needs to happen:

user triggers dataset import
dataset stack and environment stacks triggered deployment
in environment stack wait for completion of dataallPivotRole update
update KMS policy with a custom resource/post_execution with dataallPivotRole

From a logical perspective, any action on the dataset KMS key should happen in the deployment of the dataset, not of the environment. But if we add a post_deployment to the dataset stack we cannot ensure that the pivot role is updated. We could have it as part of the environment stack, but it is a bit weird that there is a KMS dataset action after the environment is updated.

I am thinking of avoiding using the dataallpivotRole completely and using some sort of custom resource with the CDK exec role. I need to do some research

Let me think about it, because I am not convinced by the solutions proposed yet

TejasRGitHub · 2024-05-02T16:17:26Z

Hi @dlpzx , thanks a lot for taking a look at this issue. I was of the view that post_deployment would start when the stacks are completely deployed. Thanks for clearing this out.

For now, we have found out an internal solution for mitigating this issue. Although, not needed rightnow we could certainly take a look at this in the future releases when we have a concrete solution. I think it will be helpful if data.all can work with super restrictive policies ( as the one mentioned in this GH issue ).

dlpzx added type: bug Something isn't working type: enhancement Feature enhacement priority: medium effort: medium labels Apr 24, 2024

dlpzx added this to To do in v2.5.0 via automation Apr 29, 2024

dlpzx self-assigned this May 2, 2024

dlpzx removed this from To do in v2.5.0 May 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unable to access Glue tables with data.all dataset role due to KMS:decrypt permission not found #1210

Unable to access Glue tables with data.all dataset role due to KMS:decrypt permission not found #1210

TejasRGitHub commented Apr 22, 2024

dlpzx commented Apr 24, 2024

TejasRGitHub commented Apr 24, 2024

dlpzx commented May 2, 2024 •

edited

TejasRGitHub commented May 2, 2024

Unable to access Glue tables with data.all dataset role due to KMS:decrypt permission not found #1210

Unable to access Glue tables with data.all dataset role due to KMS:decrypt permission not found #1210

Comments

TejasRGitHub commented Apr 22, 2024

Describe the bug

How to Reproduce

Expected behavior

Your project

Screenshots

OS

Python version

AWS data.all version

Additional context

dlpzx commented Apr 24, 2024

TejasRGitHub commented Apr 24, 2024

dlpzx commented May 2, 2024 • edited

TejasRGitHub commented May 2, 2024

dlpzx commented May 2, 2024 •

edited