
No way to tackle Karpenter when KMS is mandatory by default #3037

Closed
zohairraza opened this issue May 15, 2024 · 21 comments

@zohairraza

zohairraza commented May 15, 2024

Description

In my AWS account, KMS encryption is mandatory for EBS volumes. According to the Karpenter documentation (https://karpenter.sh/docs/troubleshooting/#node-terminates-before-ready-on-failed-encrypted-ebs-volume), an additional policy statement needs to be added to the KMS key of the EKS cluster when encrypted EBS volumes are used. However, I ran into difficulties implementing this additional policy with the kms_key_source_policy_documents input of the Terraform AWS EKS module.

The issue arises from the fact that kms_key_source_policy_documents expects an IAM policy as input, while the policy required for the KMS key includes a Principal, which is not supported in IAM policy definitions. When I attempted to create an IAM policy containing that Principal, I received the following error: "MalformedPolicyDocument: Policy document should not specify a principal."

Additionally, since the KMS key is created by the module, any modifications made to it outside of Terraform are overwritten by subsequent Terraform runs. I have not found a way to ignore these changes.
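For reference, a sketch of the statement from the linked troubleshooting page, expressed here as an `aws_iam_policy_document` data source (the data-source name is illustrative and the statement is abridged/adapted from the Karpenter docs):

```hcl
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Abridged sketch of the "allow EBS use via EC2" statement from the
# Karpenter troubleshooting page, expressed as a Terraform policy document.
data "aws_iam_policy_document" "ebs_via_ec2" {
  statement {
    sid    = "AllowAccessThroughEBS"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["*"]
    }

    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:CreateGrant",
      "kms:DescribeKey",
    ]

    resources = ["*"]

    # Restrict usage to EBS/EC2 calls made from this account.
    condition {
      test     = "StringEquals"
      variable = "kms:ViaService"
      values   = ["ec2.${data.aws_region.current.name}.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "kms:CallerAccount"
      values   = [data.aws_caller_identity.current.account_id]
    }
  }
}
```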

Versions
Module version [Required]: v20.10.0
Terraform version: v1.5.7
Provider version(s): 5.40.0

Reproduction Code [Required]

module "eks" {
  source                        = "terraform-aws-modules/eks/aws"
  version                       = "~> 20.10.0"
  kms_key_enable_default_policy = true
  cluster_name                  = local.eks_cluster_name
  cluster_version               = var.eks_version
  authentication_mode           = "API_AND_CONFIG_MAP"

  cluster_security_group_tags = {
    "karpenter.sh/discovery" = local.eks_cluster_name
  }

  node_security_group_tags = {
    "karpenter.sh/discovery" = local.eks_cluster_name
  }

  cluster_endpoint_public_access = var.eks_params.cluster_endpoint_public_access

  cluster_enabled_log_types = var.eks_params.cluster_enabled_log_types

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      addon_version            = var.eks_addons_version.aws_ebs_csi_driver
      service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
    }
  }

  vpc_id                   = var.vpc_id
  subnet_ids               = var.private_subnets
  control_plane_subnet_ids = var.private_subnets

  cluster_additional_security_group_ids = [aws_security_group.eks_vpc_vpn_access.id]

}

module "karpenter" {
  depends_on = [module.eks]
  source = "terraform-aws-modules/eks/aws//modules/karpenter"
  cluster_name = local.eks_cluster_name
  irsa_oidc_provider_arn          = module.eks.oidc_provider_arn
  irsa_namespace_service_accounts = ["karpenter:karpenter"]
  create_node_iam_role  = true
#  iam_role_arn         = module.eks.eks_managed_node_groups["default"].iam_role_arn
#  irsa_use_name_prefix = false
   version = "~> 20.0"

  # Used to attach additional IAM policies to the Karpenter node IAM role
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

# If you wish to maintain the current default behavior of v19.x - disabling old
  enable_irsa             = true
  create_iam_role         = true
  create_instance_profile = true

# To avoid any resource re-creation
  iam_role_name          = "KarpenterIRSA-${module.eks.cluster_name}"
  iam_role_description   = "Karpenter IAM role for service account"
  iam_policy_name        = "KarpenterIRSA-${module.eks.cluster_name}"
  iam_policy_description = "Karpenter IAM role for service account"
}

#data "aws_ecrpublic_authorization_token" "token" {
#  provider = aws.virginia
#}

resource "helm_release" "karpenter" {
  depends_on = [module.karpenter]
  timeout = 1200
  namespace        = "karpenter"
  create_namespace = true
  name                = "karpenter"
  repository          = "oci://public.ecr.aws/karpenter"
 # repository_username = data.aws_ecrpublic_authorization_token.token.user_name
 # repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart               = "karpenter"
  version             = var.helm_release_versions.karpenter

 # lifecycle {
 #   ignore_changes = [
 #     repository_password,
 #   ]
 # }

  set {
    name  = "logLevel"
    value = "debug"
  }

  set {
    name  = "settings.clusterName"
    value = local.eks_cluster_name
  }
  set {
    name  = "settings.clusterEndpoint"
    value = module.eks.cluster_endpoint
  }
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.karpenter.iam_role_arn
  }
  set {
    name = "serviceAccount.annotations.eks\\.amazonaws\\.com/sts-regional-endpoints"
    value = "true"
    type = "string"
  }
  set {
    name  = "settings.aws.defaultInstanceProfile"
    value = module.karpenter.instance_profile_name
  }
  set {
    name  = "settings.aws.interruptionQueueName"
    value = module.karpenter.queue_name
  }

}


resource "kubectl_manifest" "karpenter_node_class" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      role: ${module.karpenter.node_iam_role_name}
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}
      blockDeviceMappings:
        - deviceName: /dev/xvda
          ebs:
            volumeSize: 30Gi
            volumeType: gp3
            iops: 10000
            encrypted: true
            kmsKeyID: ${module.eks.kms_key_id}
            deleteOnTermination: true
            throughput: 125
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}

resource "kubectl_manifest" "karpenter_node_pool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:
            name: default
          requirements:
            - key: "karpenter.k8s.aws/instance-category"
              operator: In
              values: ["t","c", "m", "r"]
            - key: "karpenter.k8s.aws/instance-cpu"
              operator: In
              values: ["2","4", "8", "16", "32"]
            - key: "karpenter.k8s.aws/instance-hypervisor"
              operator: In
              values: ["nitro"]
            - key: "karpenter.k8s.aws/instance-generation"
              operator: Gt
              values: ["2"]
      limits:
        cpu: 10000
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: 120s
  YAML

  depends_on = [
    kubectl_manifest.karpenter_node_class
  ]
}
```

Steps to Reproduce the Behavior

1. Enable KMS encryption by default for EBS volumes.
2. Use the Terraform AWS EKS module to manage the EKS cluster.
3. Attempt to add an additional policy to the KMS key using kms_key_source_policy_documents.
4. Encounter the error caused by the inclusion of a Principal in the KMS key policy.

Expected Behavior
I expected to be able to seamlessly add an additional policy to the KMS Key of the EKS cluster, as recommended in the Karpenter documentation, without encountering errors related to IAM policy definitions.

Actual Behavior
Attempting to express the KMS key policy (which includes a Principal) as an IAM policy failed with a "MalformedPolicyDocument: Policy document should not specify a principal" error.

Additional Context

I have already cleared the local cache and ensured that I am not using workspaces. Any assistance in resolving this issue would be greatly appreciated.

@bryantbiggs
Member

Can you format your code by surrounding it with code fences (```hcl) and provide pseudo-code of what you are trying to do? It's not very clear what you are trying to do or how you are approaching it.

@zohairraza
Author

Hi Bryan, thanks for your response. I have added the Karpenter code too. Does that explain it now?

@bryantbiggs
Member

Not really - there are a bunch of variables that are unknown.

If you are trying to re-use the cluster KMS key (the one used to encrypt cluster secrets), you will need to add the necessary permissions for it to be used with EBS volumes - the module does not do this by default.

@zohairraza
Author

Yes, I am trying to re-use the cluster KMS key, and I need to know the right way to tackle this. When I add permissions to the key outside of the module, they get overwritten on the next Terraform run.

I think creating a separate key for Karpenter might be a better choice. Let me try that out. In the meantime, feel free to comment if that's the best way.

@bryantbiggs
Member

Why not just add the required permissions to the key that is created via Terraform? Check the variables that are provided.
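(A minimal sketch of that approach, assuming a policy document like the one sketched in the issue description above; the `kms_key_*` inputs are the module variables quoted later in this thread:)

```hcl
# Sketch: merge an extra statement into the key policy of the KMS key
# that the EKS module creates for cluster secret encryption.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.10"

  cluster_name    = local.eks_cluster_name
  cluster_version = var.eks_version
  # ... remaining arguments from the reproduction code above ...

  kms_key_enable_default_policy   = true
  kms_key_source_policy_documents = [data.aws_iam_policy_document.ebs_via_ec2.json]
}
```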

@zohairraza
Author

That indeed worked!

Thanks. Maybe add this to the documentation somewhere so it will help others facing the same issue later on.

@bryantbiggs
Member

The variables are in the documentation - any variable definitions are automatically added to our documentation

@bryantbiggs
Member

we can't duplicate all of the docs within Karpenter, EKS, MNG, Fargate, etc. We focus on documentation related to the module itself

@k9sstorage

@zohairraza How did you get this resolved? I am having the same issue.

@zohairraza
Author

zohairraza commented May 28, 2024

By creating another key for karpenter:

resource "aws_kms_key" "KarpenterKMSKey" {
  description = "Karpenter KMS Key"
  policy = local.merged_policy
  depends_on = [module.eks]
}

resource "aws_kms_alias" "KarpenterKMSKey" {
  name          = "alias/eks-karpenter-key"
  target_key_id = aws_kms_key.KarpenterKMSKey.key_id
}

resource "kubectl_manifest" "karpenter_node_class" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      role: ${module.karpenter.node_iam_role_name}
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}
      blockDeviceMappings:
        - deviceName: /dev/xvda
          ebs:
            volumeSize: 30Gi
            volumeType: gp3
            iops: 10000
            encrypted: true
            kmsKeyID: ${aws_kms_key.KarpenterKMSKey.key_id}
            deleteOnTermination: true
            throughput: 125
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}
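(`local.merged_policy` is not shown above; a hypothetical sketch of what it could contain, assuming it combines a root-account admin statement with the EBS-via-EC2 statement sketched earlier in this issue - names are illustrative:)

```hcl
# Hypothetical sketch of the merged key policy referenced above.
data "aws_iam_policy_document" "karpenter_kms" {
  # Keep full control of the key with the account (administration via IAM).
  statement {
    sid       = "EnableRootAccess"
    effect    = "Allow"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Merge in the EBS-via-EC2 statement sketched earlier in this issue.
  source_policy_documents = [data.aws_iam_policy_document.ebs_via_ec2.json]
}

locals {
  merged_policy = data.aws_iam_policy_document.karpenter_kms.json
}
```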

@bryantbiggs
Member

Just to be clear - there is no KMS key for Karpenter. There are two use cases where KMS keys are involved within EKS:

  1. To encrypt secrets within the cluster - this module supports creating a custom KMS key for this purpose:
     `key_arn = var.create_kms_key ? module.kms.key_arn : encryption_config.value.provider_key_arn`
  2. To encrypt EBS volumes on EC2 instances - this module does not create a key for this, but you can pass in an externally created key. This is what I believe is being referred to as the "Karpenter key".

If you want to re-use the KMS key created by this module that was created for encrypting secrets within the cluster, you MUST update the key policy to ensure it will work for encrypting EBS volumes with the solution that is creating the instances (EKS managed node group, self-managed node group, Karpenter, etc.)

terraform-aws-eks/main.tf, lines 238 to 243 in f90f15e:

```hcl
key_owners                = var.kms_key_owners
key_administrators        = coalescelist(var.kms_key_administrators, [data.aws_iam_session_context.current.issuer_arn])
key_users                 = concat([local.cluster_role], var.kms_key_users)
key_service_users         = var.kms_key_service_users
source_policy_documents   = var.kms_key_source_policy_documents
override_policy_documents = var.kms_key_override_policy_documents
```

@k9sstorage

k9sstorage commented May 28, 2024

@zohairraza thanks for the response.
@bryantbiggs In my case I am using this module to create the EBS KMS key as well, along with the secrets one.

```hcl
kms_key_id = module.ebs_kms_key.key_arn

module "ebs_kms_key" {
  source      = "terraform-aws-modules/kms/aws"
  version     = "~> 2.1"
  description = "Customer managed key to encrypt EKS managed node group volumes"

  # Policy
  key_administrators = [
    data.aws_caller_identity.current.arn
  ]
  key_service_roles_for_autoscaling = [
    # required for the ASG to manage encrypted volumes for nodes
    "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
    # required for the cluster / persistentvolume-controller to create encrypted PVCs
    module.eks.cluster_iam_role_arn,
  ]

  # Aliases
  aliases = ["eks/${local.name}/ebs"]
  tags    = local.tags
}
```

The issue I am facing is that when Karpenter tries to scale up an instance, the instance gets terminated immediately. On further investigation, it seems to use the default KMS key configured on the account instead of the one created as part of this module.

Even on the launch template I can see that the new EBS KMS key is being used.

This is how the EC2NodeClass looks:

```yaml
      blockDeviceMappings:
        - deviceName: /dev/xvda
          ebs:
            volumeSize: 30Gi
            volumeType: gp3
            iops: 10000
            encrypted: true
            kmsKeyID: ${module.ebs_kms_key.key_id}
            deleteOnTermination: true
            throughput: 125
```

By the way, I am trying to use Pod Identity (not sure if there are any extra settings needed).

Thanks

@bryantbiggs
Member

looks like a configuration error on your end - I would check the Karpenter documentation https://karpenter.sh/docs/troubleshooting/#node-terminates-before-ready-on-failed-encrypted-ebs-volume

@k9sstorage

k9sstorage commented May 28, 2024

Thanks, I did look into it but couldn't resolve it. I have added that policy to my KMS key and it's still the same.
It looks like I am missing something, so I thought I'd ask the "experts" here... I am a bit puzzled how it's picking up the default KMS key, even though the new EBS key is clearly specified.

@bryantbiggs
Member

have you set a default on the account/region? https://docs.aws.amazon.com/cli/latest/reference/ec2/get-ebs-default-kms-key-id.html

@k9sstorage

k9sstorage commented May 28, 2024

How can Karpenter pick a random/default KMS key, even though the EC2NodeClass clearly specifies the key created as part of this module?
Worker nodes can join the cluster without any issues; it's only when Karpenter tries to scale up that I see the problem. From CloudTrail it's clear that it's picking up the wrong KMS key, so I'm not sure what I am missing here ;(

Actually, I can see from the get-ebs-default-kms-key-id command that the alias/aws/ebs key is set as the default, so I'm back to square one. How is it even getting that KMS key? I am missing a critical point here.

@bryantbiggs
Member

@k9sstorage

k9sstorage commented May 28, 2024

Maybe that's the case... Any idea how I can override it and make it use the newly created KMS key? Also, I am using the create_instance_profile option with Karpenter (not sure if that makes any difference).
I am using this block:
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/karpenter/main.tf#L115C1-L129C2
with create_instance_profile = true

@zohairraza
Author

zohairraza commented May 28, 2024

That was my situation too, so I created another key dedicated to Karpenter and used it in the Karpenter configuration, which worked. Before that, I was using the EKS module's cluster key in the Karpenter NodePool.

Most likely your account is configured with encryption by default: https://docs.aws.amazon.com/ebs/latest/userguide/work-with-ebs-encr.html#encryption-by-default
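(If that account/region default is the cause, the default itself can also be managed from Terraform; a minimal sketch, assuming you want the customer managed EBS key from this thread to become the regional default instead of alias/aws/ebs:)

```hcl
# Sketch: enable EBS encryption by default and point the regional
# default at the customer managed key instead of alias/aws/ebs.
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}

resource "aws_ebs_default_kms_key" "this" {
  key_arn = module.ebs_kms_key.key_arn
}
```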

@k9sstorage

The issue with my setup was related to the KMS key used while creating the AMI. Even though I specified the KMS key in the MNG and the EC2NodeClass, it couldn't re-encrypt because the Karpenter role lacked permission for the KMS key used when the AMI was created.

My initial idea was not to use one KMS key for all the clusters, but this issue has brought me back to square one.
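(A hypothetical sketch of one way to address that, assuming the denied principal is the Karpenter node role - check CloudTrail for the actual principal - and that the ARN of the AMI's key is known; the key ARN below is a placeholder:)

```hcl
# Hypothetical sketch: grant the denied principal (assumed here to be the
# Karpenter node role) use of the KMS key that encrypted the AMI snapshots.
# The key ARN is a placeholder.
resource "aws_kms_grant" "karpenter_ami_key" {
  name              = "karpenter-ami-key-access"
  key_id            = "arn:aws:kms:REGION:ACCOUNT_ID:key/AMI_KEY_ID"
  grantee_principal = module.karpenter.node_iam_role_arn

  operations = [
    "Encrypt",
    "Decrypt",
    "ReEncryptFrom",
    "ReEncryptTo",
    "GenerateDataKey",
    "GenerateDataKeyWithoutPlaintext",
    "DescribeKey",
    "CreateGrant",
  ]
}
```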
