
Specifying iam_path causes node access issues in aws-auth config map #1595

Closed

leanrobot opened this issue Sep 22, 2021 · 7 comments

@leanrobot

Description

I'm setting up an EKS cluster at my company. I noticed that if I specify the iam_path input variable for the eks module, the first apply succeeds, but a subsequent apply removes the node's IAM role mapping from the aws-auth config map and replaces it with one that does not include the proper IAM path in the ARN.

This causes the node's group health to become degraded in the EKS console.

By removing iam_path from the input parameters, the module behaves as expected on the first and all subsequent applies.
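For illustration, the drift described above in the worker node's aws-auth entry looks roughly like this (the account ID, IAM path, and role name are placeholders):

# Entry written on the first apply (node group healthy); the ARN includes the IAM path:
{
  rolearn  = "arn:aws:iam::111111111111:role/test-cluster/eks/test-cluster-worker-role"
  username = "system:node:{{EC2PrivateDNSName}}"
  groups   = ["system:bootstrappers", "system:nodes"]
}

# Entry after a subsequent apply; the IAM path is stripped from the ARN and the node group degrades:
{
  rolearn  = "arn:aws:iam::111111111111:role/test-cluster-worker-role"
  username = "system:node:{{EC2PrivateDNSName}}"
  groups   = ["system:bootstrappers", "system:nodes"]
}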

Versions

  • Terraform:
Terraform v1.0.7
on darwin_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.11.3
+ provider registry.terraform.io/hashicorp/aws v3.59.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.3.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.5.0
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1
  • Module: terraform-aws-modules/eks/aws 17.20.0

Reproduction

Steps to reproduce the behavior:
Workspace: default
Cleared Cache: yes

  • Wrote module configuration and applied.
  • Cluster works correctly.
  • Every following apply updates the worker node's entry in the aws-auth map_roles.

Code Snippet to Reproduce

// based on: https://github.com/hashicorp/learn-terraform-provision-eks-cluster
// docs: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
module "eks" {
  version = ">= 17.0.0, < 18.0.0"

  source           = "terraform-aws-modules/eks/aws"
  cluster_name     = "test-cluster"
  cluster_version  = "1.20"
  write_kubeconfig = false

  vpc_id  = aws_vpc.main.id
  subnets = [for zone, subnet in aws_subnet.list : subnet.id]

  cluster_endpoint_public_access  = true
  cluster_endpoint_public_access_cidrs = var.endpoint_ip_cidrs

  cluster_endpoint_private_access = true
  cluster_endpoint_private_access_cidrs = [ for zone, cidr in local.subnet_data: cidr ]

  # iam_path  = "/${local.cluster_name}/eks/" # adding causes issue
  map_roles = concat(
    var.iam_role_mapping,
    [
      # { # adding this fixes the issue, but is not ideal, as it can only be added after second apply to cluster.
      #   groups = [
      #     "system:bootstrappers",
      #     "system:nodes",
      #   ]
      #   rolearn = data.aws_iam_role.nodes.arn
      #   username = "system:node:{{EC2PrivateDNSName}}"
      # },
      {
        rolearn  = "arn:aws:iam::xxxxxx:role/JC-User" # censored account id
        groups   = [ "system:masters" ]
        username = "JC-User"
      },
    ],
  )

  map_users = var.iam_user_mapping

  node_groups_defaults = {
    root_volume_type = "gp2"
    key_name         = aws_key_pair.main.id
    additional_tags  = {
      "Name" = "${local.resource_prefix}-nodes"
    }
  }

  node_groups = {
    primary = {
      name                          = "${local.resource_prefix}-nodes"
      instance_type                 = var.eks_worker_instance_type
      asg_min_size                  = var.eks_asg_min_size
      asg_max_size                  = var.eks_asg_max_size
      asg_desired_capacity          = var.eks_asg_desired_size

      tags = {
        "Name" = "${local.cluster_name}-nodes"
      }
    }
  }
}

data "aws_iam_role" "nodes" { # used for workaround until I discovered removing IAM path fixed.
  name = module.eks.worker_iam_role_name
}

# EC2.tf =======================================================================
// allow access to worker nodes.
resource "aws_security_group_rule" "private_ingress" {
  type              = "ingress"
  from_port         = -1
  to_port           = -1
  protocol          = "ALL"
  cidr_blocks       = var.endpoint_ip_cidrs
  security_group_id = module.eks.cluster_primary_security_group_id
}

resource "aws_key_pair" "main" {
  key_name = "${local.cluster_name}"
  public_key = var.node_ssh_public_key
}



# vpc.tf =======================================================================
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = local.resource_prefix
  }
}

resource "aws_subnet" "list" {
  for_each = local.subnet_data

  availability_zone = each.key
  cidr_block        = each.value
  vpc_id            = aws_vpc.main.id

  map_public_ip_on_launch = true

  tags = {
    Name = "${local.resource_prefix}-${each.key}"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${local.resource_prefix}-gw"
  }
}

resource "aws_route_table" "main" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${local.resource_prefix}-rt"
  }
}

resource "aws_route_table_association" "eks" {
  for_each = local.subnet_data

  subnet_id = aws_subnet.list[each.key].id
  route_table_id = aws_route_table.main.id
}

# locals.tf ====================================================================
locals {
  base_name = var.cluster_name_prefix
  cluster_name = "${local.base_name}-cluster"
  resource_prefix = "${local.base_name}-eks"

  subnet_data = {
    for index, zone in var.availability_zones : zone => "10.0.${index}.0/24"
  }
}

# variables.tf =================================================================
# EC2
variable "region" {
  description = "The AWS region to create the cluster in."
  type        = string
}

variable "availability_zones" {
  description = "EC2 availability zones where K8S worker nodes will be launched."
  type        = list(string)
}

# SECURITY/PERMISSIONS
variable "node_ssh_public_key" {
  type = string
  description = "public key for ssh access into K8S nodes."
}

variable "endpoint_ip_cidrs" {
  type = list(string)
  description = "list of allow IP CIDRs for full network access, including EKS API endpoints and SSH for nodes."
}

variable "public_ip_cidrs" {
  type = list(string)
  description = "IP CIDRs allowed to access public endpoints and ports for the cluster."
}

# EKS NODES CONFIG
variable "eks_worker_instance_type" {
  type = string
}

# auto scaling group settings
variable "eks_asg_min_size" {
  type = number
}
variable "eks_asg_max_size" {
  type = number
}
variable "eks_asg_desired_size" {
  type = number
}

# providers.tf =================================================================
provider "aws" {
  profile = var.aws_profile
  region = var.aws_region
  allowed_account_ids = []
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args = [
      "eks",
      "get-token",
      "--cluster-name",
      module.eks.cluster_id,
      "--profile",
      var.aws_profile,
      "--region",
      var.aws_region,
    ]
  }
}

# versions.tf ==================================================================
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = ">= 3.0, < 4.0"
    }
  }

  # TERRAFORM REQUIRED VERSION, use https://github.com/tfutils/tfenv to manage installations
  required_version = "~> 1.0"
}

Expected behavior

I expected that the correct IAM role -> cluster role mapping would be set up to allow the control plane and node group to communicate.

Actual behavior

Nodes would enter a degraded state unless I did either of the following:

  • Removed the iam_path input to the module
  • Explicitly specified the IAM role -> cluster role mapping for the worker nodes in the map_roles input (see the sketch after this list).
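A minimal sketch of the second workaround, mirroring the commented-out block in the code snippet above (it relies on the module's worker_iam_role_name output and, as noted in the snippet, can only be added once the cluster already exists):

# Look up the worker role created by the module so its full ARN (including the IAM path) is available.
data "aws_iam_role" "nodes" {
  name = module.eks.worker_iam_role_name
}

module "eks" {
  # ... same arguments as in the snippet above ...

  # Explicitly map the worker role so the aws-auth entry keeps the expected ARN.
  map_roles = [
    {
      rolearn  = data.aws_iam_role.nodes.arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    },
  ]
}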

Terminal Output Screenshot(s)

[Screenshot: Screen Shot 2021-09-21 at 4.44.46 PM]

@daroga0002
Contributor

It looks like this comes from

# Work around https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/153
# Strip the leading slash off so that Terraform doesn't think it's a regex
rolearn = replace(role["worker_role_arn"], replace(var.iam_path, "/^//", ""), "")

and there is a comment there about kubernetes-sigs/aws-iam-authenticator#153.
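For illustration, with a hypothetical iam_path and worker role ARN, that expression resolves like this:

locals {
  iam_path        = "/test-cluster/eks/"  # hypothetical value of var.iam_path
  worker_role_arn = "arn:aws:iam::111111111111:role/test-cluster/eks/test-cluster-worker-role"

  # Inner replace(): "/^//" is a regex that strips the leading slash,
  # so the outer replace() treats the path as a literal string rather than a regex.
  path_literal = replace(local.iam_path, "/^//", "")  # "test-cluster/eks/"

  # Outer replace(): the path is removed from the role ARN before it is written
  # into aws-auth, which matches the drift reported above.
  rolearn = replace(local.worker_role_arn, local.path_literal, "")
  # => "arn:aws:iam::111111111111:role/test-cluster-worker-role"
}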

I don't know whether this is still relevant or not. We will need to investigate this more deeply.

@antonbabenko
Member

@daroga0002 Please use labels to reflect the status. Someone else from the community may be able to help if they see it, e.g. needs triage or help wanted.

@joanayma

joanayma commented Oct 1, 2021

@leanrobot can you test whether #1524 also fixes your issue? I think it's the same.

@leanrobot
Author

@joanayma Hi Joan, I read through #1524, but it was unclear to me how I should test it to see whether it addresses my issue.

@github-actions

This issue has been automatically marked as stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.

@github-actions github-actions bot added the stale label Nov 17, 2021
@github-actions

This issue was automatically closed because it remained stale for 10 days.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022