
For existing AWS VPC & subnets created outside of pulumi, pulumi needs a means to tag any public or private subnets for Kubernetes use, that aren't already used by Workers #64

Closed · metral opened this issue Feb 14, 2019 · 6 comments
Assignees: metral
Labels: kind/bug (Some behavior is incorrect or out of spec)

Comments

@metral (Contributor) commented Feb 14, 2019

To use existing AWS VPC subnets created outside of pulumi with EKS, the user must manually tag the desired subnets on AWS with the required Kubernetes key/value pair.

If they are not manually tagged, then Kubernetes will not be able to discover them when it needs to create Public or Private Load Balancers on AWS in those subnets, unless those subnets were already in use by running Worker instances.

Specifically, the VPC and subnets that the Workers are running in are automatically tagged for us in AWS by the EKS service with the key: kubernetes.io/cluster/${clusterName}, and the value: shared, where ${clusterName} is the name of the new EKS cluster. The cluster name is not known to the user or Pulumi until after the cluster has been created and auto-named.
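
To make that concrete, here is a minimal sketch in Pulumi TypeScript (assuming the @pulumi/eks package; names are illustrative) of why the tag key is only computable after creation:

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";

// Illustrative sketch: the cluster is auto-named, so the tag key EKS
// uses on the Worker subnets is only knowable as an Output that
// resolves after the cluster has been created.
const cluster = new eks.Cluster("example");
const tagKey: pulumi.Output<string> = cluster.eksCluster.name.apply(
    name => `kubernetes.io/cluster/${name}`);
// EKS itself applies { [tagKey]: "shared" } to the subnets the Workers
// occupy; other subnets in the VPC receive no such tag.
```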

However, this tagging is not done for any other public or private subnets that the Workers aren't already running in, as they are 1) not occupied by running Workers, and 2) consequently, not automatically tagged by the EKS service.

Manually tagging these other subnets is a required workaround to enable a couple of use cases, such as when a user wants to:

  1. Create a Public LoadBalancer in public subnets across AZs, when the cluster is configured to have its Workers run in private subnets.
  2. Create a Private LoadBalancer in other private subnets across AZs that the Worker instances are not already running in.

See this gist for a repro of use case #1, where Workers are in private subnets, and a Public LoadBalancer Service never comes up if you don't have public subnets appropriately tagged in AWS for Kubernetes discovery. Only once you properly tag a public subnet in the VPC does the Public LoadBalancer get provisioned for the cluster. A sketch of the failing Service is below.
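
The failing piece in that repro is just an ordinary Service of type LoadBalancer, along these lines (a hedged sketch; the actual gist may differ in its details):

```typescript
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// Hypothetical repro shape: a cluster with Workers in private subnets,
// plus a public LoadBalancer Service targeting the cluster's provider.
const cluster = new eks.Cluster("repro");

const publicLb = new k8s.core.v1.Service("public-lb", {
    spec: {
        type: "LoadBalancer",
        selector: { app: "nginx" },
        ports: [{ port: 80, targetPort: 80 }],
    },
}, { provider: cluster.provider });
// Without the kubernetes.io/cluster/<clusterName> tag on the public
// subnets, this Service stays pending: Kubernetes cannot discover any
// subnets in which to place the ELB.
```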

After my attempt above, I tried going down the path of retrieving the existing public subnet object in a couple of ways, listed below, to modify its .tags property, but this does not seem possible:

  • Tried retrieving an existing subnet object using aws.ec2.Subnet.get(...)
  • Tried retrieving an existing Vpc object using awsx.Network.fromVpc(...)
  • Tried using the awsx.ec2.Subnet(...) constructor, with the Vpc object returned from the above as a param.

However, none of my attempts allowed me to modify the .tags prop on the existing subnets as needed.
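
For reference, this is roughly what the first attempt looked like (the subnet ID is made up): aws.ec2.Subnet.get(...) adopts a reference to the existing subnet, but the properties on the returned object are read-only Outputs, so there is no way to assign new tags to it.

```typescript
import * as aws from "@pulumi/aws";

// Roughly the first attempt described above (subnet ID is made up).
// Subnet.get() reads the existing subnet, but its properties come back
// as read-only Outputs rather than assignable inputs.
const existing = aws.ec2.Subnet.get("existing-public", "subnet-0123456789abcdef0");

// existing.tags = { ... };  // not possible: tags is an Output on an
//                           // adopted resource, not a settable prop
```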

The Vpc/awsx.Network object returned from awsx.Network.fromVpc(...) already captures all private and public subnets, so defining and leveraging this object IMO feels like part of the right approach to ultimately: retrieve the existing subnet(s) in question, and tag them as needed after the cluster has been created and its pulumi auto-generated name is known. I'm certainly open to hearing other alternatives if I'm misunderstanding the use of the packages, or how best to integrate this workaround into the right package(s).

It doesn't seem like this is necessarily an issue with pulumi/pulumi-eks, but more about better understanding how to leverage and/or improve @pulumi/pulumi, @pulumi/pulumi-aws and @pulumi/pulumi-awsx to auto-tag any existing subnets in AWS needed by the user.

@lukehoban (Member) commented:

Note that hashicorp/terraform-provider-aws#3143 is related to this.


@metral metral added the bug label Feb 14, 2019
@metral metral changed the title For existing AWS VPC & subnets created outside of pulumi, pulumi does not auto-tag any public or private subnets that aren't used by Workers For existing AWS VPC & subnets created outside of pulumi, pulumi needs a means to tag any public or private subnets for Kubernetes use, that aren't already used by Workers Feb 14, 2019
@metral metral self-assigned this Feb 14, 2019
@metral (Contributor, Author) commented Feb 15, 2019

I hit another rendition of this:

  • In us-east-2 with 3 AZs (a, b, c), I have 3 public subnets and 3 private subnets: 1 of each in each of the 3 AZs.
  • The cluster is set to 2 workers, so it only comes up in 2 private subnets, across AZs b and c.
  • I tagged the AZ a public subnet with the required tag, thinking this would stand up the public LB successfully.
  • The public LB did successfully come up in the public subnet of AZ a.
  • However, the public LB did not resolve to any running instances. Even though it was set to point to the worker instances in b and c, I had not also tagged the other public subnets in AZs b and c, so the provisioned LoadBalancer had no means of actually routing to the private subnets housing the running workers.
  • The result was the successful creation of a Public LoadBalancer in AZ a only, but it was effectively useless since the running worker instances were in AZs b and c.

The main takeaway here is that:

  • If a user wants to create a {public,private} LoadBalancer that spans all AZs of a region (strongly recommended for HA instance resolution in k8s), there must be a {public,private} subnet in every AZ, where each {public,private} subnet is tagged for Kubernetes discovery.

tl;dr:

  • If I want a fully private cluster, with workers in private subnets and only private LoadBalancers, then for HA of the LoadBalancers I should have a private subnet in every AZ of the region, and they must all be tagged. This use case is mostly taken care of by EKS, since it auto-tags any subnet with a running worker instance in it.
  • If I want a private cluster with both public and private LoadBalancers, then for HA of both LB types I should have public and private subnets in each AZ of my region, and they must all be tagged. This use case additionally requires that the public subnets be tagged, as only the live worker subnets are auto-tagged by EKS for us.
  • In short, we need a means to tag any existing subnet in AWS (created outside of pulumi) to facilitate the cluster architecture of our choice.

To truly resolve this tagging dilemma for the user, we should:

  • Ask the user upfront for all public and private subnets they expect to operate in.
  • Ensure/require that there is a subnet in each AZ of the region, for any kind of subnet specified (public and/or private); a rough sketch of such a check follows this list.
  • Allow the user to tag any of the subnets that EKS isn't already tagging (the worker subnets), as these subnets will vary by user cluster setup.
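
A rough sketch of what that upfront check could look like, using the aws.getAvailabilityZones and aws.ec2.getSubnet data sources (the function name is hypothetical):

```typescript
import * as aws from "@pulumi/aws";

// Hypothetical upfront validation: every AZ in the region must be
// covered by at least one of the user-supplied subnets.
async function requireSubnetsInAllAZs(subnetIds: string[]): Promise<void> {
    const azs = await aws.getAvailabilityZones({ state: "available" });
    const subnets = await Promise.all(
        subnetIds.map(id => aws.ec2.getSubnet({ id })));
    const covered = new Set(subnets.map(s => s.availabilityZone));
    for (const az of azs.names) {
        if (!covered.has(az)) {
            throw new Error(`no subnet supplied in availability zone ${az}`);
        }
    }
}
```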

@lukehoban (Member) commented:

So - I'm a little confused about the issue here. It seems two topics are being conflated:

  1. There is subnet tagging done automatically by EKS, and that works as expected.
  2. There is subnet tagging required for Kubernetes Service LoadBalancers to be able to allocate internal and external ELBs; that tagging is not done automatically by EKS, and is "hard to do".

So I think the only problem being discussed here is (2), which is not strictly about EKS. Is that right?

Assuming so, a few thoughts:

First, in the standard "desired state configuration" approach, the typical way to tag resources would be to require that the owner of the resources include the tags as part of the definition of that resource. I think, strictly speaking, this addresses the issue at hand here.

This is limiting though: it means that even in cases like this, where logically the owner of the resource doesn't think about things like Kubernetes LoadBalancers, they are responsible for including the annotations, which generally feels like a layering violation. But that's arguably a layering violation in the design of the Kubernetes Service subnet tagging approach itself.

Moreover, making this the responsibility of the owner of the Subnet is the only truly robust and consistent way to drive to a desired state, given the underlying APIs and approaches to communicating this information (e.g. looking up tags to drive runtime behaviour). If other layers can change the desired state of some part of the resource (its tags), then there can be multiple layers thinking they are driving the same "thing" to different desired states, which can (and does) cause complications. In practice, (1) above leads to this sort of thing, where two different systems think they own these tags, and it causes problems like hashicorp/terraform#6632.

Now, in practice, for Tags, there is sort of a "gentleman's agreement" that this is a bag that anyone can party on, and that "hopefully things won't conflict". It's sort of intentionally a thing where other layers can think they get to own some subset of the tags for a resource they aren't responsible for in any other way.

In most cases where this model is encouraged, the cloud provider breaks things into two different resources that can be managed separately. Unfortunately, this is not quite how AWS handles Tags.

We and/or Terraform could paper over this and expose an aws.Tag concept generally, or even just aws.SubnetTag, to enable this mode of use. It would almost certainly lead to problems, but could be practically useful. hashicorp/terraform-provider-aws#3143 tracks this, and is something we could certainly look at supporting.

Alternatively, as a more immediate workaround, a DynamicProvider or some simplified form of that, could be used to do more targeted (and non-conflicting) management of these "extra tags" via direct calls to tagging APIs in some lighter-weight CRUD model.
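
For concreteness, here is a minimal sketch of that dynamic-provider idea, assuming the AWS SDK for JavaScript is available; the resource owns exactly one tag on one subnet, and all names here are hypothetical:

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as AWS from "aws-sdk";

// Hypothetical sketch: a dynamic resource that owns exactly one tag on
// an existing subnet, via direct EC2 tagging API calls.
const subnetTagProvider: pulumi.dynamic.ResourceProvider = {
    async create(inputs: any) {
        const ec2 = new AWS.EC2();
        await ec2.createTags({
            Resources: [inputs.subnetId],
            Tags: [{ Key: inputs.key, Value: inputs.value }],
        }).promise();
        // Encode enough state in the outputs for delete() to undo this.
        return { id: `${inputs.subnetId}:${inputs.key}`, outs: inputs };
    },
    async delete(_id: string, props: any) {
        const ec2 = new AWS.EC2();
        await ec2.deleteTags({
            Resources: [props.subnetId],
            Tags: [{ Key: props.key }],
        }).promise();
    },
};

// The tag key can depend on the cluster's auto-generated name, since
// dynamic resource inputs accept Outputs like any other resource.
class SubnetTag extends pulumi.dynamic.Resource {
    constructor(name: string,
                args: { subnetId: pulumi.Input<string>,
                        key: pulumi.Input<string>,
                        value: pulumi.Input<string> },
                opts?: pulumi.CustomResourceOptions) {
        super(subnetTagProvider, name, args, opts);
    }
}
```

Because the tag is its own resource, destroying it removes only that tag, leaving the subnet and any other tags untouched.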

But I think the real answer is a boring one - the right way to do this (at least right now) is to tell the person who owns the definition of the Subnet that they need to add these Tags.
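
In that model the tag simply lives in the subnet owner's own definition. A minimal sketch, assuming the owner knows the agreed-upon cluster name ("my-cluster" here, with made-up VPC and CIDR values):

```typescript
import * as aws from "@pulumi/aws";

// Sketch of the "owner adds the tags" answer; all values are made up.
const publicSubnetA = new aws.ec2.Subnet("public-a", {
    vpcId: "vpc-0123456789abcdef0",
    cidrBlock: "10.0.1.0/24",
    availabilityZone: "us-east-2a",
    mapPublicIpOnLaunch: true,
    tags: {
        // Required for Kubernetes subnet discovery:
        "kubernetes.io/cluster/my-cluster": "shared",
        // Conventionally also used by Kubernetes for public ELBs:
        "kubernetes.io/role/elb": "1",
    },
});
```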

@metral (Contributor, Author) commented Feb 15, 2019

You bring up some great points, especially about numerous layers thinking they own and drive the tags, and this causing complications down the road. You've convinced me that auto-tagging the subnets can lead to further problems and is not something we should do.

I agree that the simplest, and most likely the real, answer here is to require that the subnet owner 1) have a subnet in all AZs of the region for LB HA, and 2) tag them all with the cluster name.

I had thought this answer could be further facilitated by Pulumi providing a means to do the tagging that is accessible to the user, so they don't have to do it out of band from Pulumi (e.g. manually in AWS, a bash script, etc.), but it may not work well with the "desired state" model Pulumi employs.

@metral (Contributor, Author) commented Feb 25, 2019

Closing this issue, as the simplest solution is to call out the subnet tagging requirement before using an existing VPC.

@metral metral closed this as completed Feb 25, 2019
@infin8x infin8x added the kind/bug Some behavior is incorrect or out of spec label Jul 10, 2021