
Explain why no subnets are found for creating an ELB on AWS #29298

Closed
sdouche opened this issue Jul 20, 2016 · 46 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@sdouche

sdouche commented Jul 20, 2016

Hi,
I created a Kubernetes cluster with coreos-aws (using an existing VPC and subnets). I can't create an ELB on it.

The file:

apiVersion: v1
kind: Service
metadata:
  name: nginxservice
  labels:
    name: nginxservice
spec:
  ports:
    - port: 80
  selector:
    app: nginx
  type: LoadBalancer

The command:

kubectl --kubeconfig=kubeconfig describe svc nginxservice
Name:           nginxservice
Namespace:      default
Labels:         name=nginxservice
Selector:       app=nginx
Type:           LoadBalancer
IP:         172.18.200.63
Port:           <unnamed>   80/TCP
NodePort:       <unnamed>   31870/TCP
Endpoints:      172.18.29.3:80,172.18.31.3:80
Session Affinity:   None
Events:
  FirstSeen             LastSeen            Count   From            SubobjectPath   Reason          Message
  Wed, 20 Jul 2016 18:41:15 +0200   Wed, 20 Jul 2016 18:41:20 +0200 2   {service-controller }           CreatingLoadBalancer    Creating load balancer
  Wed, 20 Jul 2016 18:41:15 +0200   Wed, 20 Jul 2016 18:41:20 +0200 2   {service-controller }           CreatingLoadBalancerFailed  Error creating load balancer (will retry): Failed to create load balancer for service default/nginxservice: could not find any suitable subnets for creating the ELB

I manually added the missing KubernetesCluster tag to the subnet, without result. Can you add a clear message about what is missing?
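
A quick way to confirm which tags a subnet actually carries is the AWS CLI; this is just a diagnostic sketch, and the subnet ID below is a placeholder:

# List the tags on the subnet the cluster is supposed to use
# (the subnet ID is a placeholder)
aws ec2 describe-subnets \
  --subnet-ids subnet-0123456789abcdef0 \
  --query 'Subnets[].{ID:SubnetId,Tags:Tags}' \
  --output json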

@apelisse
Member

cc @pwittrock. Not sure if this is a support request or an actual issue?

@colhom

colhom commented Jul 20, 2016

@sdouche are you perhaps running out of free IP addresses in your subnet? Each ELB also needs a separate network interface in each target subnet, and I believe the rule is that either 5 or 8 free addresses must be available in the subnet for ELB creation to be allowed.

@cgag ran into this recently during some operational work here at CoreOS and told me about it.

@sdouche
Author

sdouche commented Jul 20, 2016

Hi @colhom,
I have 240 free IPs in the subnet. The issue is finding a suitable subnet, not a free IP.

@colhom

colhom commented Jul 20, 2016

@sdouche could I see the diff that allows you to deploy to an existing subnet? I've been curious to see how folks are doing this; we use route tables and VPC peering heavily, so in our case we have no need to deploy into the same subnet.

@colhom

colhom commented Jul 20, 2016

Or are you just modifying the stack-template.json after render but prior to up?

@sdouche
Author

sdouche commented Jul 20, 2016

Just modified the stack-template.json and removed the creation of network items (more details here: coreos/coreos-kubernetes#340)

@qqshfox

qqshfox commented Jul 21, 2016

@sdouche Are those subnets private? A public ELB can't be created in private subnets. K8s gets all subnets tagged with the correct KubernetesCluster tag, then ignores private subnets when creating a public ELB.

You can try tagging a public subnet with the correct KubernetesCluster value, then wait for k8s to retry creating the ELB in that subnet.
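
For example, the tag could be added to a public subnet with the AWS CLI; this is only a sketch, with the subnet ID and cluster name as placeholders, and the service controller should pick the subnet up on its next retry:

# Tag a public subnet so the cloud provider will consider it for ELBs
# (subnet ID and cluster name are placeholders)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=my-cluster-name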

@sdouche
Author

sdouche commented Jul 21, 2016

@qqshfox good point, it's a private subnet. Why are private subnets ignored? How do I create a private cluster?

@qqshfox

qqshfox commented Jul 21, 2016

You can create an internal ELB by using some magic k8s metadata tag.

@sdouche
Author

sdouche commented Jul 21, 2016

"some magic k8s metadata tag"? What are they?

@pieterlange

@sdouche You have to tag your subnet with the "KubernetesCluster" tag. I see you used kube-aws before; you can look at that for inspiration on how to properly create your subnets. Also note that making a load balancer in a private subnet doesn't make much sense if you want to expose a service to the world (traffic can't route in).

@sdouche
Author

sdouche commented Jul 21, 2016

Hi @pieterlange. OK, so if I want a private cluster, how do I expose services and pods without an ELB? Do I need to route between the two overlay networks? How do I do that? I suppose with Flannel's aws backend.

@pieterlange

I do not understand what you're trying to accomplish, so it's a little bit difficult to help. Issues like this (it's starting to look like a support request) are better solved through Slack or Stack Overflow, as there's no actionable material for the developers here. I suggest closing the ticket and trying over there.

@sdouche
Author

sdouche commented Jul 21, 2016

You're right, sorry. Back to the initial request: I think it would be better to write "could not find any public subnets for creating the ELB" (for a public ELB of course, which is the default option). What do you think?

@pwittrock pwittrock added priority/backlog Higher priority than priority/awaiting-more-evidence. team/cluster and removed team/ux labels Jul 21, 2016
@pwittrock
Member

@justinsb WDYT?

@manojlds

manojlds commented Oct 7, 2016

How do I create a private ELB with private subnets?

@cknowles

Some information in #17620 about private ELBs.

@druidsbane

druidsbane commented Nov 15, 2016

Has anyone gotten this to work recently? I can get it to create the internal/private ELB but none of the node machines are added to the ELB. If I manually add them everything works fine, so it is set up properly except for adding the ASG for the nodes or adding the nodes themselves.

@justinsb Is there possibly some annotation I need to use to allow it to find the nodes it needs to add to the private ELB? I'm creating the cluster with kubeadm to join the nodes, with the AWS cloud provider integration. The subnets, VPCs and autoscaling groups are all tagged with "KubernetesCluster" and a name. That does propagate to the ELB, but none of the node instances are picked up. I don't see anything specific in the code that adds the node ASG to the ELB based on an annotation...

@cyberroadie

I have the same problem. I've got Kubernetes running in a private subnet. To explain it a bit further (this is AWS specific): our infrastructure team has created specific requirements regarding security. We need to have three layers (subnets) in one VPC. Diagram:

| type | connection | components |
| --- | --- | --- |
| public subnet | internet gateway | ELB |
| private subnet 1 | NAT gateway | Kubernetes (master/nodes) |
| private subnet 2 | Direct Connect | proxy for on-premise server access |

For this to work I had to manually create an ELB in layer 1 (public subnet) and point it to the master nodes in layer 2 (private subnet 1). I also installed the dashboard and this works fine together with the kubectl command line tool. (Both are exposed to the internet.)

However when I deploy an app (e.g. nginx) I get the following error:

Error creating load balancer (will retry): Failed to create load balancer for service default/my-nginx: could not find any suitable subnets for creating the ELB

The Kubernetes dashboard says the service-controller is the source of this. And when I run:

 $ kubectl get services

it outputs:

    NAME         CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
    kubernetes   100.xx.x.1      <none>        443/TCP   3h
    my-nginx     100.xx.xx.99    <pending>     80/TCP    1h

Is there a way to tell the controller which subnet it should use to create the load balancer for the service?

@rokka-n

rokka-n commented Jan 13, 2017

Still the same problem with provisioning ELBs for ingress for instances in private subnets.

But no worries, Kubernetes is built upon 15 years of experience of running production workloads at Google. Amazon will fix their ELBs sometime soon.

@cemo

cemo commented Jan 17, 2017

@cyberroadie How did you solve your problem? I am in the same situation and have no idea how to resolve it.

@cyberroadie

cyberroadie commented Jan 17, 2017

Manually creating the routes via the AWS web interface.

@2rs2ts
Contributor

2rs2ts commented Jan 9, 2018

@whereisaaron what is an "ownership value" in this case?

@whereisaaron

whereisaaron commented Jan 11, 2018

@2rs2ts the ownership value for the kubernetes.io/cluster/<your-cluster-id> tag is, I think, either "owned" or "shared". But I am not sure the value matters for 'hasClusterTag' to work.

You can read the code @2rs2ts to understand the process of finding a subnet.

  1. The AWS support first finds all the subnets you have associated with your cluster via the tag kubernetes.io/cluster/<your-cluster-id>. (You'll note the same subnet can be tagged for use by more than one cluster this way.) Subnets without this tag are ignored and not considered. If no subnets are tagged, only the current subnet is considered.

  2. (a) For external load balancers (the default), any subnets that aren't public (whose routing table doesn't have an Internet Gateway route) are excluded. It then looks for the kubernetes.io/role/elb tag on the remaining subnets and picks one of those, or, if no public subnets are tagged, one gets picked at random.

     (b) For internal load balancers (which you indicate you want using the service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0 annotation), it first goes for those with the kubernetes.io/role/internal-elb tag, or else picks one at random (see the sketch after this list).

This process is repeated for each AZ your cluster occupies.
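
As a minimal sketch of the two cases above (the subnet ID and cluster name are placeholders; the Service name is the nginxservice from this issue), the tagging and the internal-ELB annotation could look like this:

# (a) external ELB: associate a public subnet with the cluster and mark it as an ELB subnet
#     (subnet ID and cluster name are placeholders)
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=shared \
         Key=kubernetes.io/role/elb,Value=1

# (b) internal ELB: annotate the Service so the cloud provider creates an internal load balancer
kubectl annotate service nginxservice \
  service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0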

@plombardi89

I have a bit of a problem with this design... it's not clear how to use the shared value at all when you're running many clusters in a subnet without forcing each cluster to . I have a system which bootstraps ephemeral single-node Kubernetes clusters in an auto scaling group and allows users to claim them on demand for 1-24 hours (mostly used for experiments and running tests). The number of clusters can quickly be in the hundreds, and there is a limit of 50 tags on AWS resources.

A couple problems:

  1. More clusters than available tags for shared subnets.
  2. I need to add setup / teardown routines for shared resources into the resource provisioner. Right now that's a cloud-init script that runs at boot for init. For cleanup there is nothing.
  3. I have no way to control which subnets get assigned a Kubernetes instance in the pool, so the autoscaler might drop one into a subnet with more than 50 other clusters.

@whereisaaron

IMHO @plombardi89 that is kind of a misuse of an autoscaler 😄 since 'claimed' nodes are pets, not cattle. However, if you want to go this way, then I can suggest having the autoscaler create t2.nano instances (somewhere), with a cloud-init script that uses a CloudFormation template to create a tiny subnet, with the one-node cluster and any subnet tags. When the t2.nano gets the scale-down or shutdown request, delete the CloudFormation stack to clean up the cluster and its tiny subnet.

@plombardi89

@whereisaaron I agree it's a bit of a misuse, but it's not a pets vs. cattle distinction IMO. We use the autoscaler to ensure there are always single-node instances of Kubernetes available to be claimed. A claim request detaches the instance from the autoscaler, and for hours it can be used by a developer or for automated testing. At the end of that period the instance is terminated and never heard from again. Using the autoscaler this way is nice because there is no code needed to manage the pool capacity. Most clusters, once claimed, are used for a handful of minutes before being discarded.

The only things shared by the claimed instances are the VPC and subnets. It feels like there should be another way to tell Kubernetes "Hey, these subnets are perfectly valid to deploy ELBs into" that doesn't rely on tags... maybe a configuration flag or using a Dynamo table to track this information.

@whereisaaron

I think tags are the correct mechanism @plombardi89; you'll have to propose a patch for hasClusterTag() / findSubnets() that supports a wildcard tag like just kubernetes.io/cluster or kubernetes.io/cluster/_all.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 14, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

badaldavda8 added a commit to badaldavda8/amazon-eks-user-guide that referenced this issue Jan 11, 2019
This is because, for internal ELB auto subnet discovery, both tags are used:
kubernetes.io/role/internal-elb = 1
kubernetes.io/cluster/<cluster-name> = shared

As per the code, kubernetes.io/cluster/<cluster-name> is checked first and then kubernetes.io/role/internal-elb is checked.

If kubernetes.io/cluster/<cluster-name> is not present, then the internal ELB is created on public subnets.
kubernetes/kubernetes#29298 (comment)
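
Putting the commit note above into a command, and assuming a placeholder subnet ID and cluster name, both tags would be applied to the private subnet intended for internal ELBs:

# Tag a private subnet for internal ELB discovery
# (subnet ID and cluster name are placeholders)
aws ec2 create-tags \
  --resources subnet-0fedcba9876543210 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=shared \
         Key=kubernetes.io/role/internal-elb,Value=1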
@rehevkor5

Relevant documentation in AWS: "Cluster VPC Considerations" https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

If using EKS, tagging of the VPC & subnets referenced in your EKS cluster appears to be automatic. However, it may be necessary to tag additional subnets.

@Trisia

Trisia commented Sep 20, 2019

> @sdouche Are those subnets private? A public ELB can't be created in private subnets. K8s gets all subnets tagged with the correct KubernetesCluster tag, then ignores private subnets when creating a public ELB.
>
> You can try tagging a public subnet with the correct KubernetesCluster value, then wait for k8s to retry creating the ELB in that subnet.

You are right. I tagged my public subnet with KubernetesCluster (cluster-name), then recreated ingress-nginx. Now I can use the ELB to access my private-net application!

@linbingdouzhe

@whereisaaron
As you said, the subnet needs a tag so that Kubernetes can find the correct subnets.

@deveshmehta

deveshmehta commented Jan 31, 2020

I managed to restrict the internal load balancer to only the intra subnet with the help of tags:

public_subnet_tags = {
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
  "kubernetes.io/role/elb"                      = "1"
}

private_subnet_tags = {
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
}

intra_subnet_tags = {
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
  "kubernetes.io/role/internal-elb"             = "1"
}

@nimmichele

I would just like to add that in Pivotal Container Service (PKS) you have to tag the ELB subnet this way:
kubernetes.io/cluster/service-instance_UUID
where UUID is something like 7as7dcc6-d46c-48b4-8e33-364f795a88e3, and leave the value empty.
Hope this helps!
