This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

Fix issues with multi cluster environments #66

Closed
simonswine opened this issue Dec 6, 2017 · 9 comments · Fixed by #100
Labels
area/provider-aws (Indicates a PR is affecting Cloud Provider AWS), kind/bug (Categorizes issue or PR as related to a bug)
Milestone
release-0.3

Comments

@simonswine
Contributor

/kind bug

What happened:

Multi-cluster environments are currently failing on AWS.

What you expected to happen:

A multi-cluster environment comes with a hub and n clusters. The currently generated Terraform code is not able to properly create these clusters.

@simonswine added the area/provider-aws and kind/bug labels on Dec 6, 2017
@simonswine added this to the release-0.3 milestone on Dec 6, 2017
@dippynark
Contributor

The hub came up as expected. I then changed my currentCluster to dev-cluster1 (my cluster defined during init) and ran tarmak cluster apply. This hung trying to contact Vault:

DEBU[0015] error connecting to tunnel: dial tcp 127.0.0.1:55796: getsockopt: connection refused  app=tarmak destination=vault-3.tarmak.local
DEBU[0015] error connecting to tunnel: dial tcp 127.0.0.1:55795: getsockopt: connection refused  app=tarmak destination=vault-2.tarmak.local
DEBU[0015] error connecting to tunnel: dial tcp 127.0.0.1:55794: getsockopt: connection refused  app=tarmak destination=vault-1.tarmak.local
DEBU[0015] tunnel stopped                                app=tarmak destination=vault-1.tarmak.local
DEBU[0015] tunnel stopped                                app=tarmak destination=vault-2.tarmak.local
DEBU[0015] tunnel stopped                                app=tarmak destination=vault-3.tarmak.local
WARN[0015] ssh is no longer running                      app=tarmak cluster=hub environment=dev stack=vault
WARN[0015] ssh is no longer running                      app=tarmak cluster=hub environment=dev stack=vault
WARN[0015] ssh is no longer running                      app=tarmak cluster=hub environment=dev stack=vault

@dippynark
Contributor

I've tried setting up the SSH tunnel manually using the following command:
ssh -F /Users/luke/.tarmak/dev-hub/ssh_config -N -L127.0.0.1:58542:vault-1.tarmak.local:8200 bastion

@simonswine
Contributor Author

simonswine commented Jan 25, 2018

Yes, the tunnel is connecting (look up the -N flag).

Basically, the problem is that Tarmak is connecting to the wrong bastion node (you are still missing the output where it says tunnel started, three times).

These tunnels need to be set up against the bastion in the hub, not a bastion in the cluster: there is no bastion per cluster, only one per environment, and that is the one in the hub.

@dippynark
Contributor

How can I tell tarmak to use the correct bastion?

@dippynark
Contributor

dippynark commented Jan 25, 2018

The previous issue was fixed by changing the path that Terraform uses to look for the SSH config when running in a multi-cluster architecture.

@dippynark
Contributor

I'm now seeing errors such as the following:

DEBU[0050] Error: Error running plan: 6 error(s) occurred:  app=tarmak command=terraform container=02a9cb5aa7b1709543dd8e1d03c6054a205ff28bfb17b43e925665a4e3c437d1 module=terraform stack=kubernetes
DEBU[0050]                                               app=tarmak command=terraform container=02a9cb5aa7b1709543dd8e1d03c6054a205ff28bfb17b43e925665a4e3c437d1 module=terraform stack=kubernetes
DEBU[0050] * aws_security_group.kubernetes_etcd: 1 error(s) occurred:  app=tarmak command=terraform container=02a9cb5aa7b1709543dd8e1d03c6054a205ff28bfb17b43e925665a4e3c437d1 module=terraform stack=kubernetes
DEBU[0050]                                               app=tarmak command=terraform container=02a9cb5aa7b1709543dd8e1d03c6054a205ff28bfb17b43e925665a4e3c437d1 module=terraform stack=kubernetes
DEBU[0050] * aws_security_group.kubernetes_etcd: Resource 'data.terraform_remote_state.network' does not have attribute 'vpc_id' for variable 'data.terraform_remote_state.network.vpc_id'  app=tarmak command=terraform container=02a9cb5aa7b1709543dd8e1d03c6054a205ff28bfb17b43e925665a4e3c437d1 module=terraform stack=kubernetes

Is this the expected error at this point?
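For context, the error above is Terraform reporting that the remote state it reads for the network stack does not expose a vpc_id output, for example because the output is missing or a different state file is being read in the multi-cluster layout. A minimal Terraform 0.11-style sketch of the relationship follows; it is an illustration only, not Tarmak's actual generated code, and the bucket, key and resource names are placeholders:

```hcl
# Illustration only (Terraform 0.11-style syntax); names and values are placeholders.

# In the network stack: the output the kubernetes stack expects. If it is
# missing (or a different state file is read), consumers fail with
# "does not have attribute 'vpc_id'".
output "vpc_id" {
  value = "${aws_vpc.main.id}"
}

# In the kubernetes stack: read the network stack's remote state ...
data "terraform_remote_state" "network" {
  backend = "s3"

  config {
    bucket = "example-tarmak-state"
    key    = "network.tfstate"
    region = "eu-west-1"
  }
}

# ... and use its vpc_id, as in the failing resource above.
resource "aws_security_group" "kubernetes_etcd" {
  name   = "kubernetes-etcd"
  vpc_id = "${data.terraform_remote_state.network.vpc_id}"
}
```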

@dippynark
Contributor

dippynark commented Jan 25, 2018

We need to tag the public subnet(s) so Kubernetes knows where to put load balancers; the following comment explains it nicely:
kubernetes/kubernetes#29298 (comment)
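A minimal sketch of that subnet tagging (Terraform 0.11-style, not Tarmak's actual generated code): the cluster name cluster1, the VPC reference and the CIDR are placeholders, and the tag values follow the commonly used conventions ("shared" for the cluster tag, "1" for the ELB role tag).

```hcl
# Illustration only: tag the public subnet(s) so the Kubernetes AWS cloud
# provider considers them when placing ELBs. "cluster1", the VPC variable
# and the CIDR block are placeholders.
resource "aws_subnet" "public" {
  vpc_id     = "${var.vpc_id}"
  cidr_block = "10.0.1.0/24"

  tags {
    Name                             = "cluster1-public"
    "kubernetes.io/cluster/cluster1" = "shared"
    "kubernetes.io/role/elb"         = "1"
  }
}
```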

@dippynark
Contributor

dippynark commented Jan 26, 2018

The Kubernetes AWS cloud provider determines its clusterID by looking at the instance tag keys with prefix "kubernetes.io/cluster/" (or the legacy prefix "KubernetesCluster"). The clusterID is set to be the rest of the tag key (or to the tag value for legacy).

When creating an ELB for a cluster, only subnets that have a tag key of "kubernetes.io/cluster/CLUSTERID" and "kubernetes.io/role/elb" are considered. To support ELBs in multi-cluster environments, we must therefore minimally set instance tags with a key of "kubernetes.io/cluster/CLUSTERID" on all instances running the cloud controller-manager and tag all subnets we want to house the ELBs in with keys of "kubernetes.io/cluster/CLUSTERID" and "kubernetes.io/role/elb".
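A minimal sketch of the instance-tagging half of that (Terraform 0.11-style, not Tarmak's actual generated code), assuming a cluster named cluster1; the AMI, instance type and subnet reference are placeholders:

```hcl
# Illustration only: the tag from which the AWS cloud provider derives its
# clusterID ("cluster1" here). AMI, instance type and subnet are placeholders.
resource "aws_instance" "kubernetes_master" {
  ami           = "ami-00000000"
  instance_type = "m4.large"
  subnet_id     = "${aws_subnet.private.id}"

  tags {
    Name                             = "cluster1-master"
    "kubernetes.io/cluster/cluster1" = "owned"
  }
}
```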

@dippynark
Contributor

When running a multi-cluster environment in an existing VPC, I get the following error when applying the first cluster (after spinning up the hub):

DEBU[0120] * aws_instance.kubernetes_etcd.1: Error launching source instance: InvalidParameterValue: Value (eu-west-1a) for parameter availabilityZone is invalid. Subnet 'subnet-d000288b' is in the availability zone eu-west-1b  app=tarmak command=terraform container=e0a436aaacf0c8f1f1d237f6a0415d9a66612d46d843e17204bd3e7a5ca52bb7 module=terraform stack=kubernetes
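That error means the availability zone passed to the instance does not match the zone of the pre-existing subnet it is placed in. One way to avoid the disagreement, sketched below in Terraform 0.11-style syntax (illustration only, not Tarmak's actual generated code), is to derive the zone from the subnet itself, or simply omit availability_zone and let the subnet imply it; the AMI and instance type are placeholders, and the subnet ID is taken from the error above.

```hcl
# Illustration only: look up the existing subnet and take the availability
# zone from it, so the instance and subnet cannot end up in different zones.
data "aws_subnet" "etcd" {
  id = "subnet-d000288b"
}

resource "aws_instance" "kubernetes_etcd" {
  ami               = "ami-00000000"
  instance_type     = "t2.medium"
  subnet_id         = "${data.aws_subnet.etcd.id}"
  availability_zone = "${data.aws_subnet.etcd.availability_zone}"
}
```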

jetstack-ci-bot added a commit that referenced this issue Feb 20, 2018
Automatic merge from submit-queue.

66 fix multi cluster envs

**What this PR does / why we need it**: Multi-cluster environments currently do not work with Tarmak. This PR fixes this by tagging subnets and instances appropriately so that clusters can function properly.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #66 

**Special notes for your reviewer**:

**Release note**:


```release-note
Fix multi cluster environments by supporting multiple clusters in a single VPC
```