
Ability to use existing VPC subnets #671

Closed · wants to merge 8 commits

Conversation

@cknowles commented Sep 15, 2016

We want to manage our VPC separately, so this adds support for deploying to existing VPC subnets.

Inspired by @eugenetaranov's existing work in #212, since it avoids the manual edit of the generated stack template.

@colhom (Contributor) commented Sep 15, 2016

@c-knowles give me a few days to digest this PR. I haven't figured out where I stand on kube-aws deploying to existing subnets.

@cknowles (Author)

Sure. To use the option, just put the subnet IDs in as per the cluster template changes. Are there other places enforcing that the specified options are valid as a set? For instance, specifying the subnet IDs without specifying the VPC ID probably isn't a combination we'd want to support.
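For reference, a rough sketch of how that could look in cluster.yaml; the key names here (`vpcId`, `subnets[].id`) are assumptions based on this discussion rather than the final schema:

```yaml
# Illustrative only - key names are assumed, not the merged schema.
vpcId: vpc-0123456789abcdef0          # VPC managed outside kube-aws
subnets:
  - availabilityZone: eu-west-1a
    id: subnet-0123456789abcdef0      # existing subnet to deploy into
  - availabilityZone: eu-west-1b
    id: subnet-0123456789abcdef1
```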

The previous workarounds I've seen meant that kube-aws still created the subnets, and the stack template then had to be edited to accommodate the existing route tables.

I had considered #212 and whether to just add that network setup as an option for a typical configuration, i.e. build it directly into kube-aws. However, one major advantage of doing it this way for us is that we can then share the VPC between different systems, which in turn cuts the cost of running several NAT Gateways. So we'd at least need the option to turn off network setup altogether.

What I'm wondering now is whether it would be best to split the network setup out into a separate stack inside kube-aws and then give the user the choice of which one they want, and whether to create one at all. In the first instance we could support the NAT Gateway scenario and the default all-public one that already exists. The network stack could be created first and the IDs exported from it into the cluster.yaml file. I think this would then simplify the K8S stack setup, because it would always take a list of IDs for the VPC no matter which option the user chooses.
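To make the separate network stack idea concrete, here is a rough sketch of what that stack's outputs might look like (plain CloudFormation; the resource and export names are invented for this sketch). The exported subnet IDs would then be copied into cluster.yaml, or later consumed via cross-stack references:

```yaml
# Illustrative network-stack outputs - resource and export names are made up.
Outputs:
  PrivateSubnet0Id:
    Description: First private subnet, shared across systems behind one NAT Gateway
    Value: !Ref PrivateSubnet0
    Export:
      Name: shared-network-PrivateSubnet0Id
  PrivateSubnet1Id:
    Description: Second private subnet
    Value: !Ref PrivateSubnet1
    Export:
      Name: shared-network-PrivateSubnet1Id
```

The Kubernetes stack would then just consume a list of subnet IDs, whichever network option was chosen.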

Chris Knowles added 3 commits September 19, 2016 13:38

- Only check if subnets have a zone or ID; it seems the simplest way to re-use all the existing validation.
- On second thought, availability zone is required to define the scaling group, so make sure we always have it.
@cknowles (Author) commented Sep 20, 2016

I'm having a little trouble checking how well this works on latest master; it seems to have an issue with controller startup. I thought it was my new config at first, but I reverted to the old setup (kube-aws creating the VPC etc.) and the controller logs are saying this over and over:

Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Failed to start install-kube-system.service.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Unit entered failed state.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Failed with result 'exit-code'.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Service hold-off time over, scheduling restart.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Stopped install-kube-system.service.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal curl[9476]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal curl[9476]:                                  Dload  Upload   Total   Spent    Left  Speed
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal curl[9476]: [149B blob data]
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Starting install-kube-system.service...
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Control process exited, code=exited status=7
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Failed to start install-kube-system.service.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Unit entered failed state.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Failed with result 'exit-code'.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Service hold-off time over, scheduling restart.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Stopped install-kube-system.service.
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Starting install-kube-system.service...
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal curl[9479]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 20 10:10:28 ip-10-0-0-50.eu-west-1.compute.internal curl[9479]:                                  Dload  Upload   Total   Spent    Left  Speed

@colhom which branch is best to go with if I want to ensure stability? I will try another cluster shortly using the latest published kube-aws just to check if it's a more general problem with the OS updates or similar.

EDIT: Latest master seems to work if releaseChannel: alpha is specified in cluster.yaml
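For anyone hitting the same thing, this was the only change needed to my cluster.yaml; the key is the release channel setting mentioned above:

```yaml
# cluster.yaml - switch the CoreOS release channel used for the nodes
releaseChannel: alpha
```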

@cknowles (Author)

Interestingly, the latest published build of kube-aws (0.8.1) does exactly the same as above on the controller but eventually seems to recover:

...
Sep 21 03:01:43 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Control process exited, code=exited status=7
Sep 21 03:01:43 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Failed to start install-kube-system.service.
Sep 21 03:01:43 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Unit entered failed state.
Sep 21 03:01:43 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Failed with result 'exit-code'.
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: install-kube-system.service: Service hold-off time over, scheduling restart.
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Stopped install-kube-system.service.
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Starting install-kube-system.service...
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:                                  Dload  Upload   Total   Spent    Left  Speed
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]: [158B blob data]
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]: {
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "major": "1",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "minor": "3",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "gitVersion": "v1.3.4+coreos.0",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "gitCommit": "be9bf3e842a90537e48361aded2872e389e902e7",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "gitTreeState": "clean",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "buildDate": "2016-08-02T00:54:53Z",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "goVersion": "go1.6.2",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "compiler": "gc",
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]:   "platform": "linux/amd64"
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal systemd[1]: Started install-kube-system.service.
Sep 21 03:01:44 ip-10-0-0-50.eu-west-1.compute.internal curl[11619]: }

Chris Knowles added 3 commits September 21, 2016 12:21

- If our workers are in private subnets, we may wish to place the controller in a public subnet to access the dashboard without any tunnelling.
@cknowles (Author)

@colhom based on some initial usage of this, perhaps allowing the use of existing route tables would be a better way to go, so we have less chance of conflicting private IPs. We could even support the new CloudFormation cross-stack references as well. Any thoughts on that?
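Purely to illustrate the route table idea (the `routeTableId` key here is hypothetical and not something this PR implements): kube-aws would keep creating the subnets with non-conflicting CIDRs, but attach them to a route table managed outside the stack:

```yaml
# Hypothetical cluster.yaml sketch - routeTableId and the other key names are assumed,
# shown only to illustrate reusing an externally managed route table.
vpcId: vpc-0123456789abcdef0
routeTableId: rtb-0123456789abcdef0   # existing route table with NAT Gateway / IGW routes
subnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: 10.0.1.0/24          # kube-aws still creates the subnets themselves
  - availabilityZone: eu-west-1b
    instanceCIDR: 10.0.2.0/24
```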

@cknowles (Author) commented Oct 7, 2016

Closing this for now. As described in #716, I believe using existing route tables rather than subnets is a better solution.
