This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

init-assets should run until it works... #937

Closed
wants to merge 1 commit into from

Conversation


@kans kans commented May 31, 2017

This PR also makes bootkube.sh and tectonic.sh run sequentially after init-assets completes, which was the behavior before #177. This PR does not delete all dead code or increase test timeouts, which we would probably need to bump by 5 minutes.


sym3tri commented May 31, 2017

please wait for review from @s-urbaniak before merging

@kans kans changed the title from "WIP init-assets should run until it works..." to "init-assets should run until it works..." May 31, 2017

@s-urbaniak s-urbaniak left a comment


This PR tries to delegate restart logic to systemd, and the outlined changes work around limitations in systemd. While I understand what we are trying to achieve here, I am not convinced that this is the right approach.

First, this PR unfortunately breaks "vanilla" k8s functionality and the separation of concerns between the three systemd service units: init-assets.service, bootkube.service, and tectonic.service.

All of these services have been consciously marked as Type=oneshot to establish a clear happens-before relationship for their startup ordering. Only if init-assets exits successfully is it marked as started, causing bootkube to start; only if bootkube starts successfully is tectonic started. All of this models the serial startup behavior in a pretty simple manner.
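The happens-before chain can be sketched as a unit fragment (abridged; the Description and script path are illustrative, not copied from the repo):

```ini
# bootkube.service (sketch): starts only after init-assets.service
# has exited successfully; tectonic.service repeats the same pattern
# against bootkube.service.
[Unit]
Description=Bootstrap a Kubernetes cluster
Requires=init-assets.service
After=init-assets.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/bin/bash /opt/tectonic/bootkube.sh
```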

Now we want to introduce restart semantics using Restart=.... This unfortunately doesn't work with Type=oneshot, so this PR removes the directive, effectively marking the services as Type=simple.

I was able to simulate the behavior described above using Type=forking. But even with the services marked Type=forking in conjunction with Restart=..., we have a problematic edge case: if the init-assets service fails, the bootkube service also fails with a "failed dependency" error. systemd would then try to restart init-assets, but not bootkube, even if init-assets eventually succeeds after some retries.

I suppose that is the reason why this PR compacts all of the script invocations into an ExecStartPre directive, which overcomes this problem.

We could also overcome this by introducing a dedicated target, reached once the preceding service succeeds, but that introduces stage-based complexity bloat.

So I see two options:

  1. Implement the retry logic inside init-assets.sh itself and leave the current (simple) construct.
  2. Use Restart=on-failure. If we want to retain the vanilla k8s functionality we would have to make this unit a systemd template unit and inject a true/false value for disabling/enabling tectonic.

I'm leaning towards solution 1, as this is effectively what we already do in tectonic.sh.
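For reference, option 1 could look like the following sketch. The `retry` helper and the `flaky` command are hypothetical; the real init-assets.sh would wrap its download steps in the helper instead:

```shell
#!/usr/bin/env bash
# Sketch of option 1: retry logic inside the script itself.
set -u

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, up to
# ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Simulated flaky step: fails twice, then succeeds on the third try.
tries=0
flaky() {
  tries=$((tries + 1))
  [ "$tries" -ge 3 ]
}

retry 5 0 flaky && echo "succeeded after $tries tries"
```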

/cc @lucab with whom I have discussed subtle systemd semantics.


data "ignition_systemd_unit" "tectonic" {
name = "tectonic.service"
enable = "${var.tectonic_service_disabled == 0 ? true : false}"
@s-urbaniak

By removing both the tectonic and bootkube service we are disabling the functionality to install a vanilla k8s cluster.

@kans kans Jun 1, 2017

Yes, but why is that a feature we support in the tectonic installer?

@s-urbaniak

It was requested in #133 and this feature is currently used by the Prometheus team in their CI pipeline on AWS to test operator versions.

@kans

Thanks for the link.

@@ -125,14 +123,3 @@ data "ignition_systemd_unit" "init-assets" {
enable = "${var.assets_s3_location != "" ? true : false}"
content = "${file("${path.module}/resources/services/init-assets.service")}"
}

data "ignition_systemd_unit" "bootkube" {
name = "bootkube.service"
@s-urbaniak

Why is the bootkube service actually removed?


[Service]
Type=oneshot
RemainAfterExit=true
Restart=always
@s-urbaniak

This will always restart the service, even when it exits successfully, see https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart=, effectively busy-looping after installation. The only thing preventing it from busy-looping is the ConditionPathExists, which seems like a fragile construct to me.

I strongly suggest setting Restart=on-failure. This will not busy-loop if the script exits with 0.
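A minimal sketch of that suggestion, reusing the marker file from this PR (the RestartSec value is illustrative):

```ini
[Unit]
# Skip the unit entirely once the marker file exists.
ConditionPathExists=!/opt/tectonic/init_assets.done

[Service]
Restart=on-failure
RestartSec=10
ExecStart=/usr/bin/bash /opt/tectonic/init-assets.sh
ExecStartPost=/bin/touch /opt/tectonic/init_assets.done
```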

@kans kans Jun 1, 2017

ok


[Service]
Type=oneshot
@s-urbaniak

Hmm, indeed Type=oneshot does not work with Restart, that's a bummer. There is a corresponding upstream issue: systemd/systemd#2582


ExecStart=/usr/bin/bash /opt/tectonic/init-assets.sh
ExecStartPost=/bin/touch /opt/tectonic/init_assets.done
TimeoutSec=infinity
@s-urbaniak

This raised my eyebrows, but it is also the effective setting when Type=oneshot.

@kans

The ExecStartPres would time out otherwise.


s-urbaniak commented Jun 1, 2017

To add more solutions to the brainstorming table:

  1. As shown in kube-aws ("decrypt-tls-assets.service failed in a controller node", coreos-kubernetes#675 (comment)), we could add ExecStartPre=/usr/bin/systemctl is-active init-assets.service to bootkube.service and ExecStartPre=/usr/bin/systemctl is-active bootkube.service to tectonic.service.
  2. We could add a path unit (https://www.freedesktop.org/software/systemd/man/systemd.path.html#PathExists=) with PathExists=, which will activate bootkube.service.
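Sketches of both options (unit names, paths, and the RestartSec value are illustrative):

```ini
# Option 1: bootkube.service polls its dependency via ExecStartPre.
# systemctl is-active exits non-zero while the dependency is not
# active, so with Restart=on-failure the unit retries until
# init-assets.service reports active.
[Service]
Restart=on-failure
RestartSec=10
ExecStartPre=/usr/bin/systemctl is-active init-assets.service
ExecStart=/usr/bin/bash /opt/tectonic/bootkube.sh

# Option 2: a bootkube.path unit that activates bootkube.service
# once the asset marker file appears.
[Path]
PathExists=/opt/tectonic/init_assets.done
Unit=bootkube.service
```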


kans commented Jun 1, 2017

First this PR unfortunately breaks "vanilla" k8s functionality

Is this feature explicitly supported? If so, I'd be happy to rework this PR.

...and [breaks] the separation of concerns between the three systemd service units init-assets.service, bootkube.service, and tectonic.service.

Init-assets, bootkube, and tectonic are tightly coupled in any case. They must run sequentially, may take an indefinite amount of time, and they should only run once, ever. Spreading the coupling across different services does not reduce the complexity of the system or make it easier to reason about.

All of these services have been marked as Type=oneshot consciously to have a clear happens-before relationship for the startup ordering between them.

Sure, but you don't get a cluster if any of the scripts exit for any reason.

All of this models the serial startup behavior in a pretty simple manner.

Mostly. I believe the ExecStartPost directives only run if everything exits 0, so the lockfiles won't be around if the units fail.

We could also overcome this by introducing a dedicated target, reached once the preceding service succeeds, but that introduces stage-based complexity bloat.

Agreed.

... This PR tries to delegate restart logic to systemd, and the outlined changes work around limitations in systemd. While I understand what we are trying to achieve here, I am not convinced that this is the right approach. [We should] implement the retry logic inside init-assets.sh itself and leave the current (simple) construct

I disagree, given the general impossibility of writing robust, mission-critical bash and the number of existing bugs in the current scripts (init-assets.sh, tectonic.sh, and bootkube.sh). In so many words: if bash were capable of doing this, I suspect systemd wouldn't exist.

@s-urbaniak

@kans thanks a lot for the good discussion! I totally agree with the point about bash robustness ;-) I was even thinking about implementing all of the tectonic.sh and init-assets logic in Go (bootkube is already invoked directly) for that reason.

You have a good point, though, about the complexity of orchestrating those three service unit files, which can be avoided. Once I am in the office I'll piggy-back on this PR to create a follow-up commit using a templated systemd service unit file.

@s-urbaniak

@kans so I iterated on idea 2 described above and tested a commit which builds on top of this PR, PTAL:

s-urbaniak@1f1d821

Also, to give qualified answers to your comments, I went through the individual points:

First this PR unfortunately breaks "vanilla" k8s functionality

Is this feature explicitly supported? If so, I'd be happy to rework this PR.

Please see #937 (comment)

...and [breaks] the separation of concerns between the three systemd service units init-assets.service, bootkube.service, and tectonic.service.

Init-assets, bootkube, and tectonic are tightly coupled in any case. They must run sequentially, may take an indefinite amount of time, and they should only run once, ever. Spreading the coupling across different services does not reduce the complexity of the system or make it easier to reason about.

You have a point, sure. This reasoning also strengthens my inclination to implement at least tectonic.sh and init-assets.sh as a Go program which runs atomically in a systemd service unit, similar to bootkube.

All of these services have been marked as Type=oneshot consciously to have a clear happens-before relationship for the startup ordering between them.

Sure, but you don't get a cluster if any of the scripts exit for any reason.

Agreed

All of this models the serial startup behavior in a pretty simple manner.

Mostly. I believe the ExecStartPost directives only run if everything exits 0, so the lockfiles won't be around if the units fail.

That is indeed correct.

... This PR tries to delegate restart logic to systemd, and the outlined changes work around limitations in systemd. While I understand what we are trying to achieve here, I am not convinced that this is the right approach. [We should] implement the retry logic inside init-assets.sh itself and leave the current (simple) construct

I disagree, given the general impossibility of writing robust, mission-critical bash and the number of existing bugs in the current scripts (init-assets.sh, tectonic.sh, and bootkube.sh). In so many words: if bash were capable of doing this, I suspect systemd wouldn't exist.

Totally agreed as described above.


sym3tri commented Jun 2, 2017

Note: this is to fix #924

alexsomesan previously approved these changes Jun 2, 2017
@alexsomesan

Approved by mistake. Was trying to read through the issue.
Please disregard.

@alexsomesan alexsomesan dismissed their stale review June 2, 2017 20:35

Mistakenly approved


kans commented Jun 2, 2017

@s-urbaniak @alexsomesan

Thanks for the links and backstory.

Simple services and systemctl is-active seem like the way to go. I'm going to close this PR and open a new one.

@kans kans closed this Jun 5, 2017
@sstarcher

I looked for other tickets, but it looks like this has not been resolved yet.

Another failure example in init-assets:

```
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: pubkey: prefix: "quay.io/coreos/awscli"
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: key: "https://quay.io/aci-signing-key"
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: gpg key fingerprint is: BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC
Sep 15 21:44:21 ip-10-0-2-235 bash[758]:         Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/coreos/awscli" without fingerprint review.
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Added key for prefix "quay.io/coreos/awscli" at "/etc/rkt/trustedkeys/prefix.d/quay.io/coreos/awscli/bff313cdaa560b16a8987b8f72abf5f679
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Downloading signature:  0 B/473 B
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Downloading signature:  473 B/473 B
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Downloading signature:  473 B/473 B
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Downloading ACI:  0 B/32 MB
Sep 15 21:44:21 ip-10-0-2-235 bash[758]: Downloading ACI:  16.3 KB/32 MB
Sep 15 21:44:22 ip-10-0-2-235 bash[758]: Downloading ACI:  32 MB/32 MB
Sep 15 21:44:22 ip-10-0-2-235 bash[758]: image: signature verified:
Sep 15 21:44:22 ip-10-0-2-235 bash[758]:   Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Sep 15 21:44:29 ip-10-0-2-235 bash[758]: Error: Mounting "/etc/rkt-resolv.conf" on "/opt/stage2/awscli/rootfs//etc/resolv.conf" failed: No such file or directory
Sep 15 21:44:29 ip-10-0-2-235 bash[758]: cat: /run/metadata/master: No such file or directory
Sep 15 21:44:30 ip-10-0-2-235 systemd[1]: init-assets.service: Main process exited, code=exited, status=1/FAILURE
Sep 15 21:44:30 ip-10-0-2-235 systemd[1]: Failed to start Download Assets.
Sep 15 21:44:30 ip-10-0-2-235 systemd[1]: init-assets.service: Unit entered failed state.
Sep 15 21:44:30 ip-10-0-2-235 systemd[1]: init-assets.service: Failed with result 'exit-code'.
```
