Controlled upgrade paths #443

Open
ghost opened this issue Apr 20, 2023 · 4 comments

@ghost

ghost commented Apr 20, 2023

Good morning team!!

Congratulations on a really cool and robust OS upgrade system!!

We would like to know whether a more controlled upgrade path is currently supported, is on the roadmap, or could be requested: one that keeps the same upgrade mechanism (orchestrated through the operator), but where the OS updates to a previously configured version instead of the latest published or available version.

The reason for this request is to honor a requirement that some companies (or teams within those companies) have: no software component may be deployed in an environment unless it has been confirmed to work in the immediately preceding lower environment.

Let's picture a company with 3 environments (TEST, STAGE, and PRODUCTION), all in sync and running OS v1.12.0, with the following upgrade schedules configured:

  • TEST should upgrade every 1st day of the month.
  • STAGE should upgrade every 8th day of the month.
  • PRODUCTION should upgrade every 20th day of the month.

The controlled upgrade path process for the company's ecosystem would look like this:

  • Before the 1st of the month, OS v1.13.2 is released.
  • On the 1st day of the month, the cluster that belongs to the TEST environment is upgraded to OS v1.13.2.
  • Sometime between the 2nd of the month and the 20th of the month, OS v1.13.3 is released.
  • Sometime between the 1st of the month and the 7th of the month, the company validates OS v1.13.2 as suitable to continue progressing across environments and sets STAGE to be upgraded to v1.13.2 on its next upgrade session.
  • On the 8th day of the month, the cluster that belongs to the STAGE environment is upgraded to OS v1.13.2.
  • Sometime between the 8th day of the month and the 19th day of the month, the company validates OS v1.13.2 as suitable to continue progressing across environments and sets PRODUCTION to be upgraded to v1.13.2 on its next upgrade session.
  • From the 20th of the month to the 1st of the following month, all environments are in sync.

According to this process, OS v1.13.3 will not be considered in this chain of upgrades until the 1st day of the following month (and even then, only if no newer version has been released in the meantime).
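The promotion chain above could be approximated with the `settings.updates.version-lock` setting discussed later in this thread. The sketch below is an illustration under assumptions, not an operator feature: TEST tracks `latest`, while STAGE and PRODUCTION are pinned to whatever version was last validated in the environment before them. The helper `lock_for_env` and the variables `VALIDATED_IN_TEST` / `VALIDATED_IN_STAGE` are hypothetical stand-ins for however a team records its sign-offs.

```shell
#!/usr/bin/env bash
# Print the version-lock value an environment should carry for its next
# scheduled upgrade.
#   $1: environment name (TEST, STAGE, PRODUCTION)
#   $2: version the environment currently runs
# VALIDATED_IN_TEST / VALIDATED_IN_STAGE are hypothetical sign-off records;
# when one is empty, the next environment stays on its current version.
lock_for_env() {
  local current="$2"
  case "$1" in
    TEST)       echo "latest" ;;                            # always take the newest release
    STAGE)      echo "${VALIDATED_IN_TEST:-$current}" ;;    # only what TEST validated
    PRODUCTION) echo "${VALIDATED_IN_STAGE:-$current}" ;;   # only what STAGE validated
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

For example, with `VALIDATED_IN_TEST=v1.13.2` and nothing validated in STAGE yet, `lock_for_env STAGE v1.12.0` prints `v1.13.2` and `lock_for_env PRODUCTION v1.12.0` prints `v1.12.0`, so v1.13.3 never reaches STAGE just because it was published.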

The target version that the operator should upgrade all nodes in a cluster to could be configured as an additional environment variable for the controller.
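A controller-level target version could be resolved roughly as sketched below. `TARGET_OS_VERSION` is a hypothetical variable name, not an existing update-operator setting; the sketch only shows the defaulting and validation such a setting would need, assuming the same `vX.Y.Z` form the Bottlerocket docs use for `version-lock` and treating an unset value as `latest`.

```shell
#!/usr/bin/env bash
# Resolve the cluster-wide target version from a hypothetical controller
# variable. Unset means "latest"; otherwise the value must look like vX.Y.Z.
resolve_target_version() {
  local target="${TARGET_OS_VERSION:-latest}"
  if [ "$target" = "latest" ] ||
     printf '%s\n' "$target" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "$target"
  else
    # Reject malformed values instead of silently falling back to latest.
    echo "invalid TARGET_OS_VERSION: $target" >&2
    return 1
  fi
}
```

Rejecting a malformed value, rather than falling back to `latest`, keeps a typo from accidentally rolling a cluster forward past the validated version.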

If you think that further clarifications or explanations are needed, we are happy to provide them.

@jpmcb
Contributor

jpmcb commented Apr 28, 2023

> where the OS updates to a version previously configured instead of to the latest version published or available.

There is currently no way to do this directly via the Bottlerocket update operator. Under the hood, the update operator simply invokes the apiclient on each Bottlerocket node in the cluster, which means any Bottlerocket API settings on that node are honored. It is possible to do this via settings:

```shell
apiclient set settings.updates.version-lock="v1.12.0"
```

where the upgrade on that node will be locked to that version.
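On a single node, a pin-then-check flow could look like the sketch below. It assumes it runs somewhere `apiclient` is available on a Bottlerocket host; `pin_node_version` is a hypothetical helper, and the `APICLIENT` override exists only so the commands can be dry-run with `APICLIENT=echo`. `apiclient update check` is used just to see which update the node now reports; in a brupop-managed cluster the operator, not this script, would apply the update.

```shell
#!/usr/bin/env bash
# Command to invoke; override with APICLIENT=echo for a dry run.
APICLIENT="${APICLIENT:-apiclient}"

# Pin a node's future update requests to one version ("latest" unpins it),
# then ask the node which update it now reports.
pin_node_version() {
  local version="$1"
  "$APICLIENT" set settings.updates.version-lock="$version" || return 1
  "$APICLIENT" update check
}
```

With `APICLIENT=echo` set, `pin_node_version v1.13.2` prints the two commands instead of running them.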

There is a proposed settings operator that would orchestrate settings across nodes in a cluster, but it isn't currently on our roadmap:

bottlerocket-os/bottlerocket#873

> no software component should be deployed in an environment unless confirmed working in the immediate previous lower environment.

This also isn't directly supported by Bottlerocket or the update operator. I've thought about similar systems before and opened this issue:

bottlerocket-os/bottlerocket#2805

> TEST should upgrade every 1st day of the month.
> STAGE should upgrade every 8th day of the month.
> PRODUCTION should upgrade every 20th day of the month.

This is currently supported via our time windows / cron scheduler:

https://github.com/bottlerocket-os/bottlerocket-update-operator#set-scheduler

However, that only uses a cron expression to set the dates/times at which an update can be invoked. There's no control-plane mechanism to coordinate across clusters and track whether updates in other clusters succeeded.
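The three schedules from the original post could be expressed as one cron expression per cluster. This sketch assumes a seven-field format (seconds, minutes, hours, day-of-month, month, day-of-week, year), as used by the Rust `cron` crate; the exact setting name and accepted format should be checked against the scheduler README linked above. `cron_for_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print a scheduler cron expression per environment.
# ASSUMPTION: 7 fields = sec min hour day-of-month month day-of-week year.
cron_for_env() {
  case "$1" in
    TEST)       echo "* * * 1 * * *" ;;   # matches any time on the 1st
    STAGE)      echo "* * * 8 * * *" ;;   # matches any time on the 8th
    PRODUCTION) echo "* * * 20 * * *" ;;  # matches any time on the 20th
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

These expressions match every second of the given day; how brupop turns a match into an actual update window is worth confirming in its documentation.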

@kublaikhan1

@jpmcb, have you looked into update waves for staggered deployments in Bottlerocket? https://github.com/bottlerocket-os/bottlerocket/tree/develop/sources/updater/waves

@ghost
Author

ghost commented May 13, 2023


Thanks for the comments, @jpmcb. Using "settings.updates.version-lock" could potentially work, but can we lock to a version different from the one currently running? In my example, could we use it to point to the version a node needs to upgrade to, or only to the current version to prevent any further upgrades?

> @jpmcb have you looked into update waves for staggered deployments in bottlerocket https://github.com/bottlerocket-os/bottlerocket/tree/develop/sources/updater/waves

Waves are a pretty cool concept; thanks for introducing them, @kublaikhan1. The problem with waves, as I understand the documentation, is that they are not predictable: you cannot control exactly which hosts upgrade and when.

However, if we could introduce the concept of channels, it would help make the process a bit more predictable and leave the people in charge of every channel to decide when to publish the release for their subscribers.

To start with, defining 3 channels would be enough: fast, candidate, and stable. Without channels, all we can do is deploy 3 different TUF repositories.
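Until something like channels exists, one way to approximate them from a single repository is to map each channel to a `settings.updates.version-lock` value and let each channel owner edit that mapping. This is a hypothetical scheme, not a Bottlerocket feature: the `channel=version` manifest format and the `channel_version` helper are illustrative only. In the example from the first post, TEST would follow `fast`, STAGE `candidate`, and PRODUCTION `stable`.

```shell
#!/usr/bin/env bash
# Look up the version-lock value for a channel in a hypothetical manifest
# with one "channel=version" pair per line (the last matching line wins).
#   $1: channel name, $2: path to the manifest file
channel_version() {
  local channel="$1" manifest="$2" value
  value="$(grep -E "^${channel}=" "$manifest" | tail -n 1 | cut -d= -f2-)"
  if [ -z "$value" ]; then
    echo "channel not found: $channel" >&2
    return 1
  fi
  echo "$value"
}
```

Given a manifest containing `fast=latest`, `candidate=v1.13.2`, and `stable=v1.12.0`, promoting a release to a channel is just editing one line, and each node's lock follows from its channel.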

@jpmcb
Contributor

jpmcb commented May 17, 2023

Yes, you can lock to a version different from the one currently running:

> settings.updates.version-lock: Controls the version that will be selected when you issue an update request. Can be locked to a specific version like v1.0.0, or latest to take the latest available version. Defaults to latest.

https://github.com/bottlerocket-os/bottlerocket#updates-settings

And brupop should see that and issue the update request for that locked version.
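The selection rule quoted above can be modeled roughly as follows. This is a simplified illustration, not Bottlerocket's updater code (which applies further checks this ignores): `latest` picks the highest available version, and any other value must match an available version exactly. `select_update` is a hypothetical helper, and it relies on `sort -V` for version ordering.

```shell
#!/usr/bin/env bash
# Print the version an update request would target.
#   $1:  version-lock value ("latest" or vX.Y.Z)
#   $2+: versions available in the repository
select_update() {
  local lock="$1" v
  shift
  if [ "$#" -eq 0 ]; then
    echo "no versions available" >&2
    return 1
  fi
  if [ "$lock" = "latest" ]; then
    # Highest version wins; sort -V orders v1.9.0 before v1.13.0.
    printf '%s\n' "$@" | sort -V | tail -n 1
    return 0
  fi
  # A pinned version must be offered exactly as requested.
  for v in "$@"; do
    if [ "$v" = "$lock" ]; then
      echo "$v"
      return 0
    fi
  done
  echo "locked version $lock is not available" >&2
  return 1
}
```

For instance, with v1.12.0, v1.13.2, and v1.13.3 available, a lock of `v1.13.2` selects v1.13.2, while `latest` selects v1.13.3.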

> However, if we could introduce the concept of channels, it would help make the process a bit more predictable and leave the people in charge of every channel to decide when to publish the release for their subscribers.

Channels are something we've considered in the past (for example, a faster RC channel for our releases), but we don't currently support them. This would be a great feature request to raise in the main Bottlerocket repo!
