WIP: MGMT-17317: Use IBI to provision SNO enhancement doc #6130

---
title: install-sno-using-ibi
authors:
  - "@adriengentil"
creation-date: 2024-03-28
last-updated: 2024-03-28
---

# Support Image-Based Installation in Assisted-Installer to install SNO

## Summary

Currently, installing an SNO cluster with the Assisted-Installer takes up to
~40 minutes, because the current flow installs OCP from the ground up.

By leveraging the Image-Based Installation (IBI) method, we can reduce the
installation time to ~10 minutes. IBI also increases the success rate of SNO
installations, since it restores a previously working setup.

This enhancement describes the changes required in the Assisted-Installer to
perform SNO installations using the Image-Based Installation method.

## Motivation

The Image-Based Installation brings the following advantages:
- Faster SNO installations for users
- A higher installation success rate, since we restore the state of a
  previously working SNO setup

### Goals

- Provide an option in the Assisted-Installer that allows a user to install an
  SNO with the Image-Based Installation method

### Non-Goals

- The focus is on the Assisted-Installer in the SaaS environment; we don't aim
  to support other environments at this point

## Proposal

### As a user, I want to install an SNO using the IBI method

#### REST API changes

A new flag will be added to the cluster object [TBD], along with extra
parameters specific to IBI [TBD].
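The flag and its IBI parameters are still [TBD]. Purely as a starting point for that discussion, here is a minimal Go sketch assuming a boolean toggle plus an optional parameter object on the cluster. Every type, field name and JSON key below (`image_based_install`, `seed_image`, `seed_version`) is hypothetical and not part of the current assisted-service API; the seed reference is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageBasedInstallParams is a HYPOTHETICAL shape for the IBI-specific
// parameters; the real field names are still [TBD] in this enhancement.
type ImageBasedInstallParams struct {
	// SeedImage is the OCI reference of the seed image to restore.
	SeedImage string `json:"seed_image"`
	// SeedVersion is the OCP version the seed image was built from.
	SeedVersion string `json:"seed_version"`
}

// ClusterIBIFields shows how the new flag and its parameters could sit on
// the cluster object (hypothetical names).
type ClusterIBIFields struct {
	// ImageBasedInstall toggles the IBI flow for this cluster.
	ImageBasedInstall bool `json:"image_based_install"`
	// ImageBasedInstallParams is only meaningful when the flag is set.
	ImageBasedInstallParams *ImageBasedInstallParams `json:"image_based_install_params,omitempty"`
}

// validate rejects a request that sets the flag without a seed image,
// or that sends IBI parameters without setting the flag.
func (c ClusterIBIFields) validate() error {
	if c.ImageBasedInstall && (c.ImageBasedInstallParams == nil || c.ImageBasedInstallParams.SeedImage == "") {
		return fmt.Errorf("image_based_install requires a seed image")
	}
	if !c.ImageBasedInstall && c.ImageBasedInstallParams != nil {
		return fmt.Errorf("image_based_install_params set but image_based_install is false")
	}
	return nil
}

func main() {
	body := []byte(`{"image_based_install": true,
		"image_based_install_params": {"seed_image": "quay.io/example/seed:4.15", "seed_version": "4.15.0"}}`)
	var c ClusterIBIFields
	if err := json.Unmarshal(body, &c); err != nil {
		panic(err)
	}
	fmt.Println(c.validate() == nil) // true
}
```

Keeping the parameters in a separate, optional object means regular (non-IBI) cluster payloads stay unchanged.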

> Review comment: I think it's worth coming up with at least a first pass on
> the API you'll add in the enhancement.
>
> Reply: I'll add it for sure once I get some time. The rough idea I had was
> something like that in the cluster object:

The `supported-versions` API will be amended to return only the OCP versions
available for installation with IBI.

> Review comment: Under which circumstances?
>
> Reply: It is in case we (Red Hat) want to provide generic images ready to use
> with IBI. But we need to discuss whether we want to follow that path.

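A minimal Go sketch of the amended `supported-versions` filtering. It assumes, hypothetically, that the filter only applies when the caller asks for IBI and that seed availability can be looked up as a set of OCP versions; neither point is settled above, and the version strings are example values.

```go
package main

import "fmt"

// supportedVersions sketches the amended `supported-versions` behaviour.
// ASSUMPTIONS (not settled in this enhancement): the filter applies only when
// the caller asks for IBI, and seed availability is a simple set of versions.
func supportedVersions(all []string, imageBased bool, seedAvailable map[string]bool) []string {
	if !imageBased {
		// Regular installs keep the full list.
		return all
	}
	filtered := make([]string, 0, len(all))
	for _, v := range all {
		// Keep the original ordering; only drop versions without a seed.
		if seedAvailable[v] {
			filtered = append(filtered, v)
		}
	}
	return filtered
}

func main() {
	all := []string{"4.14.10", "4.15.2", "4.16.0"}
	seeds := map[string]bool{"4.15.2": true}
	fmt.Println(supportedVersions(all, true, seeds))  // [4.15.2]
	fmt.Println(supportedVersions(all, false, seeds)) // [4.14.10 4.15.2 4.16.0]
}
```

The input order is preserved rather than re-sorted, since plain string sorting would misorder versions such as `4.9` and `4.10`.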
#### Feature support

A new feature-support entry will be created for IBI installation. The feature
will be gated behind an OCM capability until we make it available to all users.

> Review comment: Add a word about the infra-env and the RHCOS version that
> should be used.

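A minimal Go sketch of the capability gate. The capability name and the `capabilityChecker` interface are hypothetical stand-ins for the real OCM lookup, which is abstracted away here so the example stays self-contained.

```go
package main

import "fmt"

// capabilityIBI is a HYPOTHETICAL placeholder; the OCM capability name for
// this feature has not been defined yet.
const capabilityIBI = "example.image_based_install"

// capabilityChecker abstracts the OCM capability lookup so that the sketch
// does not depend on any real OCM client.
type capabilityChecker interface {
	HasCapability(orgID, capability string) bool
}

// staticCapabilities is an in-memory checker: orgID -> capability -> enabled.
type staticCapabilities map[string]map[string]bool

func (s staticCapabilities) HasCapability(orgID, capability string) bool {
	// A missing org yields a nil inner map, and indexing it safely returns false.
	return s[orgID][capability]
}

// ibiAllowed reports whether orgID may use IBI. Once the feature is made
// available to all users (generallyAvailable), the capability is not checked.
func ibiAllowed(c capabilityChecker, orgID string, generallyAvailable bool) bool {
	if generallyAvailable {
		return true
	}
	return c.HasCapability(orgID, capabilityIBI)
}

func main() {
	caps := staticCapabilities{"org-a": {capabilityIBI: true}}
	fmt.Println(ibiAllowed(caps, "org-a", false)) // true
	fmt.Println(ibiAllowed(caps, "org-b", false)) // false
}
```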
#### Installation

When the user clicks the install button, the `assisted-installer` will restore
the seed image on the disk selected for installation:
- Install RHCOS using `coreos-installer`
  > Review comment: We should know in advance which version of OCP the user is
  > going to restore.
- Partition the disk to create a dedicated `/var/lib/containers` partition
  > Review comment: What's the point of this? Also, could you elaborate on the
  > verb "partition": what does it mean exactly?
  >
  > Found a discussion about the first question here
  >
  > Related to this topic, I understand we also have a way that works without
  > the partition for Also, when
  >
  > As discussed, let's add a label for the OCI container that says which
  > method was used:
  > this way we could support both methods, or even no precaching at all.
- Grow the `/` partition
- Mount `/`, `/boot` and `/var/lib/containers`
- Download and restore the seed on the disk using `lca-cli`
  > Review comment: Where does
  >
  > Reply: The one I used comes from a container image. So I guess we can pull
  > it and execute it like any other command?
  >
  > `lca-cli` is versioned per OCP release, so we should be able to pull it as
  > long as we know the OCP version of the seed we are going to restore.

[Reference](https://github.com/openshift-kni/lifecycle-agent/blob/main/ib-cli/installationiso/data/install-rhcos-and-restore-seed.sh)

#### Configuration

After the seed image is restored on the disk, the `assisted-installer` will
write the following configuration to `/opt/openshift`:

> Review comment: We'll also need to consider static network configuration. I
> think we should be able to continue to provide nmconnection files (as we
> currently do in assisted), but in "real" IBI we're using nmstate and running
> nmstatectl on the host, so we'll want to make sure the nmconnection path
> continues to work.
>
> Reply: Isn't static networking already based on nmstate in assisted (which
> generates nmconnection files)? Or am I missing something?
>
> As discussed, nmconnection files should work fine.

- A
  [manifest.json](https://github.com/openshift-kni/lifecycle-agent/blob/main/docs/post-pivot-configuration.md#user-specifications)
  with:
  > Review comment: Since this is a technical document, let's link to the
  > SeedReconfiguration struct directly, as it's the source of truth and the
  > linked doc may drift out of date.
  - Hostname: taken from the host inventory
    > Review comment: openshift-installer will provide some commands to
    > generate the config artifacts.
  - Cluster name: taken from the cluster definition
  - BaseDomain: taken from the cluster definition
  - ClusterID: taken from the cluster definition
  - SSHKey: provided by the user
  - KubeadminPasswordHash: generated by the Assisted-Installer and provided to
    the user
  - Pull secret
    > Review comment: Why is this bullet indented separately?
- The assisted-controller deployment manifest, which will tell us whether the
  installation succeeded and possibly provide a way to get installation logs
- Cryptographic material, generated in order to provide a kubeconfig file to
  the user
  > Review comment: @omertuc brought up a point recently about us baking the
  > private keys into the ISO; it might be wise to have these go over an API
  > request from the agent rather than sit in an artifact that the user might
  > misplace.
  >
  > Reply: Yeah, but let's keep things simple for now and keep that in the back
  > of our minds as a ticket, until we figure out the security side of things
  > and whether this is a threat that's relevant to protect against.
- SSH keys
- Optional extra manifests:
  - OLM operator manifests
  - Custom manifests
### Implementation Details/Notes/Constraints [optional]

> Review comment: We should do some work to call out options/features that we
> currently support, but won't be able to for IBI clusters. Things like host
> ignition customization, host install args (maybe these will work since we're
> still running

### Risks and Mitigations

## Design Details [optional]

### Open Questions

When selecting the OpenShift version:
- Do we plan to provide "curated" seed images?
  > Review comment: I don't think this is really an option. Correct me if I'm
  > wrong @javipolo, but the seed config needs to match the hardware of the
  > install target, right?
  >
  > Reply: Yes, this is one of the real challenges. How do we limit this to
  > super generic installations only: x86, no special hardware, standard disks,
  > ... not even sure what
  >
  > Reply: Not so much. In the vDU case, we needed to match even the number of
  > CPU cores (I don't remember whether the memory also had to match) because of
  > the performance profile that is applied. In the case of a vanilla OCP, I
  > don't think the hardware needs to match that much. IIRC it needs to have
  > enough memory and CPU and be of the same architecture, but those
  > requirements already exist for normal installations. If there are very
  > special needs, then yes, a "vanilla" seed image won't fit. Whatever does not
  > need extra manifests at installation time could be our rule of thumb.
  >
  > Reply: After discussion, we decided that in the scope of this enhancement we
  > will let the user restore a seed they created. I'll add a section for future
  > work with:
- Should we allow users to install their own seed?
  > Review comment: Assuming this is the direction we go, we'll need some kind
  > of UX for the user to tell us the image. Then we'll need to ensure we don't
  > actually try to access it from our application, as that could be a security
  > risk.
  >
  > Reply: How useful would it be if you have to generate one seed to install
  > one cluster? :) I assume it would be a use case only for repeat users that
  > run heavy automation; we can check the telemetry and talk to the relevant
  > teams.

When restoring the seed image:
- Is it worth pre-caching the container images in the SaaS flow? They will be
  downloaded anyway on reboot.
- Do we need to support the dedicated partition for `/var/lib/containers`?
  - If yes, is there a sensible default for its size, or do we need to ask the
    user?

Installation:
- Will the states the installation goes through need some rework? I don't
  know the details, but I foresee some.
  > Review comment: Assuming we continue to do all the same validations, I'd
  > guess this will mostly be around the install "stages".
  >
  > Reply: I'll add the installation steps.

### UI Impact

A new option will be provided in the UI so that the user can choose whether to
install their SNO with the IBI method.

### Test Plan

A new end-to-end test will be created to exercise the IBI flow.

> Review comment: Add telemetry.

## Drawbacks

## Alternatives

> Review comment (on the installation timings in the Summary): Maybe we should
> get more exact timings for this. Do those 40 minutes include all the
> pre-install validations we're running, or are they measured from the time the
> user hits the install button? We're still going to be running all those
> validations during discovery, right?
>
> Reply: I agree. I believe @eranco74 already has some numbers in a doc
> somewhere.
>
> Reply: The actual SNO installation takes 30 minutes.