WIP: MGMT-17317: Use IBI to provision SNO enhancement doc #6130

Open
wants to merge 3 commits into
base: master

Conversation

adriengentil
Contributor

No description provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 28, 2024

openshift-ci-robot commented Mar 28, 2024

@adriengentil: This pull request references MGMT-17317 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 28, 2024
Contributor Author

adriengentil commented Mar 28, 2024

/retitle WIP: MGMT-17317: Use IBI to provision SNO enhancement doc

@openshift-ci openshift-ci bot changed the title MGMT-17317: Use IBI to provision SNO enhancement doc WIP: MGMT-17317: Use IBI to provision SNO enhancement doc Mar 28, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 28, 2024

openshift-ci bot commented Mar 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adriengentil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2024
@adriengentil
Contributor Author

/cc @carbonin

@openshift-ci openshift-ci bot requested a review from carbonin April 10, 2024 08:55
docs/enhancements/install-sno-using-ibi.md
Comment on lines +13 to +18
Installing an SNO cluster with the Assisted-Installer currently takes up to
~40 min, because the current flow installs OCP from the ground up.

Leveraging the Image-Based installation method can reduce the installation
time to ~10 min and should increase the success rate of SNO installations,
since we restore a previously working setup.
Member

Maybe we should get more exact timings for this.

Does that 40 minutes include all the pre-install validations we're running, or is that from the time the user hits the install button? We're still going to be running all those validations during discovery, right?

Contributor

I agree. I believe @eranco74 already has some numbers in a doc somewhere

Contributor

The actual SNO installation takes 30 minutes.

### Open Questions

When selecting OpenShift version:
- Do we plan to provide “curated” seed images?
Member

I don't think this is really an option.

Correct me if I'm wrong @javipolo, but the seed config needs to match the hardware of the install target, right?

Contributor

Yes, this is one of the real challenges. How do we limit this to super generic installations only: x86, no special hardware, standard disks, ... not even sure what else.

Member

@javipolo javipolo Apr 11, 2024

Not so much. In the vDU case we needed to match even the number of CPU cores (and I don't remember whether the memory of the systems had to match too) because of the performance profile that is applied. In the case of a vanilla OCP, I don't think the hardware needs to match that closely.

IIRC it needs to have enough memory and CPU and be of the same architecture, but those requirements are already there for normal installations.

If there are very special needs, then yes, a "vanilla" seed image won't fit

Whatever does not need extra manifests at installation time could be our rule of thumb ...
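
To make the rule of thumb above concrete, here is a minimal sketch of a host/seed compatibility check covering only the three criteria mentioned in this thread (same architecture, enough CPU, enough memory). The `HostInventory` and `SeedRequirements` types and their field names are hypothetical and do not come from the Assisted-Installer or lifecycle-agent code; the numbers in the usage note are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostInventory:
    """Hypothetical subset of what discovery reports about the target host."""
    architecture: str
    cpu_cores: int
    memory_gib: int


@dataclass(frozen=True)
class SeedRequirements:
    """Hypothetical requirements recorded alongside a vanilla seed image."""
    architecture: str
    min_cpu_cores: int
    min_memory_gib: int


def seed_incompatibilities(host: HostInventory, seed: SeedRequirements) -> list[str]:
    """Return human-readable reasons the seed cannot be restored on this host.

    An empty list means none of the three checks failed.
    """
    reasons = []
    if host.architecture != seed.architecture:
        reasons.append(
            f"architecture mismatch: host is {host.architecture}, "
            f"seed is {seed.architecture}"
        )
    if host.cpu_cores < seed.min_cpu_cores:
        reasons.append(
            f"not enough CPU cores: {host.cpu_cores} < {seed.min_cpu_cores}"
        )
    if host.memory_gib < seed.min_memory_gib:
        reasons.append(
            f"not enough memory: {host.memory_gib} GiB < {seed.min_memory_gib} GiB"
        )
    return reasons
```

For example, with an arbitrary seed requirement of `SeedRequirements("x86_64", 8, 16)`, a host described by `HostInventory("aarch64", 4, 32)` would yield two reasons (architecture and CPU), while a seed that needs extra manifests or a performance profile would fall outside what a check like this can express.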

Contributor Author

After discussion we decided that, in the scope of this enhancement, we will let the user restore a seed they created. I'll add a section for future work with:

  • add an option to install and create a seed in Assisted-Installer
  • provide vanilla seed images


When selecting OpenShift version:
- Do we plan to provide “curated” seed images?
- Should we allow users to install their own seed?
Member

Assuming this is the direction we go, we'll need some kind of UX for the user to tell us the image. Then we'll need to ensure we don't actually try to access it from our application as that could be a security risk.

Contributor

How useful would it be if you have to generate one seed to install one cluster? :) I assume this would be a use case only for repeat users that run heavy automation. Maybe we can check the telemetry and talk to the relevant teams.


Installation:
- I don’t know the details, but I foresee some rework around the states
the installation goes through?
Member

Assuming we continue to do all the same validations I'd guess this will mostly be around the install "stages".
We could probably reuse some (writing image to disk), but probably won't need others.

Contributor Author

I'll add the installation steps

#### Configuration

After the image seed is restored on the disk, the `assisted-installer` will
drop the configuration in `/opt/openshift`:
Member

We'll also need to consider static network configuration.

I think we should be able to continue to provide nmconnection files (as we are currently in assisted) but in "real" IBI we're using nmstate and running nmstatectl on the host so we'll want to make sure the nmconnection path continues to work.

Contributor Author

Isn't static networking already based on nmstate in assisted (which generates nmconnection files)? Or am I missing something?
For sure, we need to consider this flow.

Contributor Author

As discussed, nmconnection files should work fine.

- Partition the disk to create a dedicated `/var/lib/containers`
- Grow `/` partition
- Mount `/`, `/boot` and `/var/lib/containers`
- Download and restore the seed on the disk using `lca-cli`
Member

Where does lca-cli come from? How do we get it into the discovery image?

Contributor Author

The one I used comes from a container image. So I guess we can pull it and execute it like any other command?
@javipolo can we always use the latest version to restore any existing seed? Are there constraints on the lca-cli version based on the OCP version we are trying to restore?

Contributor Author

lca-cli is versioned per OCP release, so we should be able to pull it as long as we know the OCP version of the seed we are going to restore.
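
As a rough illustration of picking an lca-cli image from the seed's OCP version, here is a small sketch. It assumes "per OCP release" means one lca-cli build per minor stream (for example 4.16), which this thread does not confirm, and the image template passed by the caller is a placeholder, not a real registry path.

```python
import re

# Matches "4.16", "4.16.3" or "4.16.0-rc.1" and captures major and minor.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.\d+)?")


def ocp_minor_stream(ocp_version: str) -> str:
    """Reduce a full OCP version string to its "major.minor" stream."""
    match = _VERSION_RE.match(ocp_version)
    if match is None:
        raise ValueError(f"unrecognized OCP version: {ocp_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def lca_cli_image_for_seed(seed_ocp_version: str, image_template: str) -> str:
    """Fill a caller-supplied template with the seed's OCP stream.

    image_template is hypothetical, e.g. "registry.example.com/lca-cli:{stream}".
    """
    return image_template.format(stream=ocp_minor_stream(seed_ocp_version))
```

This only works if the OCP version of the seed is known before the restore starts, which is the precondition stated in the comment above.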

Comment on lines +46 to +47
A new flag will be added to the cluster object [TBD], along with
extra-parameters specific to IBI [TBD].
Member

I think it's worth coming up with at least a first pass on the API you'll add in the enhancement.

Contributor Author

I'll add it for sure once I get some time. The rough idea I had was something like this in the cluster object:

{
  installation_method: "ibi",
  ibi_cluster_image: "quay.io/....", # allowed to be set only when installation_method is "ibi"
  openshift_version: nil, # because installation_method is set to "ibi"
  ocp_release_image: nil # because installation_method is set to "ibi"
}
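
The constraints noted in the comments of that rough idea could be enforced with a check along these lines. This is only a sketch: the field names are taken from the snippet above, the function name is hypothetical, and requiring `ibi_cluster_image` whenever the method is `"ibi"` is an added assumption rather than something stated in the proposal.

```python
def validate_cluster_params(params: dict) -> list[str]:
    """Return validation errors for the proposed IBI-related cluster fields."""
    errors = []
    method = params.get("installation_method")
    image = params.get("ibi_cluster_image")

    if method == "ibi":
        # Assumption: an IBI cluster cannot be installed without a seed image.
        if not image:
            errors.append(
                "ibi_cluster_image is required when installation_method is 'ibi'"
            )
        # The version comes from the seed, so these must stay unset.
        for field in ("openshift_version", "ocp_release_image"):
            if params.get(field) is not None:
                errors.append(
                    f"{field} must not be set when installation_method is 'ibi'"
                )
    elif image is not None:
        errors.append(
            "ibi_cluster_image may only be set when installation_method is 'ibi'"
        )
    return errors
```

Returning a list of errors rather than raising on the first one lets the API report every conflicting field in a single response.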

- OLM operator manifests
- Custom manifests

### Implementation Details/Notes/Constraints [optional]
Member

We should do some work to call out options/features that we currently support, but won't be able to for IBI clusters.

Things like host ignition customization, host install args (maybe these will work since we're still running coreos-installer?), cluster network (cidr), are some that come to mind immediately. Really anything that will be taken directly from the seed, but is already in our API we'll need to make optional and block the user from setting for IBI clusters.

Comment on lines +87 to +88
- generate cryptographic materials in order to provide a kubeconfig file to the
user
Member

@omertuc brought up a point recently about us baking the private keys into the ISO, it might be wise to have these go over an API request from the agent rather than sit in an artifact that the user might misplace.

Contributor

Yeah, but let's keep things simple for now and have that in the back of our minds as a ticket, until we figure out the security side of things and whether this is a threat that's relevant to protect from

extra-parameters specific to IBI [TBD].

`supported-versions` API will be amended to return only the versions of OCP
available for installation with IBI.
Contributor

Under which circumstances?

Contributor Author

It is in case we (Red Hat) want to provide generic images ready to use with IBI. But we need to discuss whether we want to follow that path.

When the user hits the installation button, the `assisted-installer` will
restore the image seed on the disk selected for installation:
- Install RHCOS using `coreos-installer`
- Partition the disk to create a dedicated `/var/lib/containers`
Contributor

@omertuc omertuc Apr 11, 2024

What's the point of this?

Also could you elaborate on the verb "partition" - what does it mean exactly

Contributor

Found a discussion about the first question here

Contributor Author

Related to this topic, I understand we also have a way that works without the partition for /var/lib/containers. In the case we allow users to restore their own snapshots, should we be able to support both partition schemes? @javipolo what do you think?

Also, when /var/lib/containers is required, how can we determine the right size for it?

Member

As discussed, let's add a label for the OCI container that says which method was done:

  • partition
  • directory
  • none

This way we could support both methods, or no precaching at all.
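
Reading such a label could look like the sketch below. The three values come from the list above; the label key `com.example.seed.container-storage` is purely hypothetical, and treating a missing label as an error (rather than falling back to a default) is an assumption.

```python
from enum import Enum


class ContainerStorageLayout(Enum):
    """How /var/lib/containers was handled when the seed was created."""
    PARTITION = "partition"
    DIRECTORY = "directory"
    NONE = "none"


# Hypothetical label key; the real name would be agreed on with lifecycle-agent.
SEED_LAYOUT_LABEL = "com.example.seed.container-storage"


def layout_from_labels(image_labels: dict[str, str]) -> ContainerStorageLayout:
    """Map the seed image's OCI labels to a storage layout.

    Raises ValueError if the label is missing or holds an unknown value,
    so the installer never guesses a partition scheme.
    """
    raw = image_labels.get(SEED_LAYOUT_LABEL)
    if raw is None:
        raise ValueError(f"seed image is missing the {SEED_LAYOUT_LABEL} label")
    return ContainerStorageLayout(raw.strip().lower())
```

The installer could then branch on the returned value to decide whether to create the dedicated partition before restoring the seed.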

After the image seed is restored on the disk, the `assisted-installer` will
drop the configuration in `/opt/openshift`:
- A
[manifest.json](https://github.com/openshift-kni/lifecycle-agent/blob/main/docs/post-pivot-configuration.md#user-specifications)
Contributor

Since this is a technical document, let's link to the SeedReconfiguration struct directly, as it's the source of truth and the linked doc may drift out of date

- SSHkey: will be provided by the user
- KubeadminPasswordHash: will be generated by Assisted-Installer and
provided to the user
- Pull secret
Contributor

Why is this bullet indented separately?

adriengentil and others added 2 commits April 11, 2024 15:45
Co-authored-by: Omer Tuchfeld <omertuchfeld@gmail.com>
Co-authored-by: Omer Tuchfeld <omertuchfeld@gmail.com>

openshift-ci bot commented Apr 11, 2024

@adriengentil: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

- A
[manifest.json](https://github.com/openshift-kni/lifecycle-agent/blob/main/docs/post-pivot-configuration.md#user-specifications)
with:
- Hostname: will be picked from the host inventory
Contributor Author

openshift-installer will provide some commands to generate the config artifacts

### Test Plan

A new end-to-end test will be created to test the IBI flow.

Contributor Author

Add telemetry


When the user hits the installation button, the `assisted-installer` will
restore the image seed on the disk selected for installation:
- Install RHCOS using `coreos-installer`
Contributor Author

we should know in advance what version of OCP the user is going to restore

A new feature support will be created for the IBI installation.
The feature will be gated by an OCM capability until we make it available to
all users.

Contributor Author

Add a word about the infra-env and the RHCOS version that should be used

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants