Change internal installation approach #585

Open, wants to merge 2 commits into base: main

remimimimimi
Contributor

Description of changes

Change the approach used by the installer. Instead of flashing a pre-built image, partition the disk and assemble the system during installation, as is done in the normal NixOS installation process with the nixos-install command.

Checklist for things done

  • Summary of the proposed changes in the PR description
  • More detailed description in the commit message(s)
  • Commits are squashed into relevant entities - avoid a lot of minimal dev time commits in the PR
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • PR linked to architecture documentation and requirement(s) (ticket id)
  • Test procedure described (or includes tests). Select one or more:
    • Tested on Lenovo X1 x86_64
    • Tested on Jetson Orin NX or AGX aarch64
    • Tested on Polarfire riscv64
  • Author has run nix flake check --accept-flake-config and it passes
  • All automatic Github Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing

To run the newly introduced test, execute nix build .#checks.x86_64-linux.installer --log-format bar-with-logs. Otherwise, the testing process is the same as in previous installer-related PRs.

@remimimimimi remimimimimi temporarily deployed to internal-build-workflow April 30, 2024 14:14 — with GitHub Actions Inactive
@remimimimimi remimimimimi added the Needs Testing CI Team to pre-verify label Apr 30, 2024
Contributor

@leivos-unikie leivos-unikie left a comment

Tested this
nix build github:remimimimimi/ghaf/fixed-installer#checks.x86_64-linux.installer --log-format bar-with-logs

The check seemed to pass. The first run produced a lot of output and some cleanup at the end. When I ran it a second time, there was no output in the terminal. It created an empty result directory linked to the Nix store.

Is there any explanation or documentation of how that check works? Is it basically a build check and an installer test within a VM, or does it check something more?

Tests on Lenovo-X1

  • Built the installer with
    nix build --verbose github:remimimimimi/ghaf/fixed-installer#lenovo-x1-carbon-gen11-debug-installer
  • ci-test-automation tests ok
  • All apps launch and poweroff/reboot buttons work

@remimimimimi
Contributor Author

Tested this nix build github:remimimimimi/ghaf/fixed-installer#checks.x86_64-linux.installer --log-format bar-with-logs

The check seemed to pass. The first run produced a lot of output and some cleanup at the end. When I ran it a second time, there was no output in the terminal. It created an empty result directory linked to the Nix store.

Is there any explanation or documentation of how that check works? Is it basically a build check and an installer test within a VM, or does it check something more?

There's documentation on the wiki. The tl;dr is that it builds a derivation and runs the tests during the build. Because you hadn't changed the derivation, Nix reused the existing result on the second run, so you saw no output.
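If you want to see the test output again without changing the derivation, something like the following sketch might help. The function names are hypothetical. It assumes a recent Nix with the `nix` command enabled and that you run it from a checkout of the PR branch; `--rebuild` builds the derivation again and compares the result with the existing store path, so it may report a mismatch if the build is not reproducible.

```shell
# Hypothetical helpers; nothing runs until you call one of them.

# Re-run the installer check even though its derivation is already built.
rerun_installer_check() {
    nix build '.#checks.x86_64-linux.installer' --rebuild --log-format bar-with-logs
}

# Print the stored build log of the earlier run instead of rebuilding.
show_installer_check_log() {
    nix log '.#checks.x86_64-linux.installer'
}
```

Calling `show_installer_check_log` is the cheaper option when you only need to read what the first run printed.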

@leivos-unikie
Contributor

When I first tested this ghaf installer yesterday (built with nix build --verbose github:remimimimimi/ghaf/fixed-installer#lenovo-x1-carbon-gen11-debug-installer), I got 2 NixOS generations available in the boot menu:
IMG_2592

The older generation still had the old labwc version.

Today I tried the installer from ghaf mainline and from this PR again but could not reproduce the situation above. How did that happen? I know that I had some older ghaf version on the NVMe, but I had "disabled" it by booting from ghaf-installer and running something like this: sudo dd if=/dev/zero of=/dev/nvme0n1 count=100 bs=4M. Could it be that leftovers from the previous installation caused the installer to do some kind of nixos-rebuild on top of them? I would expect the installer to wipe everything and install a fresh ghaf.

@leivos-unikie
Contributor

And this morning I also noticed that the installer from this PR always creates a new 'Linux Boot Manager' entry in the Boot Priority Order list in the BIOS. To boot from the USB SSD again, you have to boot manually into the UEFI BIOS and lower or remove the 'Linux Boot Manager' entry. This can be a problem for automated testing: we have been preparing to run installer tests in an automated setup too, but 'Linux Boot Manager' will prevent that.

One more observation: this installer takes around 7 min to run, whereas the old one took 1 min. But I understand this is the cost of using nixos-install instead of just flashing.

@leivos-unikie leivos-unikie added the Tested on Lenovo X1 Carbon This PR has been tested on Lenovo X1 Carbon label May 3, 2024
@remimimimimi
Contributor Author

When I first tested this ghaf installer yesterday (built with nix build --verbose github:remimimimimi/ghaf/fixed-installer#lenovo-x1-carbon-gen11-debug-installer), I got 2 NixOS generations available in the boot menu: IMG_2592

The older generation still had the old labwc version.

Today I tried the installer from ghaf mainline and from this PR again but could not reproduce the situation above. How did that happen? I know that I had some older ghaf version on the NVMe, but I had "disabled" it by booting from ghaf-installer and running something like this: sudo dd if=/dev/zero of=/dev/nvme0n1 count=100 bs=4M. Could it be that leftovers from the previous installation caused the installer to do some kind of nixos-rebuild on top of them? I would expect the installer to wipe everything and install a fresh ghaf.

That's probably because you overwrote only the first 400M with zeros. Also, to my knowledge only the Debian installer does a full system wipe (optionally); others simply write on top.
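The 400M figure follows from count × bs in the quoted dd command. A small worked sketch of that arithmetic (it only computes the size and does not touch any device):

```shell
# Size zeroed by: dd if=/dev/zero of=/dev/nvme0n1 count=100 bs=4M
count=100
bs_bytes=$((4 * 1024 * 1024))            # dd's "4M" suffix means 4 MiB
wiped_bytes=$((count * bs_bytes))        # total bytes written from the start of the disk
wiped_mib=$((wiped_bytes / 1024 / 1024))
echo "zeroed: ${wiped_mib} MiB (${wiped_bytes} bytes)"
```

Anything stored beyond that first 400 MiB of the disk would be left as it was.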

And this morning I also noticed that the installer from this PR always creates a new 'Linux Boot Manager' entry in the Boot Priority Order list in the BIOS. To boot from the USB SSD again, you have to boot manually into the UEFI BIOS and lower or remove the 'Linux Boot Manager' entry. This can be a problem for automated testing: we have been preparing to run installer tests in an automated setup too, but 'Linux Boot Manager' will prevent that.

One more observation: this installer takes around 7 min to run, whereas the old one took 1 min. But I understand this is the cost of using nixos-install instead of just flashing.

It creates a new boot entry on every installation because otherwise the system may be unbootable. This behavior comes from the disko-install --write-efi-boot-entries flag, and the flag is also required to build a system without internet.

Otherwise, because of its internal structure, it attempts to rebuild packages, which requires an internet connection. That was fixed upstream only recently.

@leivos-unikie
Contributor

It creates a new boot entry on every installation because otherwise the system may be unbootable. This behavior comes from the disko-install --write-efi-boot-entries flag, and the flag is also required to build a system without internet.

ghaf-installer in the current ghaf mainline does not add any 'Linux Boot Manager' entry to the BIOS. I think in that case it's enough that NVMe0 is in the Boot Order list.

@leivos-unikie leivos-unikie removed the Needs Testing CI Team to pre-verify label May 6, 2024
@leivos-unikie
Contributor

It creates a new boot entry on every installation because otherwise the system may be unbootable. This behavior comes from the disko-install --write-efi-boot-entries flag, and the flag is also required to build a system without internet.

Otherwise, because of its internal structure, it attempts to rebuild packages, which requires an internet connection. That was fixed upstream only recently.

I tested without the --write-efi-boot-entries flag. Installation fails both with and without an internet connection.

@brianmcgillion
Collaborator

PXL_20240506_135341824 MP

@brianmcgillion
Collaborator

fails to build the image in the installer

@tiiuae tiiuae deleted a comment from mikatammi May 6, 2024
@Mic92
Collaborator

Mic92 commented May 6, 2024

If we switch from the ISO here to pure disko, then these hash mismatch issues should be fixed by: nix-community/disko#625

@Mic92
Collaborator

Mic92 commented May 7, 2024

Noticed that in the latest version I switched to use closureInfo, which is now also needed for offline installation: https://github.com/nix-community/disko/pull/625/files?short_path=cb7823f#diff-cb7823f0751d86126c961181be2eea6ecf59d368c3019ddd0ce86dab10ec92a3

@leivos-unikie
Contributor

It creates a new boot entry on every installation because otherwise the system may be unbootable.

I tested that ghaf is able to boot from nvme even after removing the 'Linux Boot Manager' entry.

@leivos-unikie
Contributor

Related to test automation: @remimimimimi, thanks for mentioning efibootmgr. I drafted a script to delete all Linux Boot Manager entries, so that we can automate this and boot from the USB SSD again after running a test with ghaf-installer.

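# Entries look like "Boot0001* Linux Boot Manager"; characters 5-8 are the entry number.
for f in $(efibootmgr | grep Linux | awk '{print $1}' | cut -c 5-8)
do
    # -b selects the entry by number, -B deletes it, -q suppresses output
    sudo efibootmgr -q -b "${f}" -B
done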

There is one potential risk: if ghaf-installer installs a broken ghaf that cannot boot (while its Linux Boot Manager entry is active), we can't run the script above and can no longer handle the situation automatically (someone has to boot manually into the UEFI BIOS and edit the boot order). Maybe the best way to reduce this risk is to run the tests first on the ghaf image (booted from the USB SSD) and, only if the plain ghaf image is bootable, proceed to run the tests for ghaf-installer -> ghaf.
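For the automation itself, the parsing step can be checked separately from the deletion step. The sketch below is only an illustration: linux_boot_ids is a hypothetical helper name, and the sample text is a made-up approximation of efibootmgr output, not a capture from a real machine. It matches only lines labelled exactly "Linux Boot Manager" and prints their entry numbers, without deleting anything.

```shell
# Hypothetical helper: read efibootmgr-style output on stdin and print the
# 4-character entry number of every "Linux Boot Manager" entry.
linux_boot_ids() {
    awk '/^Boot[0-9A-Fa-f]+[* ] Linux Boot Manager/ { print substr($1, 5, 4) }'
}

# Made-up sample output (assumption); "*" marks an active entry.
sample='BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0002,0000
Boot0000* USB HDD
Boot0001* Linux Boot Manager
Boot0002  Linux Boot Manager'

ids=$(printf '%s\n' "$sample" | linux_boot_ids)
echo "$ids"

# On a real machine the IDs would be fed to the deletion loop, for example:
#   for id in $(efibootmgr | linux_boot_ids); do sudo efibootmgr -q -b "$id" -B; done
```

Keeping the parser separate means it can be dry-run in CI before the destructive `-B` call is ever issued on test hardware.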

Change the approach used by the installer. Instead of flashing a pre-built
image, partition the disk and assemble the system during installation, as is
done in the normal NixOS installation process with the `nixos-install` command.

Signed-off-by: Valentin Kharin <valentin.kharin@unikie.com>
Labels
Tested on Lenovo X1 Carbon This PR has been tested on Lenovo X1 Carbon