test-equinix-ubuntu2204-x64-2 is down #3721

Closed
richardlau opened this issue May 14, 2024 · 3 comments

@richardlau
Member

test-equinix-ubuntu2204-x64-2 is down. We tried to reboot it in #3713 (comment) to see if it would resolve odd behaviour, but it's failing to restart.

Logging into the OOB console shows that the machine is on a UEFI boot prompt:

Shell>

Typing "exit" at the prompt opens the BIOS setup. Exiting the BIOS drops us back to the UEFI prompt:

UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (American Megatrends, 0x0005000D)
Mapping table
     BLK0: Alias(s):
          PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)
     BLK4: Alias(s):
          PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x1,0xFFFF,0x0)
     BLK1: Alias(s):
          PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(1,GPT,ED44A481-507A-482D-9013-DD3C62149A16,0x800,0x1000)
     BLK2: Alias(s):
          PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(2,GPT,321A3209-D616-46DA-9374-FEC8E885BAD4,0x1800,0x3CF000)
     BLK3: Alias(s):
          PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(3,GPT,8C9955A7-C7EC-4D6C-B53B-A00CB04C5F67,0x3D0800,0x37A72E8F)




Press ESC in 1 seconds to skip startup.nsh or any other key to continue.
Shell>
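
For anyone comparing this mapping table with a partition listing taken from the OS, the hex fields in each HD(...) node are the partition number, table type, GUID, start sector and sector count. The sketch below decodes them to decimal. It is only an illustration: `decode_hd_nodes` and `SAMPLE` are hypothetical names, and the sample text is two lines copied from the output above.

```python
import re

# An HD(...) device path node has the form:
# HD(partition number, table type, signature, start sector, sector count)
HD_NODE = re.compile(
    r"HD\((\d+),(\w+),([0-9A-Fa-f-]+),(0x[0-9A-Fa-f]+),(0x[0-9A-Fa-f]+)\)"
)


def decode_hd_nodes(mapping_text):
    """Return (partition, table_type, start_sector, sector_count) tuples."""
    return [
        (int(num), table, int(start, 16), int(size, 16))
        for num, table, _signature, start, size in HD_NODE.findall(mapping_text)
    ]


# Two lines copied from the mapping table above.
SAMPLE = """\
PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(1,GPT,ED44A481-507A-482D-9013-DD3C62149A16,0x800,0x1000)
PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(3,GPT,8C9955A7-C7EC-4D6C-B53B-A00CB04C5F67,0x3D0800,0x37A72E8F)
"""

for node in decode_hd_nodes(SAMPLE):
    print(node)
```

Partition 1 decodes to start sector 2048 with 4096 sectors, and partition 3 to start sector 3999744 with 933703311 sectors.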

This is a repeat of #3713 (comment), where in the end we rebuilt the machine.

Note that this machine is one of the two jenkins-workspace machines that we have been asked to migrate off Equinix Metal (#3597).

@ryanaslett
Contributor

I was able to investigate and get it to boot.

I used the 'rescue' mode on the Equinix console, which booted it into Alpine.

Not making it past boot and going straight to the UEFI shell is indicative of something wrong with the GRUB configuration.

From the Alpine rescue environment I investigated the disks to see how they were configured, and found the first clue:

Device       Start       End   Sectors   Size Type
/dev/sda1     2048      6143      4096     2M BIOS boot
/dev/sda2     6144   3999743   3993600   1.9G Linux filesystem
/dev/sda3  3999744 937703054 933703311 445.2G Linux filesystem

When these machines were originally installed, they were set up for old BIOS-style (legacy) booting: the disk has a BIOS boot partition and no EFI System Partition.
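
The same check can be sketched in a few lines, assuming the partition listing is available as text in the fdisk style shown above. `boot_style` is a hypothetical helper, not part of any existing tooling, and it only looks at the partition type names ("BIOS boot", "EFI System"), so treat the result as a hint rather than proof.

```python
def boot_style(partition_listing):
    """Guess how a disk was set up to boot from fdisk-style partition types."""
    has_esp = "EFI System" in partition_listing      # EFI System Partition present
    has_bios_boot = "BIOS boot" in partition_listing  # GRUB's BIOS boot partition
    if has_esp:
        return "uefi"
    if has_bios_boot:
        return "legacy-bios"
    return "unknown"


# Listing copied from the rescue environment output above.
LISTING = """\
Device       Start       End   Sectors   Size Type
/dev/sda1     2048      6143      4096     2M BIOS boot
/dev/sda2     6144   3999743   3993600   1.9G Linux filesystem
/dev/sda3  3999744 937703054 933703311 445.2G Linux filesystem
"""

print(boot_style(LISTING))
```

For this disk the helper reports a legacy BIOS layout, which is why the firmware found nothing to boot in UEFI mode.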

When I got to the BIOS, I changed the boot strategy from UEFI to Legacy, and it came up.

If I had to guess, somehow the BIOS boot setting changed from Legacy to UEFI (perhaps through an OOB BIOS update mechanism that doesn't require a reboot).

The other possibility is that the apt update/upgrade triggered a GRUB install that somehow also modified the BIOS settings.

Either way, it's back, and we now know why that happened and how to fix it if it happens again before we get rid of these machines entirely.
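
If it does happen again, one quick way to see which mode a running Linux system actually booted in is to check for the /sys/firmware/efi directory, which the kernel only exposes after a UEFI boot. A minimal sketch follows; `firmware_boot_mode` and its `sysfs_root` parameter are hypothetical names, and the check is Linux-only.

```python
import os


def firmware_boot_mode(sysfs_root="/sys"):
    """Report 'uefi' if <sysfs_root>/firmware/efi exists, else 'legacy-bios'.

    Linux-only: on other systems the directory is absent, so the answer
    is not meaningful there.
    """
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "uefi" if os.path.isdir(efi_dir) else "legacy-bios"


print(firmware_boot_mode())
```

On these machines the expected answer after the fix is legacy BIOS; seeing UEFI would mean the firmware setting or the install has changed.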

@targos
Member

targos commented May 15, 2024

I put it back online in Jenkins. Let's see how it handles new jobs.

@richardlau
Member Author

AFAICT the machine appears to be working as expected.
