homed disklabel corrupt after shutdown (systemd-homework was terminated due to timeout) #32873

Open
intgr opened this issue May 16, 2024 · 3 comments
Labels: bug 🐛 (Programming errors, that need preferential fixing), homed (homed, homectl, pam_homed)

Comments

@intgr
Contributor

intgr commented May 16, 2024

systemd version the issue has been seen with

255.6-1-arch

Used distribution

Arch Linux

Linux kernel version used

6.8.9-arch1-2

CPU architectures issue was seen on

x86_64

Component

systemd-homed

Expected behaviour you didn't see

When I haven't rebooted my computer for a while and then need to reboot, systemd-homework can take a long time resizing the home partition on shutdown. But I don't think it has ever taken as long as it did in this instance. I'm surprised I couldn't find other similar bug reports.

Expected behavior is no corruption. ;)

Unexpected behaviour you saw

After coming back from reboot, I could no longer log in to my user. Running homectl activate marti as root gave the ever so misleading and frustrating error "Failed to validate disk label: Package not installed". For a tool as critical as this, homed really badly needs better error reporting. Also it needs better troubleshooting tools.

Running losetup -fP --show /home/marti.home did not create the expected partition device /dev/loop1p1 that I could mount. However, running hexdump revealed that the LUKS header was intact at offset 0x100000.

Using losetup -f -o 1048576 created a loopback device that I was able to activate via cryptsetup and thankfully btrfs check reported that the file system was completely fine. I took a backup of the whole .home file just in case.
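For anyone wanting to repeat the hexdump check without eyeballing hex, here is a minimal Python sketch. It assumes the .home image is a plain file and that the LUKS header starts at 0x100000 (1048576, the same offset passed to losetup -o above). LUKS1 and LUKS2 headers both begin with the magic bytes "LUKS\xba\xbe" followed by a big-endian 16-bit version number. The function name luks_version_at and the script name in the comment are hypothetical, not part of homectl or cryptsetup.

```python
import struct
import sys
from typing import Optional

# LUKS1 and LUKS2 headers both start with this 6-byte magic.
LUKS_MAGIC = b"LUKS\xba\xbe"

def luks_version_at(path: str, offset: int = 0x100000) -> Optional[int]:
    """Return the LUKS version at `offset` in the file at `path`, or None.

    Reads the 6-byte magic and the big-endian uint16 version that follows it.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        header = f.read(8)
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    (version,) = struct.unpack(">H", header[6:8])
    return version

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 luks_probe.py /home/marti.home
    image = sys.argv[1]
    version = luks_version_at(image)
    if version is None:
        print(f"no LUKS header at 0x100000 in {image}")
    else:
        print(f"LUKS{version} header at 0x100000 in {image}")
```

For an already attached device, cryptsetup isLuks answers the same question in a supported way; a sketch like this is only useful for probing an arbitrary offset inside the image file.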

More complete logs and hexdump of /home/marti.home volume header pasted here: https://gist.github.com/intgr/a78e117e08c5a9347092fed02844f68a

Steps to reproduce the problem

Make lots of changes to your home file system and reboot?

Additional program output to the terminal or log subsystem illustrating the issue

Brief excerpt of the logs:

May 16 10:39:12 newn systemd-logind[627]: The system will reboot now!
May 16 10:39:12 newn systemd-logind[627]: System is rebooting.
May 16 10:39:20 newn systemd[1597]: Reached target Shutdown.
May 16 10:39:20 newn systemd-homed[625]: Got notification that all sessions of user marti ended, deactivating automatically.
May 16 10:39:20 newn systemd-homed[625]: Automatically deactivating home of user marti.
May 16 10:39:26 newn systemd-homework[1760452]: Discovered used LUKS device /dev/mapper/home-marti, and validated password.
May 16 10:39:26 newn systemd-homework[1760452]: Successfully re-activated LUKS device.
May 16 10:39:26 newn systemd-homework[1760452]: Discovered used loopback device /dev/loop0.
May 16 10:39:26 newn systemd-homework[1760452]: offset = 1048576, size = 855169949696, image = 855172530176
May 16 10:39:26 newn systemd-homework[1760452]: Ready to resize image size 796.4G → 402.5G, partition size 796.4G → 402.5G, file system size 796.4G → 402.5G.
May 16 10:39:27 newn kernel: BTRFS info (device dm-1): relocating block group 473550553088 flags data
[...]
May 16 10:40:49 newn kernel: BTRFS info (device dm-1): relocating block group 459591909376 flags data
May 16 10:40:50 newn systemd[1]: systemd-homed-activate.service: Stopping timed out. Terminating.
May 16 10:40:50 newn systemd[1]: systemd-homed-activate.service: Control process exited, code=killed, status=15/TERM
May 16 10:40:50 newn systemd[1]: systemd-homed-activate.service: Failed with result 'timeout'.
May 16 10:40:50 newn systemd-homed[625]: Worker process for home marti is still running while exiting. Waiting for it to finish.
May 16 10:40:50 newn kernel: BTRFS info (device dm-1): relocating block group 468181843968 flags metadata|dup
[...]
May 16 10:41:20 newn systemd-homework[1760452]: File system resizing from 796.4G to 405.0G completed.
May 16 10:41:20 newn kernel: BTRFS info (device dm-1): resize device /dev/mapper/home-marti (devid 1) from 434904244224 to 434904236032
May 16 10:41:20 newn systemd-homework[1760452]: Synchronized disk.
May 16 10:41:20 newn kernel: BTRFS info: devid 1 device path /dev/mapper/home-marti changed to /dev/dm-1 scanned by (udev-worker) (1760652)
May 16 10:41:20 newn kernel: BTRFS info: devid 1 device path /dev/dm-1 changed to /dev/mapper/home-marti scanned by (udev-worker) (1760652)
May 16 10:41:20 newn systemd-homework[1760452]: LUKS device shrinking completed.
May 16 10:41:20 newn systemd-homework[1760452]: Refreshing loop device size completed.
May 16 10:41:20 newn kernel: loop0: detected capacity change from 1670253808 to 849455104
May 16 10:41:20 newn systemd-homed[625]: Waiting for worker process for home marti timed out. Ignoring.
May 16 10:41:20 newn systemd-homed[625]: Failed to reply to DeactivateAllHomes method call, ignoring: Transport endpoint is not connected
May 16 10:41:20 newn systemd[1]: systemd-homed.service: Killing process 1760452 (systemd-homewor) with signal SIGKILL.
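The sizes in the log excerpt can be cross-checked with some arithmetic. The kernel's "detected capacity change" message for loop0 counts 512-byte sectors, so converting it to bytes shows that the pre-shrink capacity equals the size = value on the systemd-homework offset line exactly, and that the post-shrink loop device is exactly 16 MiB larger than the btrfs resize target. That 16 MiB gap is consistent with a LUKS2 header sitting in front of the file system, but this is an inference from the numbers, not something the log states. The helper name below is illustrative only.

```python
SECTOR = 512  # kernel block-device capacities are counted in 512-byte sectors
MIB = 2**20
GIB = 2**30

def sectors_to_bytes(sectors: int) -> int:
    return sectors * SECTOR

# Values copied from the log excerpt.
loop0_before = sectors_to_bytes(1670253808)  # "detected capacity change from 1670253808 ..."
loop0_after = sectors_to_bytes(849455104)    # "... to 849455104"
homework_size = 855169949696                 # "size = ..." on the systemd-homework offset line
btrfs_after = 434904236032                   # btrfs "resize device ... to 434904236032"

print(loop0_before == homework_size)          # True
print(round(loop0_after / GIB, 2))            # 405.05
print((loop0_after - btrfs_after) // MIB)     # 16
```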
@intgr intgr added the bug 🐛 Programming errors, that need preferential fixing label May 16, 2024
@github-actions github-actions bot added homed homed, homectl, pam_homed labels May 16, 2024
@intgr

This comment was marked as resolved.

@intgr intgr changed the title homed partition table corrupt after systemd-homework was terminated due to timeout homed disklabel corrupt after systemd-homework was terminated due to timeout May 16, 2024
@intgr
Contributor Author

intgr commented May 16, 2024

gnome-disk-utility displays no partitions when the image is attached as a loopback device (without an offset):

[Screenshot from 2024-05-16 21-58-09]

@intgr
Contributor Author

intgr commented May 17, 2024

I haven't yet figured out how to fix the presumably corrupt GPT partition header. Any suggestions?

I ended up recreating my user and copying files over from the old rescued backup image. But there really should be a better option.
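As a first diagnostic step for the GPT, here is a minimal Python sketch that checks the primary header at LBA 1 and the header in the last sector of the image (where the backup header normally lives), verifying the "EFI PART" signature and the header CRC32 as laid out in the UEFI specification. It assumes 512-byte logical sectors and does not check the partition entry array CRC; inspect_gpt and check_gpt_header are hypothetical helpers, not part of homectl. If the primary header is valid but its AlternateLBA does not match the image's last LBA, or one of the two headers fails, that at least narrows down which copy is damaged.

```python
import os
import struct
import sys
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors
GPT_SIGNATURE = b"EFI PART"

def check_gpt_header(raw: bytes) -> bool:
    """True if `raw` (one sector) holds a GPT header whose CRC32 matches."""
    if len(raw) < 92 or raw[:8] != GPT_SIGNATURE:
        return False
    header_size, stored_crc = struct.unpack_from("<II", raw, 12)
    if not 92 <= header_size <= len(raw):
        return False
    # The CRC covers HeaderSize bytes with the CRC field itself zeroed.
    hdr = bytearray(raw[:header_size])
    hdr[16:20] = bytes(4)
    return zlib.crc32(hdr) == stored_crc

def inspect_gpt(path: str) -> dict:
    """Check the primary (LBA 1) and last-sector GPT headers of an image file."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        if size < 3 * SECTOR:
            raise ValueError(f"{path} is too small to hold a GPT")
        last_lba = size // SECTOR - 1
        f.seek(1 * SECTOR)
        primary = f.read(SECTOR)
        f.seek(last_lba * SECTOR)
        backup = f.read(SECTOR)
    primary_ok = check_gpt_header(primary)
    # AlternateLBA (offset 32) in a valid primary header says where the backup should be.
    alternate_lba = struct.unpack_from("<Q", primary, 32)[0] if primary_ok else None
    return {
        "primary_ok": primary_ok,
        "backup_ok": check_gpt_header(backup),
        "alternate_lba": alternate_lba,
        "last_lba": last_lba,
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 gpt_probe.py /home/marti.home
    print(inspect_gpt(sys.argv[1]))
```

Work on a copy of the image. sgdisk --verify performs a more thorough check, including the partition entries.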

@intgr intgr changed the title homed disklabel corrupt after systemd-homework was terminated due to timeout homed disklabel corrupt after shutdown (systemd-homework was terminated due to timeout) May 28, 2024
Development

No branches or pull requests

1 participant