Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-analyze critical-chain reports invalid timestamp #17191

Closed
hexchain opened this issue Sep 28, 2020 · 5 comments · Fixed by #32935
Closed

systemd-analyze critical-chain reports invalid timestamp #17191

hexchain opened this issue Sep 28, 2020 · 5 comments · Fixed by #32935
Labels
analyze bug 🐛 Programming errors, that need preferential fixing pid1
Milestone

Comments

@hexchain
Copy link
Contributor

systemd version the issue has been seen with

246 and current master (6d3702f)

Used distribution

Arch

Expected behaviour you didn't see

The timestamps listed in systemd-analyze should be reasonable.

Unexpected behaviour you saw

% ./systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @6.057s
└─multi-user.target @6.057s
  └─load-kdump-kernel.service @4.203s +1.853s
    └─basic.target @4.200s
      └─dbus-broker.service @4.175s +22ms
        └─dbus.socket @4.173s
          └─sysinit.target @4.172s
            └─systemd-update-utmp.service @4.161s +10ms
              └─auditd.service @4.088s +72ms
                └─systemd-tmpfiles-setup.service @4.029s +57ms
                  └─systemd-journal-flush.service @569ms +3.459s
                    └─systemd-remount-fs.service @240ms +326ms
                      └─systemd-fsck-root.service @584542y 2w 2d 20h 1min 48.998s +10ms
                        └─systemd-journald.socket
                          └─system.slice
                            └─-.slice

Notice that the systemd-fsck-root.service has a negative value as its timestamp, and that value gets parsed as an unsigned int64.

I've tried to set a breakpoint in list_dependencies_print to inspect some values:

(gdb) p *times
$3 = {has_data = true, name = 0x6030000090a0 "systemd-fsck-root.service", activating = 12528227, activated = 12538682, deactivated = 0, deactivating = 0, time = 10455}
(gdb) p *boot
$4 = {firmware_time = 15146461, loader_time = 1437747, kernel_time = 1600892403031723, kernel_done_time = 3796645, initrd_time = 3796645, userspace_time = 13081215, finish_time = 19406691, security_start_time = 13081683, security_finish_time = 13081689, generators_start_time = 13132766, generators_finish_time = 13220885, unitsload_start_time = 13220887, unitsload_finish_time = 13271106, initrd_security_start_time = 3796829, initrd_security_finish_time = 3796831, initrd_generators_start_time = 3928766, initrd_generators_finish_time = 3942867, initrd_unitsload_start_time = 3942868, initrd_unitsload_finish_time = 3944070, reverse_offset = 0}

Steps to reproduce the problem
Not sure, I haven't seen this before and don't believe it happens very often.

@poettering
Copy link
Member

hmm, this is weird... this system has an RTC? or is there a time jump at boot or so?

@hexchain
Copy link
Contributor Author

hexchain commented Oct 1, 2020

It is a normal consumer-grade x86 laptop and has an RTC. I'm not sure about the time jump though, how can I check?

@mlmatlock
Copy link

Seeing the same line as well, systemd-250 (Arch).

systemd-fsck-root.service @584542y 2w 2d 20h 1min 49.033s +12ms

System has an RTC, and I don't see a time discrepancy like I do when (re)booting my Raspberry Pi (which also has an added RTC module).

@DaniD3v
Copy link

DaniD3v commented Dec 1, 2023

systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @505ms
└─greetd.service @505ms
  └─systemd-user-sessions.service @501ms +3ms
    └─network.target @500ms
      └─NetworkManager.service @483ms +17ms
        └─dbus.service @472ms +8ms
          └─basic.target @472ms
            └─sockets.target @472ms
              └─dbus.socket @472ms
                └─sysinit.target @471ms
                  └─systemd-update-utmp.service @466ms +5ms
                    └─systemd-tmpfiles-setup.service @425ms +39ms
                      └─local-fs.target @424ms
                        └─efi.mount @378ms +45ms
                          └─systemd-fsck@dev-disk-by\x2dlabel-BOOT.service @314ms +59ms
                            └─dev-disk-by\x2dlabel-BOOT.device @584542y 2w 2d 20h 1min 48.572s +1.292s

same issue

@yuwata yuwata added the bug 🐛 Programming errors, that need preferential fixing label Dec 1, 2023
@mrc0mmand
Copy link
Member

mrc0mmand commented Dec 4, 2023

I can reproduce this quite easily locally with multiple units.

For example, with a separate /boot partition:

# systemd-analyze critical-chain boot.mount
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

boot.mount +241ms
└─systemd-fsck@dev-disk-by\x2duuid-b0a64db8\x2d2f5d\x2d4c95\x2d8bec\x2d6dfe2536ac5f.service @1.670s +52ms
  └─dev-disk-by\x2duuid-b0a64db8\x2d2f5d\x2d4c95\x2d8bec\x2d6dfe2536ac5f.device @584542y 2w 2d 20h 1min 48.818s +2.393s

The issue here is that the times->activating timestamp is one second before boot->userspace_time:

# systemctl show 'dev-disk-by\x2duuid-b0a64db8\x2d2f5d\x2d4c95\x2d8bec\x2d6dfe2536ac5f.device' | grep InactiveExitTimestamp
InactiveExitTimestamp=Fri 2023-11-24 19:19:46 UTC
InactiveExitTimestampMonotonic=2213745
# systemctl show | grep UserspaceTimestamp
UserspaceTimestamp=Fri 2023-11-24 19:19:47 UTC
UserspaceTimestampMonotonic=2946697

So here:

FORMAT_TIMESPAN(times->activating - boot->userspace_time, USEC_PER_MSEC),

we try to parse -732952 (2213745 - 2946697) as a timestamp, which is unsigned long, hence 18446744073708818664, and we get a garbage value:

# systemd-analyze timespan 18446744073708.818664
Original: 18446744073708.818664
      μs: 18446744073708818664
   Human: 584542y 2w 2d 20h 1min 48.818664s

Similarly, on my machine there's this one:

$ build/systemd-analyze critical-chain systemd-remount-fs.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

systemd-remount-fs.service +137ms
└─systemd-fsck-root.service @584542y 2w 2d 20h 1min 48.890s +45ms
  └─systemd-journald.socket
    └─system.slice
      └─-.slice

where, again, times->activating timestamp is one second before boot->userspace_time:

$ systemctl show systemd-fsck-root.service | grep InactiveExitTimestamp
InactiveExitTimestamp=Thu 2023-11-02 12:27:24 CET
InactiveExitTimestampMonotonic=15143158
$ systemctl show | grep UserspaceTimestamp
UserspaceTimestamp=Thu 2023-11-02 12:27:25 CET
UserspaceTimestampMonotonic=15804273

and we get another garbage value.

@mrc0mmand mrc0mmand added this to the v256 milestone Dec 4, 2023
keszybz added a commit to keszybz/systemd that referenced this issue May 20, 2024
We have the following timestamp status:

  $ systemctl show systemd-fsck-root.service | grep InactiveExitTimestamp
  InactiveExitTimestamp=Thu 2023-11-02 12:27:24 CET
  InactiveExitTimestampMonotonic=15143158

  $ systemctl show | grep UserspaceTimestamp
  UserspaceTimestamp=Thu 2023-11-02 12:27:25 CET
  UserspaceTimestampMonotonic=15804273

i.e. UserspaceTimestamp is before InactiveExit of systemd-fsck-root.service.
This is fine, but on display, we'd subtract those values and print a huge
negative value bogusly:

  $ build/systemd-analyze critical-chain systemd-remount-fs.service
  The time when unit became active or started is printed after the "@" character.
  The time the unit took to start is printed after the "+" character.

  systemd-remount-fs.service +137ms
  └─systemd-fsck-root.service @584542y 2w 2d 20h 1min 48.890s +45ms
    └─systemd-journald.socket
      └─system.slice
        └─-.slice

In fact, list_dependencies_print() already had a branch where the check that
'times->activating > boot->userspace_time', but it didn't cover all cases. So
make it cover both branches, and also change to '>=', since it's fine if
something happened with the same timestamp.

With the patch:

  $ build/systemd-analyze critical-chain systemd-remount-fs.service
  The time when unit became active or started is printed after the "@" character.
  The time the unit took to start is printed after the "+" character.

  systemd-remount-fs.service +42ms
  └─systemd-fsck-root.service
    └─systemd-journald.socket
      └─system.slice
        └─-.slice

Fixes systemd#17191.
bluca pushed a commit that referenced this issue May 20, 2024
We have the following timestamp status:

  $ systemctl show systemd-fsck-root.service | grep InactiveExitTimestamp
  InactiveExitTimestamp=Thu 2023-11-02 12:27:24 CET
  InactiveExitTimestampMonotonic=15143158

  $ systemctl show | grep UserspaceTimestamp
  UserspaceTimestamp=Thu 2023-11-02 12:27:25 CET
  UserspaceTimestampMonotonic=15804273

i.e. UserspaceTimestamp is before InactiveExit of systemd-fsck-root.service.
This is fine, but on display, we'd subtract those values and print a huge
negative value bogusly:

  $ build/systemd-analyze critical-chain systemd-remount-fs.service
  The time when unit became active or started is printed after the "@" character.
  The time the unit took to start is printed after the "+" character.

  systemd-remount-fs.service +137ms
  └─systemd-fsck-root.service @584542y 2w 2d 20h 1min 48.890s +45ms
    └─systemd-journald.socket
      └─system.slice
        └─-.slice

In fact, list_dependencies_print() already had a branch where the check that
'times->activating > boot->userspace_time', but it didn't cover all cases. So
make it cover both branches, and also change to '>=', since it's fine if
something happened with the same timestamp.

With the patch:

  $ build/systemd-analyze critical-chain systemd-remount-fs.service
  The time when unit became active or started is printed after the "@" character.
  The time the unit took to start is printed after the "+" character.

  systemd-remount-fs.service +42ms
  └─systemd-fsck-root.service
    └─systemd-journald.socket
      └─system.slice
        └─-.slice

Fixes #17191.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyze bug 🐛 Programming errors, that need preferential fixing pid1
Development

Successfully merging a pull request may close this issue.

6 participants