Reorder systemd units in the rescue system and make sure syslog is started #3041
base: master
Services that need a configured network can be ordered to run after this target. This matches the usual systemd setup: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ It is intended especially for third-party network backup solutions whose restore needs a daemon running and communicating over the network in the rescue system.
Make it a separate script and systemd unit. This allows services to run after sysinit (which initializes the network) but before ReaR starts. This is especially useful for network backup daemons that are needed to restore files during "rear recover".
Otherwise basic.target and the services and sockets that it contains never get started (this affects logging, among other things). This matches systemd's default setup, see bootup(7).
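A minimal sketch of how a third-party backup daemon could be ordered between network setup and the ReaR autostart. The daemon name `examplebackupd`, its path, the unit name `run-automatic-rear.service`, and the `WantedBy=` target are placeholders chosen for illustration, and the use of `network-online.target` is an assumption based on the linked systemd page, not something confirmed by this pull request. The script only writes the unit into a temporary directory and prints it.

```shell
#!/bin/sh
# Sketch: generate a unit file for a hypothetical backup daemon.
# "examplebackupd" and "run-automatic-rear.service" are placeholder names.
unit_dir=$(mktemp -d)
unit_file="$unit_dir/examplebackupd.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Example network backup daemon (hypothetical)
# Start only after the network has been configured
Wants=network-online.target
After=network-online.target
# Be running before the ReaR autostart (unit name assumed)
Before=run-automatic-rear.service

[Service]
ExecStart=/usr/sbin/examplebackupd

[Install]
WantedBy=multi-user.target
EOF

cat "$unit_file"
```

The `After=` and `Before=` lines only express ordering; whether the daemon is started at all depends on what pulls the unit in, which is why the sketch also carries an `[Install]` section.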
It is not needed by any other systemd units, therefore it is unused. Duplicates sysinit.service.
The /etc/scripts/system-setup script in the rescue system is intended to start early and perform basic system initialization (according to /etc/inittab and /etc/init/rcS.conf in the pre-systemd configs). Make the sysinit.service that starts the script under systemd part of sysinit.target so that even with systemd it starts early. /etc/scripts/boot, executed by rear-boot-helper.service, is intended to run even earlier - make this ordering explicit.
The syslog socket used to be started early in the rescue system boot, and for some reason this does not work well: logging to /dev/log does not trigger the start of rsyslogd, and therefore the log data are never read (this happens on RHEL 9 at least). Fix this by ordering the syslog start after basic system initialization (sysinit.target), just like usual daemons. This seems to be the case without systemd as well, see /etc/inittab.
After the ReaR autostart on rescue system boot was separated from /etc/scripts/system-setup into its own script /etc/scripts/run-automatic-rear, this script needs to be executed on system boot. Adapt inittab and the Upstart configuration to start it. XXX neither the inittab (SysV init) nor the Upstart part was tested. Also, I don't understand why there are two locations for the Upstart files (/etc/init and /etc/event.d), with slightly different structure of services.
XXX I adapted the inittab (SysV init) and Upstart scripts to the split of /etc/scripts/system-setup, but neither part was tested. I don't understand why there are two locations for the Upstart files ( |
@pcahyna thanks for finding this and working to improve our startup scripts. I'm not entirely sure if we actually need to support SysV init and Upstart and more, so maybe we can just get rid of everything that is not systemd? Unfortunately we probably still have to support older versions of systemd. Can you please also verify this works on Ubuntu or Debian and SLES/SUSE? Such a change must IMHO be validated on more flavours than just one. |
LGTM, please validate on more Linux distros
This will be a bit difficult, I don't have test environments for other distros. @jsmeix can you please test this change on SUSE distros? |
@pcahyna |
A simple "rear mkbackup" plus "rear recover" test Tomorrow I will test the same on SLES15. |
@jsmeix thanks for testing - have you seen errors when starting "D-Bus System Message Bus" as well? |
@pcahyna |
@jsmeix ok. I discovered that when I try to start this service manually on RHEL 8, it also fails, although for a different reason:
This makes me wonder what the dbus service is good for in the rescue system, when it was not used and fails when you start it.
A simple "rear mkbackup" plus "rear recover" test |
Regarding D-Bus in the ReaR recovery system: In the ReaR recovery system In the ReaR recovery system
A test regarding D-Bus in the ReaR recovery system: On the original system I removed
and did "rear -D mkrescue" which worked without errors In the ReaR recovery system on SLES15 SP3:
and the rebooted recreated system also works well for me. So at least for my simple test BUT: |
@jsmeix thank you for the tests. During your tests, did rsyslog start? Are there some log messages in /var/log/messages (or whatever is the rsyslog default destination)? Regarding D-Bus. You show that the dbus.socket is active, but dbus.service is not. Probably it means that nothing has tried to use the D-Bus socket, therefore systemd had no reason to start the D-Bus service (unlike in RHEL, where something attempts to use D-Bus during recovery, but apparently it is not critical, because D-Bus failure does not prevent recovery) and thus we don't know whether it would have worked or not. |
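To compare the state of a socket unit with the service it activates, a small helper along these lines could be run in the recovery system. This is only a sketch: `unit_states` is a made-up helper name, and it relies on nothing beyond `systemctl is-active`, falling back to "unknown" when `systemctl` is missing or prints nothing.

```shell
# Print "<unit>: <state>" for each unit given on the command line.
# Falls back to "unknown" when systemctl is missing or reports nothing.
unit_states() {
    for unit in "$@"; do
        state=""
        if command -v systemctl >/dev/null 2>&1; then
            # is-active prints the state and exits non-zero for anything but "active"
            state=$(systemctl is-active "$unit" 2>/dev/null) || true
        fi
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}
```

Calling `unit_states dbus.socket dbus.service` prints one line per unit. An active socket next to an inactive service would fit the explanation above that nothing has connected to the socket yet.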
@pcahyna @rear/contributors |
@jsmeix thanks! have a nice weekend as well! When looking at syslog in the rescue system, you can experiment by sending a test message: |
usr/share/rear/skel/default/etc/scripts/system-setup-functions.sh
On SLES12 SP5 on same VMs as above in On the original system:
In the ReaR recovery system on the replacement VM:
On SLES15 SP3 on same VMs as above in On the original system:
In the ReaR recovery system on the replacement VM:
Regarding D-Bus in the ReaR recovery system On SLES12 SP5:
On SLES15 SP3:
Thank you so much @jsmeix for the SLES investigation. The situation on SLES looks pretty identical to RHEL, which is encouraging. |
@jsmeix by the way, is there still some SLES version that is interesting for you to support and does not use systemd (uses Upstart or SysV init)? |
@pcahyna |
A simple "rear mkbackup" plus "rear recover" test
worked for me on KVM QEMU virtual machines
with OUTPUT=ISO and BACKUP=NETFS
on SLES15 SP3 and SLES12 SP5 with DHCP networking.
@pcahyna |
I would like to test on Debian or Ubuntu, but I got distracted by other things. Let's merge it in a week if I don't get around to setting up a Debian or Ubuntu test environment.
I have set up a Debian Bookworm test environment and the code does not work very well. I need to debug this more. (It is of course possible that even the current master branch code has issues on Debian Bookworm, unless someone tested that recently). |
I just noticed that on Ubuntu 22.04 Will this fix also solve that problem? I'm actually wondering why we can't use only It seems to me that having |
Stale pull request message |
Pull Request Details:
Type: Bug Fix
Impact: Normal
Reference to related issue (URL):
How was this pull request tested?
Boot of the rescue environment on RHEL 9, and logging a large number of log messages via
for i in $(seq 1 1000); do logger foo$i; done
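A variant of the loop above that would stop instead of hanging if sending a message blocks, which is how a full log socket would be expected to show up. This is a sketch under assumptions: `flood_log` is a made-up helper name, `timeout` (from coreutils or BusyBox) is assumed to be available in the environment, and the send command is a parameter only so that the loop can be tried out without a syslog socket.

```shell
# Send COUNT test messages. Stop early if one send takes longer than
# TIMEOUT seconds (for example because nothing reads /dev/log) or fails.
# Usage: flood_log COUNT [TIMEOUT_SECONDS] [SEND_COMMAND]
flood_log() {
    count=$1
    timeout_s=${2:-5}
    send_cmd=${3:-logger}
    i=1
    while [ "$i" -le "$count" ]; do
        if ! timeout "$timeout_s" "$send_cmd" "foo$i"; then
            echo "message $i failed or was not sent within ${timeout_s}s"
            return 1
        fi
        i=$((i + 1))
    done
    echo "sent $count messages"
}
```

In the rescue system this would be called as `flood_log 1000 5 logger`. A run that stops partway through points at the socket not being drained, while a run that completes only shows that sending did not block.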
Brief description of the changes in this pull request:
This fixes a problem with /dev/log in the rescue system: on RHEL 9 no process was reading it, as rsyslog was not being started. If during recovery a lot of log messages got generated (this happened while rerunning GRUB when there are lots of file systems, as grub-probe is very noisy), the log socket got filled up and recovery froze.
Separating the ReaR startup from sysinit.service allows ReaR to start late in the boot process while sysinit.service runs early, so that network daemons can run after sysinit.service but before ReaR. This is especially useful for network backup daemons that are needed to restore files during "rear recover".
As a side effect, I am now seeing error messages from systemd during "rear recover":
systemctl status dbus.service
says:
This is because dbus-daemon and related utilities (dbus-uuidgen) are not present in the rescue image. dbus.service was broken even before this change, but since nothing depended on sockets.target, dbus.socket did not get activated and the problem did not appear. This problem does not seem to have any adverse effect, so I am leaving it for later.
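A quick way to confirm which of the D-Bus programs are absent from a given rescue image is to look them up on the PATH. The helper below is a sketch; `missing_programs` is a made-up name and it relies only on the POSIX `command -v` builtin.

```shell
# Print every named program that cannot be found on the PATH.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || echo "$prog"
    done
}
```

Running `missing_programs dbus-daemon dbus-uuidgen` inside the rescue system lists the programs that are not there; empty output means both were found.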