Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional support for sending systemd readiness notifications #3158

Merged
merged 2 commits into from May 13, 2024

Conversation

DaanDeMeyer
Copy link
Contributor

@DaanDeMeyer DaanDeMeyer commented May 8, 2024

When ENABLE_SYSTEMD is enabled, we'll link against libsystemd and send readiness notifications using sd_notify() as documented in https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html.

These will only be sent if the NOTIFY_$SOCKET environment variable is set to a unix socket where systemd expects to receive readiness notifications.

This allows running bpftrace in a systemd service which will only be considered started once bpftrace has finished allocation its probes. This allows starting bpftrace from scripts and ensuring that its probes are registered before continuing.

For example, currently one has to do something like this:

"""
bpftrace <script> &
PID=$!
sleep 2 # How many seconds should we sleep???

kill $PID
"""

With the problem being that we never know how long we should sleep. Either we don't sleep long enough and miss messages, or we sleep too long and waste time doing nothing useful.

With this PR, we can do the following instead:

"""
systemd-run --unit=bpftrace --service-type=notify bpftrace <script>
systemctl stop bpftrace
"""

The systemd-run command will only finish once bpftrace has sent the READY=1 notification, which is only send after probes have been registered, thus guaranteeing that we wait just long enough for the probes to be registered but not any longer.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

src/bpftrace.cpp Outdated Show resolved Hide resolved
src/bpftrace.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me. The only thing I'm thinking about is testing this functionality so we don't accidentally break it some day e.g. creating a build in the CI pipeline with this enabled and checking if we get the notify. Not sure how hard that is though.

Also, is it bad to always attempt to dynamically link against this systemd lib always, and only fire the notify is NOTIFY_$SOCKET is also populated?

@DaanDeMeyer
Copy link
Contributor Author

Well I assume bpftrace might be used on distributions such as Alpine Linux which do not package libsystemd, and we don't want to exclude them from being able to use bpftrace.

@DaanDeMeyer
Copy link
Contributor Author

@jordalgo I took a quick look but I have no clue where exactly I'd write a test. What the test needs to do is create a unix socket and bind it, set ``$NOTIFY_SOCKET` to the socket address, run any bpftrace program with probes and read from the socket to see if the messages were sent.

@jordalgo
Copy link
Contributor

Well I assume bpftrace might be used on distributions such as Alpine Linux which do not package libsystemd, and we don't want to exclude them from being able to use bpftrace.

I'm still looking around but maybe we could use dlopen to dynamically load this library and fail gracefully if it doesn't exist on the distro. Then we don't have to worry about setting this flag for the different distro builds (although maybe that's not an issue).

@jordalgo
Copy link
Contributor

I took a quick look but I have no clue where exactly I'd write a test. What the test needs to do is create a unix socket and bind it, set ``$NOTIFY_SOCKET` to the socket address, run any bpftrace program with probes and read from the socket to see if the messages were sent.

Yeah maybe adding a runtime test for this is too complicated (at least at this point) and maybe the cost of maintaining the test is not worth preserving this functionality. I'll see if anyone else chimes in.

@DaanDeMeyer
Copy link
Contributor Author

Well I assume bpftrace might be used on distributions such as Alpine Linux which do not package libsystemd, and we don't want to exclude them from being able to use bpftrace.

I'm still looking around but maybe we could use dlopen to dynamically load this library and fail gracefully if it doesn't exist on the distro. Then we don't have to worry about setting this flag for the different distro builds (although maybe that's not an issue).

dlopen() is definitely possible, but I don't think distributions will mind the extra flag. Lots of applications do it this way, where there's a build option to enable the systemd specific bits. Even if dlopen() is used, distributions would still have to list libsystemd as a recommended or required dependency of bpftrace to make sure it is pulled in when bpftrace is installed (though in practice libsystemd will always be installed)

@jordalgo
Copy link
Contributor

distributions would still have to list libsystemd as a recommended or required dependency of bpftrace to make sure it is pulled in when bpftrace is installed (though in practice libsystemd will always be installed)

Ah right, good point.

Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this change. I'll wait a few days to see if anyone else has feedback before landing.

Copy link
Member

@ajor ajor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me understand use-case for this please? Why would bpftrace be run in a systemd service?

src/bpftrace.cpp Show resolved Hide resolved
@DaanDeMeyer
Copy link
Contributor Author

DaanDeMeyer commented May 10, 2024

Could you help me understand use-case for this please? Why would bpftrace be run in a systemd service?

It's not about running bpftrace as a systemd service, it's about using systemd's notification protocol as a synchronization point to run bpftrace as a background process.

The commit message already contains this but I'll repeat it here:

To run bpftrace in the background from a script for a few commands, currently one has to do the following:

bpftrace <script> &
PID=$!
sleep 2 # How many seconds should we sleep???
<do stuff that needs to be traced>
kill $PID

Specifically, there is no synchronization point that one can wait for where we're 100% sure that bpftrace has attached its probes. To solve this, bpftrace needs to learn some way to notify the caller that it has attached its probes and is waiting for events. bpftrace could design its own protocol for doing this, but it so happens that systemd already has one, which is very useful because we can then also use systemd to run bpftrace itself as a background process as follows:

systemd-run --unit=bpftrace --service-type=notify bpftrace <script>
<do stuff that needs to be traced>
systemctl stop bpftrace

With this commit, systemd-run will not finish until bpftrace has attached its probes. This means we do not have to sleep after attaching probes because we can be 100% sure that bpftrace has attached its probes after systemd-run finished. systemd will also manage the bpftrace process so we can later use systemctl stop to stop bpftrace.

Does this clarify things?

@ajor
Copy link
Member

ajor commented May 10, 2024

Thanks for the explanation - that does clear up how it's intended to be used.

I'm still not sure what the specific purpose is though (not that I'm saying there isn't one!). Who needs this feature and why?

@DaanDeMeyer
Copy link
Contributor Author

DaanDeMeyer commented May 10, 2024

Thanks for the explanation - that does clear up how it's intended to be used.

I'm still not sure what the specific purpose is though (not that I'm saying there isn't one!). Who needs this feature and why?

I need the feature to debug systemd's integration tests with bpftrace. Instead of writing the first snippet, I want to write the second snippet. (and avoid the race condition present in the first snippet)

@ajor
Copy link
Member

ajor commented May 10, 2024

Does the -c command line option not work for that purpose?

e.g.

# bpftrace my_script.bt -c ./test/run-integration-tests.sh

@jordalgo
Copy link
Contributor

Does the -c command line option not work for that purpose?

Totally forgot about this option. Can't wait for the docs website :) We should definitely add an FAQ or a How-To page

@DaanDeMeyer
Copy link
Contributor Author

Does the -c command line option not work for that purpose?

e.g.

# bpftrace my_script.bt -c ./test/run-integration-tests.sh

Does not work because I can not add it in the middle of an existing script. I don't want to run the entire script with bpftrace tracing, only a few commands need the tracing enabled. Having to move stuff around to accommodate bpftrace -c is much more work than just adding the snippet from above around the code I want to trace.

@DaanDeMeyer
Copy link
Contributor Author

Aside from the use case of using bpftrace in a script, If I want to run bpftrace during early boot to debug something then I do actually want to run it as a service so I can make sure it is started before the services that I want to trace, in which case I have the exact same problem that I don't want the services that need tracing to start before bpftrace has attached its probes. Again, the solution is the systemd notify integration so that I can drop in a service as follows:

bpftrace.service

[Unit]
Before=service-i-want-to-trace.service

[Service]
Type=notify
ExecStart=bpftrace <script>

Without the notify integration, systemd would immediately start the service I want to trace after systemd has execve()'d the bpftrace process, so you get a race condition where bpftrace might not attach its probes before the service that needs to be traced is started. The notify integration means that the service I want to trace is only started after bpftrace has attached its probes.

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early boot use case sounds quite interesting. this change seems low maintenance cost + optional to enable so sounds reasonable to me.

maybe call out this feature in the man page? an example systemd-run one-liner or mentioning early boot debugging would be good. it'd at least help us not forget the use case

CMakeLists.txt Outdated Show resolved Hide resolved
@DaanDeMeyer DaanDeMeyer force-pushed the systemd branch 2 times, most recently from bbc4b80 to 5068236 Compare May 11, 2024 12:40
Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, could you please update src/bpffeature.cpp with an entry for systemd support?

and a CHANGELOG.md entry

man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
Copy link
Member

@ajor ajor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a reasonable feature and looks good to me. Just need to move the feature detection code to a different file and I think we're good to go

src/bpffeature.cpp Outdated Show resolved Hide resolved
When ENABLE_SYSTEMD is enabled, we'll link against libsystemd and
send readiness notifications using sd_notify() as documented in
https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html.

These will only be sent if the NOTIFY_$SOCKET environment variable
is set to a unix socket where systemd expects to receive readiness
notifications.

This allows running bpftrace in a systemd service which will only
be considered started once bpftrace has finished allocation its probes.
This allows starting bpftrace from scripts and ensuring that its probes
are registered before continuing.

For example, currently one has to do something like this:

"""
bpftrace <script> &
PID=$!
sleep 2 # How many seconds should we sleep???
<stuff that needs tracing>
kill $PID
"""

With the problem being that we never know how long we should sleep.
Either we don't sleep long enough and miss messages, or we sleep too
long and waste time doing nothing useful.

With this PR, we can do the following instead:

"""
systemd-run --unit=bpftrace --service-type=notify bpftrace <script>
<stuff that needs tracing>
systemctl stop bpftrace
"""

The systemd-run command will only finish once bpftrace has sent the
READY=1 notification, which is only send after probes have been registered,
thus guaranteeing that we wait just long enough for the probes to be
registered but not any longer.
@ajor ajor merged commit 5d861ee into bpftrace:master May 13, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants