-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FW/Unix: fix handling of signals in wait_for_children() #430
FW/Unix: fix handling of signals in wait_for_children() #430
Conversation
83044d6
to
fd92fe3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 'quick fix' to handle 'immediate' signals. Fix is valid.
As there's no SIGINT handling ("overall"), more discussion/work is required (best would be another tasks).
I verified this fix on my end as well. |
fd92fe3
to
f7d7d1f
Compare
bats/sanity-check/20-selftests.bats
Outdated
kill -$signal $pid | ||
if wait $!; then | ||
# Confirm it was the correct signal | ||
(( $? = 128 | $(kill -l $signal))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, this needs to be ==
5dae0d5
to
0ad43c0
Compare
I see a problem when we run opendcdiag with It looks like to reproduce it we have to wait until --service ends running its batch of the tests and goes to sleep. We might have found this issue #157 |
bats/sanity-check/20-selftests.bats
Outdated
if wait $!; then | ||
# Confirm it was the correct signal | ||
(( $? == (128 | $(kill -l $signal)))) | ||
else | ||
echo >&2 'Test did not exit!' | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this valid?
wait
waits forever (while process is active). It returns the status of background command, in our case it is expected that "signal" (128 + sig) is returned.
I don't understand :1548 purpose, is -e
assumed to fail the script?
I rather expect something like:
wait $pid
stat=$?
if [ $stat != $((128 | $(kill -l $signal))) ]; then
echo "Process ${pid} exited with ${stat}" 1>&2
exit 1
fi
Is wait with timeout required here, or does .bats handling cover timeouts with upper layers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It indeed looks wrong. We're probably running the else
code all the time, but because this is passing, I didn't notice it. Weirdly, I can't debug this because if I add a false
after this if
so it will fail, bats fails completely:
1..3
# bats warning: Executed 0 instead of expected 3 tests
Changing to fd 3 does show that we're running on the wrong side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bats-core/bats-core#364 - I need to remove the trap
The rewrite in commit 37cb3e5 for wait_for_children() accidentally broke the handling of interruptions by signals on Unix systems, which exists to ensure all the processes get interrupted if the parent process is. Handling of SIGHUP and SIGTERM was easy, since single_wait() was already returning the correct signal number, but wait_for_all_children() wasn't acting on it. Handling of SIGINT required remembering we had been interrupted once. Fixes opendcdiag#429. [ChangeLog][Framework] Fixed a regression in handling of Ctrl+C or Unix signals sent to the main process, which caused the tool to attempt to continue running instead of exiting immediately. Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
They're used now (have been since the `wait_for_children()` rewrite in commit 37cb3e5). Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
0ad43c0
to
f15bf3a
Compare
This is related to signal handling in waiting. Fixed with #439 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The rewrite in commit 37cb3e5 for wait_for_children() accidentally broke the handling of interruptions by signals on Unix systems, which exists to ensure all the processes get interrupted if the parent process is.
Handling of SIGHUP and SIGTERM was easy, since single_wait() was already returning the correct signal number, but wait_for_all_children() wasn't acting on it. Handling of SIGINT required remembering we had been interrupted once.
Fixes #429.