Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle CTRL+C and CTR+Break events on Windows #1342

Open
wants to merge 4 commits into
base: rolling
Choose a base branch
from

Conversation

MichaelOrlov
Copy link
Contributor

@MichaelOrlov MichaelOrlov commented May 22, 2023

Note:
According to the https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170

SIGINT is not supported for any Win32 application. When a CTRL+C interrupt occurs, Win32 operating systems generate a new thread to specifically handle that interrupt.

The SIGILL and SIGTERM signals aren't generated under Windows. They're included for ANSI compatibility. Therefore, you can set signal handlers for these signals by using signal, and you can also explicitly generate these signals by calling raise.

The CTRL_CLOSE_EVENT is an analog of the SIGTERM from POSIX. Windows sends CTRL_CLOSE_EVENT
to all processes attached to a console when the user closes the console (either by clicking Close on the console window's window menu, or by clicking the End Task button command from Task Manager).
Although according to the https://learn.microsoft.com/en-us/windows/console/generateconsolectrlevent the
GenerateConsoleCtrlEvent(..) doesn't support sending CTRL_CLOSE_EVENT. There is no way to directly send CTRL_CLOSE_EVENT to the process in the same console application.
Therefore, added SIGTERM to the unsupported events in the stop_execution(process_handle, signum) API.

@MichaelOrlov MichaelOrlov force-pushed the morlov/properly_handle_ctrl-c_signal_on_windows branch from 2399f9a to 28c7bb2 Compare May 22, 2023 01:03
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is something that we should put into rclcpp rather than here. That is, it seems to me that all ROS 2 binaries would benefit from this kind of fix.

@MichaelOrlov @mjcarroll what do you think?

@mjcarroll
Copy link
Member

I wonder if this is something that we should put into rclcpp rather than here. That is, it seems to me that all ROS 2 binaries would benefit from this kind of fix.

I think it makes sense to put a version of this in ROS 2, but I think @MichaelOrlov wants to get this fix in for Iron's version of rosbag2, so it seems reasonable to do it here as well for now. Maybe in the longer term if we get something landed in rolling, he can deprecate it here to reduce redundancy?

@MichaelOrlov
Copy link
Contributor Author

@clalancette, I agree with @mjcarroll and yes I would like to keep it in rosbag2 to be able to backport it to the iron branch. And will get rid of redundancy when it will be available in the rclcpp layer.

@emersonknapp
Copy link
Collaborator

And will get rid of redundancy when it will be available in the rclcpp layer.

@MichaelOrlov would you be willing to make the patch to add the helpers to rclcpp?

@MichaelOrlov
Copy link
Contributor Author

MichaelOrlov commented May 28, 2023

@MichaelOrlov would you be willing to make the patch to add the helpers to rclcpp?

@emersonknapp Yes. I can try to make the changes in rcpcpp layer but a bit later on, it might need to make changes on the rclpy side as well.
I think after merging this PR.

@MichaelOrlov MichaelOrlov force-pushed the morlov/properly_handle_ctrl-c_signal_on_windows branch from 28c7bb2 to c30a2d7 Compare June 3, 2023 20:07
@MichaelOrlov MichaelOrlov changed the base branch from morlov/gracefully_handle_sigint_in_recorder to rolling June 3, 2023 20:08
@MichaelOrlov MichaelOrlov force-pushed the morlov/properly_handle_ctrl-c_signal_on_windows branch from c30a2d7 to abff40d Compare June 3, 2023 20:17
@MichaelOrlov MichaelOrlov marked this pull request as ready for review June 3, 2023 20:19
@MichaelOrlov MichaelOrlov requested a review from a team as a code owner June 3, 2023 20:19
@MichaelOrlov MichaelOrlov requested review from gbiggs, hidmic and emersonknapp and removed request for a team and gbiggs June 3, 2023 20:19
@MichaelOrlov
Copy link
Contributor Author

@MichaelOrlov
Copy link
Contributor Author

Gist: https://gist.githubusercontent.com/MichaelOrlov/55adb4c4d5dac5435adace5c095d45b2/raw/09cc03757951ddc3c5ec8885a4d766a84f26207f/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_py rosbag2_test_common ros2bag rosbag2_transport rosbag2_tests
TEST args: --packages-above rosbag2_py rosbag2_test_common ros2bag rosbag2_transport rosbag2_tests
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/12152

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov
Copy link
Contributor Author

Returning back to draft due to some unexpected test failures on CI which are not reproduced locally and need to be investigated.

@MichaelOrlov
Copy link
Contributor Author

Re-run CI after rebase and adjusting expectations in flaky ros2bag record tests
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/12197

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov
Copy link
Contributor Author

It seems that CI failure on windows in rosbag2_tests.TestPlayEndToEnd/PlayEndToEndTestFixture. relates to the way how we are running tests on Windows CI.
It fails with the error messages

C:\ci\ws\install\include\rosbag2_test_common\rosbag2_test_common\process_execution_helpers_windows.hpp(177): error: Value of: GenerateConsoleCtrlEvent(CTRL_C_EVENT, handle.process_info.dwProcessId)
  Actual: false
Expected: true
Error code:1. Error message: Incorrect function.

The error code is very weird and I can't reproduce this failure on my local Windows + ROS2 Rolling setup.
However, I found a discussion on the docker-related forum Can't call GenerateConsoleCtrlEvent system call in Windows containers](docker/for-win#3173) which is describes exactly the same problem with the GenerateConsoleCtrlEvent system call and its Error code:1. Error message: Incorrect function.
According to one of the reply

@lox have you tried Windows Server 2019? It is fixed in the ltsc2019 image.
I can confirm this works in ltsc2019 images, closing!

It seems this issue has been fixed on the Windows Server 2019
@mjcarroll @clalancette What version of the Windows and docker containers are we using for running our Windows CI?

@MichaelOrlov MichaelOrlov marked this pull request as ready for review June 12, 2023 07:49
@MichaelOrlov
Copy link
Contributor Author

Re-run Windows CI job after adding ctrl_c_event_can_be_send_and_received sanity test

  • Windows Build Status

@MichaelOrlov
Copy link
Contributor Author

Hi @claraberendsen,
You mentioned to poing you if the issue with Windows not being able to send the GenerateConsoleCtrlEvent will reproduce after re-run on the fresh Windows build.
It is still failing https://ci.ros2.org/job/ci_windows/19807/
I have added the "minimum valuable reproducer" ctrl_c_event_can_be_send_and_received test to show that issue reproduces even with a simple application with an endless loop without dependencies from other packages. Please refer to my latest commit Add basic ctrl_c_event_can_be_send_and_received test for details.

The tests which are failing on the CI are passes on my local Windows setup without docker container.
Here is the link to a similar issue related to Windows under the docker Can't call GenerateConsoleCtrlEvent system call in Windows containers](docker/for-win#3173) use case.

It seems someone needs to contact Microsoft Support for this issue. Since I have no more ideas on how to fix this issue.
Could you please handle this issue or forward it to someone who can handle it?

@claraberendsen
Copy link

claraberendsen commented Jul 4, 2023

It seems someone needs to contact Microsoft Support for this issue. Since I have no more ideas on how to fix this issue. Could you please handle this issue or forward it to someone who can handle it?

@MichaelOrlov Thanks for pinging me I'm going to take a look at this and see what I find and if needed be forward it.

@claraberendsen
Copy link

@MichaelOrlov I can confirm that we are using the new WS2022 image. In the logs of the build you referenced above you can see the change on the url for the windows image to point to /windows/server which is a new built image for WS2022 (See as reference to the previous 2019 image).

@MichaelOrlov
Copy link
Contributor Author

@claraberendsen As regards

I can confirm that we are using the new WS2022 image

I know. I have seen it from logs. The problem is that issue reproduces even on the new WS2022 image.
This is why I mentioned that

It seems someone needs to contact Microsoft Support for this issue. Since I have no more ideas on how to fix this issue.

@claraberendsen
Copy link

@MichaelOrlov We do not currently have the bandwidth to investigate this further. I see you pointed that in your local machine you are not seeing this error only when running the CI because we utilize docker. I don't have clarity that the issue comes from using docker or the docker image of our setup. To verify that the issue comes from it being ran inside docker further testing should be conducted by running this on a WS2019 image and a WS2022 image. I'm pinging @clalancette to see if he can help unblock this.

@MichaelOrlov
Copy link
Contributor Author

@claraberendsen I don't see a rationale in trying to run the test on the different WS2019 and a WS2022 images under the docker.
The fact that on our latest WS2022 image in the docker container, any call to GenerateConsoleCtrlEvent returns an Incorrect function error is enough.
IMO the next step would be to reach out to Microsoft and say: "We are using your licensed latest WS2022 image under the docker container version ... and any call to the GenerateConsoleCtrlEvent returns an Incorrect function error". This behavior doesn't correspond to your spec written here https://learn.microsoft.com/en-us/windows/console/generateconsolectrlevent"

@clalancette
Copy link
Contributor

I'm pinging @clalancette to see if he can help unblock this.

I don't have any particular knowledge about what is going on with Windows. Maybe @ooeygui can help get us in touch with the right people?

@ooeygui
Copy link

ooeygui commented Jul 20, 2023

@clalancette I'm looking into it.

@clalancette
Copy link
Contributor

@clalancette I'm looking into it.

Thank you, much appreciated!

@clalancette
Copy link
Contributor

@ooeygui Friendly ping. Any thoughts here?

@ooeygui
Copy link

ooeygui commented Aug 28, 2023

@clalancette I've been trying to track this down internally - it looks like there was a fix, but it was backed out due to compatibility issues. I am trying to find an alternative method, but there does not appear to be a supported graceful solution.

@ooeygui
Copy link

ooeygui commented Sep 18, 2023

@clalancette I found the owner of the API. It looks like my statement about a reverted fix was incorrect, that this should work.

I'm working on a small repro to root cause.

@ooeygui
Copy link

ooeygui commented Sep 21, 2023

Both the owner of the Windows API and I have been attempting to reproduce this, but haven't been able to.
Does this still repro for you?

@MichaelOrlov
Copy link
Contributor Author

@ooeygui Originally this issue was not reproducible on my local setup. But as you can see CI jobs fail.

@ooeygui
Copy link

ooeygui commented Sep 21, 2023

Thanks @MichaelOrlov. @clalancette Do you know where I can find the windows container spec? I'd like to try building it locally.

@nuclearsandwich-ai
Copy link

Do you know where I can find the windows container spec? I'd like to try building it locally.

Our Windows containers are currently running on Windows Server 2022 hosts running in AWS EC2. We're using the Docker engine for Windows in the non-virtualized mode so we can take advantage of shared mounts between the container and host system (this forces our containers and the host Windows version to be the same, unlike the virtualized container engine which does not support shared filesystem mounts between host and container) on ci.ros2.org you can see our dockerfile template here: https://github.com/ros2/ci/blob/master/windows_docker_resources/Dockerfile and you can inspect the logs to see the build arguments that a given job is run with. Within the container most of the heavy lifting for setup and configuration is managed by this chef cookbook https://github.com/ros-infrastructure/ros2-cookbooks/tree/latest/cookbooks/ros2_windows

@MichaelOrlov MichaelOrlov force-pushed the morlov/properly_handle_ctrl-c_signal_on_windows branch from 0bcf904 to 250c228 Compare December 27, 2023 23:25
@MichaelOrlov
Copy link
Contributor Author

@ooeygui @nuclearsandwich-ai Any updates on this issue?

cc: @clalancette

@ooeygui
Copy link

ooeygui commented Jan 4, 2024

Hi @MichaelOrlov,
I spoke with the owner of the API which exhibits this issue. We have not been able to repduce this issue, so cannot debug it. Discussing the runtime environment, we believe the root cause is the version difference between the container and the host (Windows containers behave differently than linux containers; which requires that they be the same version of the OS)

In a future version of Windows, the API has been refactored so it should not exhibit this problem, but this depends on migrating the host and clients CI to a more current version of Windows.

I'd recommend disabling the test on Windows until this migration takes place.

@MichaelOrlov
Copy link
Contributor Author

@ooeygui Do we have an issue to track when the migration of the host and clients CI to a more current version of Windows is going to happen?
If not could you please create it?

I need to refer to something in the code when will be disabling tests for Windows.

BTW can we disable tests for CI and keep them running by default for the local setup?

@MichaelOrlov
Copy link
Contributor Author

@ooeygui @nuclearsandwich-ai @clalancette May I have an issue number to track when the migration of the host and clients CI to a more current version of Windows is going to happen? As from discussion above ^^^.
Long story short:
On our latest WS2022 image in the docker container, any call to GenerateConsoleCtrlEvent returns an Incorrect function error.
The actual behaviour doesn't correspond to the official spec found here https://learn.microsoft.com/en-us/windows/console/generateconsolectrlevent. It was promised here #1342 (comment) that the issue was fixed in the latter versions of Windows.

If we will not have a follow-up issue for infrastructure - the fix will never happen.

Also please note that disabling tests will defeat the purpose and fixes in this PR.

- Add handler for the native Windows CTRL_C_EVENT, CTRL_BREAK_EVENT and
CTRL_CLOSE_EVENT in rosbag2 recorder and player.
- Map SIGINT signal to the native Windows CTRL_C_EVENT in the
 stop_execution(handle, signum) version for Windows. To be able
 correctly use start and stop execution routines from unit tests.
- Enable integration tests which was disabled for Windows due to the
incorrect sending and handling of the SIGINT event.
- Got rid of the `finalize_metadata_kludge(..)` helper function.

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
- Replace waiting for the `/rosout` topic on waiting for the rosbag2
internal `/events/write_split`. Rationale:
The `/rosout` topic may not be enabled on the CI.
- Replace expectation for the `Listening for topics...` on
`Recording...` which is appearing at the very end of the
Recorder::record(). Rationale:
It was possible a race condition when test was
sending the SIGINT while we haven't yet finished call for
Recorder::record().

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
@MichaelOrlov MichaelOrlov force-pushed the morlov/properly_handle_ctrl-c_signal_on_windows branch from 250c228 to cb7b410 Compare January 30, 2024 23:55
@claraberendsen
Copy link

@MichaelOrlov @ooeygui do we know from what version this should be resolved? I cannot assess or give timing of a potential migration without this information.

@MichaelOrlov
Copy link
Contributor Author

@claraberendsen I don't know. It seems only @ooeygui knows.
@ooeygui Could you please elaborate on what Windows version this issue should have already been resolved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants