yath-runner retry.t did not respond to SIGTERM, sending SIGKILL... #207

Open
rkleemann opened this issue Dec 3, 2020 · 3 comments

@rkleemann

When I try to install Test2::Harness in a Docker container, the install fails inconsistently. Sometimes it completes successfully (in about 8 minutes), and sometimes it fails after half an hour of trying. I recently ran the install as cpanm --verbose Test2::Harness, and it appears to be stuck in a loop. For more than an hour it has been repeating:

(INTERNAL) 2837 yath-runner /root/.cpanm/work/1607030892.1/Test2-Harness-1.000042/t/integration/retry.t did not respond to SIGTERM, sending SIGKILL to 3096...
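
The log line above describes a common escalation pattern: ask a process to stop with SIGTERM, then fall back to SIGKILL if it does not exit in time. A minimal Python sketch of that pattern follows, assuming a POSIX system. It is not the Test2::Harness implementation; `terminate_with_escalation`, `spawn_stubborn_child`, and the grace period are hypothetical names and values chosen for illustration.

```python
import signal
import subprocess
import sys


def terminate_with_escalation(proc, grace_seconds=2.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "SIGKILL"


def spawn_stubborn_child():
    """Hypothetical stand-in for a test process that ignores SIGTERM."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has installed its handler
    return proc
```

Called on a child that ignores SIGTERM, `terminate_with_escalation` returns `"SIGKILL"` once the grace period has passed. The sketch reaps the child after killing it; whether a missing reap plays any role in the loop reported here is not established in this thread.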

I haven't tried with a minimal Dockerfile, but I'm using centos:latest as the base, installing gcc, git, and perl-App-cpanminus via yum, and then installing a bunch of modules via cpanm, of which one is Test2::Harness.

For the record, the test before this, failure_cases.t, completes successfully.

@rkleemann
Author

While experimenting with some variables, I changed the FROM line in the Dockerfile from centos:latest to perl, and Test2::Harness 1.000042 completed its tests and installed. Since this is a perfectly reasonable workaround, I think the bug can be closed, or it can be left open to investigate the issue further.

@exodist
Member

exodist commented Dec 16, 2020

I will leave the bug open for a while to see if anyone else has issues. If nothing else gets reported I may just close it. I am refactoring the code that would be responsible for killing stalled processes, so this will probably be fixed by that anyway.

@charsbar

charsbar commented May 7, 2023

I (and most probably rjbs) encountered this issue while testing PAUSE Web with yath-runner under GitHub Actions.

See also https://github.com/andk/pause/pull/426/files#diff-190e6442506b7204e263090f96dce1bd37272aa75d50dad4eeda4f5eca86eaa9R18-R22

Excerpt from a log ( https://github.com/charsbar/pause/actions/runs/4894552162/jobs/8738930578 )

( TIMEOUT)  job 12    Sometimes tests will fork and then return. On supported systems Test2::Harness
( TIMEOUT)  job 12    will start all tests with their own process group and will wait for the entire
( TIMEOUT)  job 12    group to exit before considering the test done. In these cases Test2::Harness
( TIMEOUT)  job 12    will poll for output from the process group at a configurable interval, if no
( TIMEOUT)  job 12    output is produced between intervals the process group will be forcefully
( TIMEOUT)  job 12    killed. See the '--post-exit-timeout' option to configure the interval.
< TIMEOUT>  job 12    A timeout (post-exit) has occured (after ?? seconds), job was forcefully killed
(INTERNAL)     14083 yath-runner /__w/pause/pause/t/pause_2017/action/add_user.t did not respond to SIGTERM, sending SIGKILL to 15093...
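
The timeout message in the excerpt describes starting each test in its own process group, polling for output after the test exits, and killing the group when an interval passes with no output. The following Python sketch illustrates that idea on a POSIX system. It is an illustration under assumptions, not the harness's code: `run_with_post_exit_timeout` and its `post_exit_timeout` parameter are hypothetical names, and the sketch uses the stdout pipe as its only sign that the group is still alive.

```python
import os
import select
import signal
import subprocess
import sys


def run_with_post_exit_timeout(argv, post_exit_timeout=1.0):
    """Run argv in its own process group; return (output, was_killed)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        start_new_session=True,  # child becomes leader of a new process group
    )
    pgid = proc.pid  # after setsid() the group id equals the leader's pid
    fd = proc.stdout.fileno()
    chunks = []
    try:
        while True:
            readable, _, _ = select.select([fd], [], [], post_exit_timeout)
            if readable:
                data = os.read(fd, 4096)
                if not data:
                    # EOF: no process in the group holds the pipe any more.
                    proc.wait()
                    return b"".join(chunks), False
                chunks.append(data)
                continue
            # A whole interval passed without output.
            if proc.poll() is None:
                continue  # leader still running; this sketch only handles post-exit
            if select.select([fd], [], [], 0)[0]:
                continue  # output arrived while checking; keep reading
            # Leader has exited and the group stayed silent: kill the whole group.
            try:
                os.killpg(pgid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group vanished on its own in the meantime
            return b"".join(chunks), True
    finally:
        proc.stdout.close()
```

A test that starts a background process and then returns, as the tests in this report appear to do with a database server, keeps the group alive after the test script itself has exited. In this sketch that case ends with `was_killed` set to `True` once one silent interval has elapsed.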

SIGTERM comes from Test::mysqld ( https://metacpan.org/dist/Test-mysqld/source/lib/Test/mysqld.pm#L150-166 )

Hope this helps a bit.
