New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zdtm: Distinguish between fail and crash of dump #2376
base: criu-dev
Are you sure you want to change the base?
zdtm: Distinguish between fail and crash of dump #2376
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## criu-dev #2376 +/- ##
============================================
- Coverage 70.73% 70.57% -0.16%
============================================
Files 136 135 -1
Lines 32940 33449 +509
============================================
+ Hits 23299 23606 +307
- Misses 9641 9843 +202 ☔ View full report in Codecov by Sentry. |
test/zdtm.py
Outdated
def exit_signal(ret): | ||
if ret == -2: | ||
return True | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified to:
def exit_signal(ret):
return ret == -2
Why do we use -2
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should've written that in the first place. I will fix it. Thank You :)
If I understand your question correctly,
The run
static method returns -1
if criu dump
fails.
I changed the run
method to return -2
if rpc sends an invalid response (CRIU crashed?).
The exit_signal
function checks for a crash. So, if ret == -2
then it means CRIU crashed.
62ae140
to
4bfaf6e
Compare
It looks like we need to add a new fault injection there. See f1714cc as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with |
Adds a exit_signal static method to criu_cli, criu_config and criu_rpc used to detect a crash. Fixes: checkpoint-restore#350 Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
4bfaf6e
to
2c9e91c
Compare
Hello @Snorch, |
Looks correct. But from zdtm.py point of view it may be not so simple, see how __criu_act retries action without fault, we likely don't want our fault to be retry-able by zdtm, so maybe just make it's id >=128 ?
Yeh, it's a good question. Let's just use one random test with |
1400985
to
2590899
Compare
I don't think giving the fault injection an id >= 128 is what we want.
I have added |
I may misunderstand something, correct me if I'm wrong, but isn't a test fail exactly the behavior we expect in case of criu crash? In other words: if criu fails to dump/restore
|
2590899
to
c787b51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Sorry, the misunderstanding was on my part. You are correct; the tests previously passed as they were re-run without the fault injection. Thank you for all your help @Snorch :) |
Hm, test still does not pass with This seem to do the right trick:
Also maybe someone else has better ideas? |
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
c787b51
to
d471052
Compare
This PR distinguishes
criu dump
crash fromcriu dump
fail in thezdtm.py
script. Now,zdtm
reportscriu dump
crash as a test failure.Very similiar to the work done in this PR (#2140)
But, to detect a rpc crash we see that if the response from the RPC is invalid, then we assume
the RPC crashed.
For detecting the crash for the CLI we can check the
Popen.returncode
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode
To simulate the crash, I added a call to
abort()
beforeline 2104 criu/cr-dump.c
Fixes: #350