Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dump: Don't unfreeze tasks on dump failure with --no-resume-on-error. #2215

Draft
wants to merge 2 commits into
base: criu-dev
Choose a base branch
from

Conversation

osctobe
Copy link
Contributor

@osctobe osctobe commented Jun 23, 2023

Make it possible to kill or leave stopped tasks if a dump failed after stopping the tree.

@osctobe osctobe force-pushed the leave-stopped branch 3 times, most recently from 6c2c50d to ae821a3 Compare June 24, 2023 01:26
@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (cda1c5c) 70.51% compared to head (55c284d) 70.51%.

❗ Current head 55c284d differs from pull request most recent head 6f08f8f. Consider uploading reports for the commit 6f08f8f to get more accurate results

Files Patch % Lines
criu/cr-service.c 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           criu-dev    #2215   +/-   ##
=========================================
  Coverage     70.51%   70.51%           
=========================================
  Files           133      133           
  Lines         33534    33539    +5     
=========================================
+ Hits          23646    23650    +4     
- Misses         9888     9889    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rst0git
Copy link
Member

rst0git commented Jun 24, 2023

@osctobe Would it be possible to add a test for this functionality?

@osctobe
Copy link
Contributor Author

osctobe commented Jun 26, 2023

@osctobe Would it be possible to add a test for this functionality?

There are no tests for --leave-stopped or --leave-running yet that could be extended with this case. The change is tested in production (always enabled), though.

@avagin
Copy link
Member

avagin commented Jun 26, 2023

@osctobe Would it be possible to add a test for this functionality?

There are no tests for --leave-stopped or --leave-running yet that could be extended with this case.

Here is the test for --leave-stopped:
https://github.com/checkpoint-restore/criu/blob/criu-dev/test/jenkins/criu-stop.sh

The change is tested in production (always enabled), though.

I am sorry, but it doesn't work this way. I think our fault injection engine can be used to introduce a test. test/jenkins/criu-fault.sh contains all these tests.

if (ret || post_dump_ret || opts.final_state == TASK_ALIVE) {
if (opts.resume_on_dump_error && (ret || post_dump_ret))
opts.final_state = TASK_ALIVE;
if (opts.final_state == TASK_ALIVE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check that final_state isn't TASK_DEAD here. If finale_state is TASK_STOPPED, we have to do all these actions, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is kinda strange that we don't unlock network for --leave-stopped. One would not be able to just resume such stopped container with SIGCONT, but will also need to manually unlock network and remove link remaps and do other cleanups.

@osctobe osctobe force-pushed the leave-stopped branch 2 times, most recently from daaddc6 to c390cc2 Compare July 25, 2023 15:26
@0x7f454c46
Copy link
Member

You keep adding Change-Id: Ia8956063cdc130650cfcde86851ee6a14331f2c2 that pollute git logs and don't provide anything outside your company. Clean these up, please.

Comment on lines +216 to +218
*--no-resume-on-error*::
Leave tasks in stopped state even if checkpoint completed unsuccessfully.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some lines about what situations it is useful and why? What benefit does it provide?
How should a user decide what is preferred for them in case of criu failure?
Current doc lines are copy'n'paste from above without adding any useful context.

@osctobe osctobe force-pushed the leave-stopped branch 3 times, most recently from 105f912 to eef7254 Compare August 1, 2023 18:36
@osctobe osctobe force-pushed the leave-stopped branch 2 times, most recently from 2a95bc6 to 0e9ab38 Compare August 4, 2023 11:23
@Snorch
Copy link
Member

Snorch commented Aug 8, 2023

See the freezer_restore_state() related code. If before dump you put your processes in freezer cgroup and make it FROZEN, you can later decide after dump finishes if you want to make cgroup THAWED (no dump failure) or leave it frozen (on dump failure). This does effectively the same as you want to accomplish with this new option.

@avagin
Copy link
Member

avagin commented Aug 18, 2023

@osctobe could you response to comments?

@osctobe osctobe force-pushed the leave-stopped branch 2 times, most recently from 962791d to 3d261c0 Compare August 22, 2023 10:05
@osctobe osctobe marked this pull request as draft August 24, 2023 17:58
@osctobe osctobe force-pushed the leave-stopped branch 2 times, most recently from bf35bab to 40f34f5 Compare August 28, 2023 12:06
Copy link

A friendly reminder that this PR had no activity for 30 days.

Copy link

A friendly reminder that this PR had no activity for 30 days.

Make it possible to kill or leave stopped tasks if a dump failed
after stopping the tree.

Signed-off-by: Michał Mirosław <emmir@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants