Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dump and restore cpu affinity of each thread. #2014

Open
wants to merge 118 commits into
base: criu-dev
Choose a base branch
from

Conversation

hdzhoujie
Copy link
Contributor

@hdzhoujie hdzhoujie commented Dec 8, 2022

Add one entry of thread_cpuallow_entry into thread_core_entry to save cpu affinity info. Restore it after threads restored but before running. Add option --with-cpu-affinity to enable this function at restore.
Signed-off-by: hdzhoujie zhoujie133@huawei.com
Signed-off-by: He jingxian hejingxian@huawei.com
Signed-off-by: Sang Yan sangyan@huawei.com

prakritigoyal19 and others added 30 commits May 12, 2022 17:55
Change made through this commit:
- Include copy of flog as a seperate tree.
- Modify the makefile to add and compile flog code.

Signed-off-by: prakritigoyal19 <prakritigoyal19@gmail.com>
CID 302713 (checkpoint-restore#1 of 1): Missing varargs init or cleanup (VARARGS)
 va_end was not called for argptr.

Signed-off-by: Adrian Reber <areber@redhat.com>
Separate commit for easier criu-dev <-> master transfer.

Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
It is mapped, not maped. Same applies for mmap I guess.

Found by codespell, except it wants to change it to mapped,
which will make it less specific.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Brought to you by

    codespell -w

(using codespell v2.1.0).

[v2: use "make indent" on the result]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It can be confusing to see error from post-dump action script and non
zero return from criu though at the same time see "Dumping finished
successfully" in log. I believe it is logical to consider post-dump
action script as a part of "dump" process so fail in it means that the
whole dump failed.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
As private hugetlb mappings are not pre-mapped, the content of them is restored
in the the restorer which cannot use page_read->read_pages. As a result, we
cannot recursively read the content of pre-dumped image in the parent directory
and use preadv to read the content from the last dumped image only. Therefore,
it may freeze while restoring when the content of mapping is in pre-dumped image
in parent directory.

We need to skip pre-dumping on hugetlb mappings to resolve the issue.

Suggested-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
This reverts commit 37ea8c5.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reported-by: Mr. Jenkins (ppc64le)
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Name collision with an abandoned project named 'crit' in pypi causes pip
to show crit (CRiu Image Tool) as outdated.  This patch updates crit to
use the same version and license as criu.

Fixes checkpoint-restore#1878

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
But actually, 5a92f10 probably has to be reverted as a whole.
PIPE_MAX_SIZE is the hard limit to avoid PAGE_ALLOC_COSTLY_ORDER
allocations in the kernel. But F_SETPIPE_SZ rounds up a requested pipe
size to a power-of-2 pages. It means that when we request PIPE_MAX_SIZE
that isn't a power-of-2 number, we actually request a pipe size greater
than PIPE_MAX_SIZE.

Fixes: 5a92f10 ("page-pipe: Resize up to PIPE_MAX_SIZE")

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Due to side effects of F_SETPIPE_SZ, the actual pipe size can be greater
than PIPE_MAX_SIZE.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
In this case, vmplice attaches pages without coping them.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
* handle unexpected errors of process_vm_readv
* adjust riovs in analyze_iov
* call handle_faulty_iov only if process_vm_readv returns EFAULT.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
When building packages for CRIU the source directory might have a
name different than 'criu'.

Fixes: checkpoint-restore#1877

Reported-by: @siris
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Building the criu packages for Ubuntu/Debian fails with:

	mkdir: cannot create directory '/var/lib/criu': Permission denied

This patch updates PLUGINDIR with the value /usr/lib/criu

Fixes: checkpoint-restore#1877

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This allows us to only detect bad formating in PR changes but not all
the CRIU codebase.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
criu-ns script incorrectly compares the pidns fd with mntns fd.
Also reversed the condition in is_my_namespace function to align it
with the function name.

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
Before this patch, if we had a unixsk with incomming scm packets (with
fds) and with the sender side fd closed, we got an error:

Error (criu/sk-unix.c:1125): unix: Can't find sender for 0x1e

First part of the problem is that unix_note_scm_rights() expects to see
a "queuer" which would send scm packets to the unixsk, and there is no
as the sender side is closed.

Second part of the problem is that we already have "fake" queuers
feature so that it already creates a unix socket pair and leaves other
end open for later queuing packets. But function add_fake_unix_queuers()
is called after unix_note_scm_rights() thus there is no chance to find
queuer at the point of failure.

Third part is that when we look for a queuer in find_queuer_for() we
actually look for a socket for which we are a queuer and not for the
socket which is a queuer for us, which is opposite to the name. For
cases where both ends are alive both are queuers for each other so this
was not important, but for our closed sender case it breaks.

So let's reorder add_fake_unix_queuers() before unix_note_scm_rights()
and make find_queuer_for() actually do what it's name implies.

This situation is started to reproduce on Virtuozzo start/stop tests
with the unixsk belonging to systemd, we suppose that this state where
the sender fd side is closed happens rarely only on systemd start/stop,
so we don't see it in regular suspend resume of long-living containers.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This helper restores master_id and shared_id of first mount in the
sharing group. It first copies sharing from either external source or
internal parent sharing group and makes master_id from shared_id. Next
it creates new shared_id when needed.

All other mounts except first are just copied from the first one.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
…root

It's a problem when while restoring sharing group we need to copy
sharing between two mounts with non-intersecting roots, because kernel
does not allow it.

We have a case opencontainers/runc#3442, where
runc adds different devtmpfs file-bindmounts to container and there is
no fsroot mount in container for this devtmpfs, thus mount-v2 faces the
above problem.

Luckily for the case of external mounts which are in one sharing group
and which have non-intersecting roots, these mounts likely only have
external master with no sharing, so we can just copy sharing from
external source and make it slave as a workaround.

checkpoint-restore#1886

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
…roach

Currently, the content of anonymous private hugetlb mapping is dumped in 2
different images: memfd approach and normal private mapping dumping. In memfd
approach, we dump the content of the backing pseudo file (/anon_hugepage). This
is incorrect and redundant since the mapping is private, the content of backing
file may differ from the content of the mapping. With this commit, we remove the
redundant memfd approach dump and only do the normal private mapping dump on
anonymous hugetlb mapping.

Run zdtm.py run -f h --keep-img always -t zdtm/static/maps09, du -h in the
dumped image directory

Before this commit
	13M     test/dump/zdtm/static/maps09/55/1
After this commit
	8.5M    test/dump/zdtm/static/maps09/55/1

The reduction in size is approximately 4MB which is the size of anonymous
private hugetlb mapping in the test.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
…nter

Else we have a Segmentation fault in __move_mount_set_group() on
xfree(source_mp) if resolve_mountpoint() returned statically allocated
path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This test has one external mount [criumntns] /zdtm_root_ext.tmp ->
[testmntns] /mnt_root_ext.test, and it specifically gives '--external
mnt[MNT]:.zdtm_root_ext.tmp' option on restore without '/' to make
dirname on it return static '.' path (see glibc dirname() code) and
reproduce a segfault in resolve_mountpoint().

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
minhbq-99 and others added 16 commits November 1, 2022 21:35
…lers

As threads in a process may be in different threaded controllers, we need to
move thoses threads to the correct controllers.

Because the threads of a process are restored in later stage in restorer.c, we
need to create a cgroupd service to help to move those threads into correct
controllers when they are restored. We cannot use usernsd as the code in
restorer does not know the address of outside function to pass to userns_call.
However, this cgroupd service still reuses a lot of code from usernsd.

The main logic is that restored threads receive the cg_set number they belong to
before restorer stage in case their cg_set are different from main thread. When
these threads are restored, they send the cg_set number and their thread ids
through unix socket to cgroupd. cgroupd receives the cg_set number and thread
ids and moves those threads into correct controllers. Thread ids are sent
through SCM_CREDENTIALS of unix socket so they are translated into correct
thread ids in the receiving end.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
This test creates a process with 2 threads in different threaded controllers and
check if CRIU restores these threads' cgroup controllers properly.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
As cgroupv2_00, cgroupv2_01 need cpuset in cgroup-v2 hierarchy to check CRIU
handle cgroup-v2 properly, umount cpuset in cgroup-v1 to make it move to
cgroup-v2.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
…lled

Signed-off-by: Adrian Reber <areber@redhat.com>
A previous commit added a cgroup cpuset unmounting to
scripts/ci/Makefile. We are sometimes running in a container without the
necessary privileges to unmount certain cgroups.

This commit moves the cgroup unmounting to a place in run-ci-tests.sh
which already requires privileged access and does not break unprivileged
build-only CI runs.

Signed-off-by: Adrian Reber <areber@redhat.com>
…turned

Some users on Raspberry Pi report that the kerndat checking for
memfd_create(MFD_HUGETLB) support returns ENOSYS even when memfd_create
syscall is available. We currently treat this error as unexpected and
return error. This commit marks the memfd_create(MFD_HUGETLB) as
unavailable when ENOSYS is returned.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Zombie tasks are dumped in dump_zombies() so it is redundant to handle them
in dump_one_task().

Deprecate cg_set in task_core_entry as this field must be per thread now.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Mathias Gibbens <mathias@calenhad.com>
This patch adds a missing definition for `__nmk_dir` in the Makefile
for the amdgpu plugin. This definition is required, for example, when
building the `test_topology_remap` target:

	make -C plugins/amdgpu/ test_topology_remap

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
While building on a machine that has a HOL clang compiler,
I ran into warnings regarding the changed line.  It appears
this warning is on by default because of anticipated changes
to the C standard.

Signed-off-by: Drew Wock <ajwock@gmail.com>
The way ShellCheck is installed was changed in commit c056f99
(ci/gha/lint: install a recent shellcheck) to use the latest version
v0.8.0 and remove some of the "shellcheck disable=..." annotations.
Since then, Fedora 37 has been released and the ShellCheck package
has been updated to v0.8.0.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The python3 package in Alpine has recently been updated to install
symbolic link for /usr/bin/python.

https://git.alpinelinux.org/aports/commit/main/python3?id=d91da210b1614eb75517d59b7f348fee01699f35

This causes the following error in CI:

  Step 10/11 : RUN ln -s /usr/bin/python3 /usr/bin/python
   ---> Running in a5a94be9dc93
  ln: failed to create symbolic link '/usr/bin/python': File exists
  The command '/bin/sh -c ln -s /usr/bin/python3 /usr/bin/python' returned a non-zero code: 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch fixes applies the changes required by clang-format v15.0.5
for `make indent`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
In order to reduce the frequency of using system call, based on
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/create_inode.c#n519,
I created a new algorithm of dumping chunk via fiemap.(copy_file_to_chunks_fiemap)

Also, I added another BOOL_OPT for users to determine which algorithm they
want to use. Moreover, for those filesystem not supporting fiemap, criu
will fall back to the original algorithm(SEEK_HOLE/SEEK_DATA).

v2: don't call copy_chunk_from_file on outstanding extent; rearange
headers to workaround "redeclaration of ‘enum fsconfig_command’" problem

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
ghost_multi_hole00 and ghost_multi_hole01 are tests which create a ghost file
with a lot of holes, there are 4K data and 4K hole inside every 8K length.

The only difference between them is ghost-fiemap option, 01 is a
test for the fiemap dumping algorithm, and we want to test the
behavior of EXTENT_MAX_COUNT part, so the file size should be 8M, thus there
will be 1024 chunks in the ghost file.

In some file system, such as xfs, we somehow can not easily create highly sparse
file as in ext4 or btrfs, therefore we need `fallocate` to forcibly create holes.

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
Signed-off-by: Shubham Verma <shubhamv.sv@gmail.com>
criu/pstree.c Outdated Show resolved Hide resolved
@hdzhoujie hdzhoujie force-pushed the criu-dev branch 3 times, most recently from 0c50d66 to 9a98e31 Compare December 15, 2022 03:36
Add one entry of thread_cpuallow_entry into thread_core_entry to save cpu affinity info.
Restore it after threads restored but before running.
Add option --with-cpu-affinity to enable this function at restore.
Signed-off-by: hdzhoujie <zhoujie133@huawei.com>
Signed-off-by: He jingxian <hejingxian@huawei.com>
Signed-off-by: Sang Yan <sangyan@huawei.com>
@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@rst0git
Copy link
Member

rst0git commented Jan 20, 2023

@hdzhoujie Would it be possible to add tests to verify that the CPU affinity is restored correctly?

@github-actions github-actions bot removed the stale-pr label Jan 21, 2023
@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@avagin avagin force-pushed the criu-dev branch 2 times, most recently from f392ea1 to 4d137b8 Compare June 12, 2023 06:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet