Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sk-unix: add fake queuers only for sockets without peer #1197

Open
wants to merge 118 commits into
base: criu-dev
Choose a base branch
from

Conversation

mihalicyn
Copy link
Member

Fake queuers needed for sockets which are connected
but has closed peer (see zdtm/static/socket_close_data(01)).
We support this for both STREAM and DRGAM sockets, but
for DGRAM sockets our check in add_fake_unix_queuers
isn't correct - we should add fake queuer only and only if
DGRAM socket has no peer.

This change not fix real visible bug, but remove incorrect and
unnecessary actions during sockets restore which may follow bug
in the future.

This code covered by zdtm/static/socket_close_data(01).c testcases.

Signed-off-by: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com

rst0git and others added 30 commits July 16, 2020 05:32
When saddr.ss_family is AF_INET6 we should cast &saddr to
(struct sockaddr_in6 *).

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
When CRIU calls the ip tool on restore, it passes the fd of remote
socket by replacing the STDIN before execvp. The stdin is used by the
ip tool to receive input. However, the ip tool calls ftell(stdin)
which fails with "Illegal seek" since UNIX sockets do not support file
positioning operations. To resolve this issue, read the received
content from the UNIX socket and store it into temporary file, then
replace STDIN with the fd of this tmp file.

 # python test/zdtm.py run -t zdtm/static/env00 --remote -f ns
 === Run 1/1 ================ zdtm/static/env00

 ========================= Run zdtm/static/env00 in ns ==========================
 Start test
 ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
 Adding image cache
 Adding image proxy
 Run criu dump
 Run criu restore
 =[log]=> dump/zdtm/static/env00/31/1/restore.log
 ------------------------ grep Error ------------------------
 RTNETLINK answers: File exists
 (00.229895)      1: do_open_remote_image RDONLY path=route-9.img snapshot_id=dump/zdtm/static/env00/31/1
 (00.230316)      1: 	Running ip route restore
 Failed to restore: ftell: Illegal seek
 (00.232757)      1: Error (criu/util.c:712): exited, status=255
 (00.232777)      1: Error (criu/net.c:1479): IP tool failed on route restore
 (00.232803)      1: Error (criu/net.c:2153): Can't create net_ns
 (00.255091) Error (criu/cr-restore.c:1177): 105 killed by signal 9: Killed
 (00.255307) Error (criu/mount.c:2960): mnt: Can't remove the directory /tmp/.criu.mntns.dTd7ak: No such file or directory
 (00.255339) Error (criu/cr-restore.c:2119): Restoring FAILED.
 ------------------------ ERROR OVER ------------------------
 ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
 ##################################### FAIL #####################################

Fixes checkpoint-restore#311

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Instead of erroring, we should loop until we get the desired number of
bytes written, like regular I/O loops.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
This adds the ability to stream images with criu-image-streamer

The workflow is the following:
1) criu-image-streamer is started, and starts listening on a UNIX
   socket.
2) CRIU is started. img_streamer_init() is invoked, which connects to the
   socket. During dump/restore operations, instead of using local disk to
   open an image file, img_streamer_open() is called to provide a UNIX pipe
   that is sent over the UNIX socket.
3) Once the operation is done, img_streamer_finish() is called, and the
   UNIX socket is disconnected.

criu-image-streamer can be found at:
https://github.com/checkpoint-restore/criu-image-streamer

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
One can pass --stream to zdtm.py for testing criu with image streaming.
criu-image-streamer should be installed in ../criu-image-streamer
relative to the criu project directory. But any path will do providing
that criu-image-streamer can be found in the PATH env.

Added a few tests to run on travis-ci to make sure streaming works.
We run test that are likely to fail. However, it would be good to once
in a while run all tests with `--stream -a`.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
So, here's the enhanced version of the first try.

Changes are:

1. The wrapper name is criu-ns instead of crns.py
2. The CLI is absolutely the same as for criu, since the script
   re-execl-s criu binary. E.g.
	   scripts/criu-ns dump -t 1234 ...
   just works
3. Caller doesn't need to care about substituting CLI options,
   instead, the scripts analyzes the command line and
   a) replaces -t|--tree argument with virtual pid __if__ the
      target task lives in another pidns
   b) keeps the current cwd (and root) __if__ switches to another
      mntns. A limitation applies here -- cwd path should be the
      same in target ns, no "smart path mapping" is performed. So
      this script is for now only useful for mntns clones (which
      is our main goal at the moment).

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Looks-good-to: Andrey Vagin <avagin@openvz.org>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In Py2 `range` returns a list and `xrange` creates a sequence object
that evaluates lazily. In Py3 `range` is equivalent to `xrange` in Py2.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The print_data() function was part of the deprecated (and removed)
'show' action, and it was moved in util.c with the following commit:

	a501b48
	The 'show' action has been deprecated since 1.6, let's finally drop it.

	The print_data() routine is kept for yet another (to be deprecated too)
	feature called 'criu exec'.

The criu exec feature was removed with:

	909590a
	Remove criu exec code

	It's now obsoleted by compel library.
	Maybe-TODO: Add compel tool exec action?

Therefore, now we can drop print_data() as well.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
class ctypes.c_char_p
    Represents the C char * datatype when it points to a zero-
    terminated string. For a general character pointer that may
    also point to binary data, POINTER(c_char) must be used.
    The constructor accepts an integer address, or a bytes object.

https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
criu-3.12/criu/net.c:2043: overwrite_var: Overwriting "img" in "img =
open_image_at(-1, CR_FD_IP6TABLES, 0UL, pid)" leaks the storage that
"img" points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
…braries

     This patch only adds the support but does not enable it for building.

Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Oddly, one of the test had a typo which should be fatal.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
We permanently have issues like this:
./test/jenkins/criu-iter.sh: 3: source: not found

It looks like a good idea to use one shell to run our jenkins scripts.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
On MIPS CPUs with VIPT caches also has aliasing issues, just like ARMv6.
To overcome this issue, page coloring 0x40000 align for shared mappings was introduced (SHMLBA) in kernel.
    https://github.com/torvalds/linux/blob/master/arch/mips/include/asm/shmparam.h

Related to this, zdtm test suites ipc.c shm.c shm-unaligned.c and shm-mp.c are passed.

Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
k_rtsigset_t is 16Bytes in mips architecture but not 8Bytes.
so blk_sigset_extended be added in TaskCoreEntry and ThreadCoreEntry for dumping
extern 8Bytes data in parasite-syscall.c, restore extern 8Bytes data in cr-restore.c

Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
Signed-off-by: Joshua Abraham <sinisterpatrician@gmail.com>
A similar one is already printed in check_options().

Before this patch:
> $ ./criu/criu -vvvvvv --deprecated --log-file=/dev/stdout xxx
> (00.000000) Turn deprecated stuff ON
> ...
> (00.029680) DEPRECATED ON
> (00.029687) Error (criu/crtools.c:284): unknown command: xxx

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
A few tests were still running on xenial because at some point they were
hanging. This switches now all tests to bionic except one docker test
which still uses xenial to test with overlayfs.

Signed-off-by: Adrian Reber <areber@redhat.com>
This moves the cross compilation tests to github actions, to slightly
reduce the number of Travis tests and run them in parallel on github
actions.

Signed-off-by: Adrian Reber <areber@redhat.com>
The message "Overwriting RPC settings with values from <filename>" is
misleading, giving the impression that file is being read and consumed.
It really puzzled me, since <filename> didn't exist.

What it needs to say is "Would overwrite", i.e. if a file with such name
is present, it would be used.

Also, add actual "Parsing file ..." so it will be clear which files are
being used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
While working on runc checkpointing, I incorrectly closed status_fd
prematurely, and received an error from CRIU, but it was
non-descriptive.

Do print the error from open().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The orphan pts master option was introduced with commit [1]
to enable checkpoint/restore of containers with a pty pair
used as a console.

[1] checkpoint-restore@6afe523

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
avagin and others added 20 commits August 11, 2020 11:23
(00.015271) unix: 	Add a peer: ino 2203289876 peer_ino 2203289875 family    1 type    1 state  1 name /mnt/test/zdtm/static/sockets03.test
(00.015277) Warn  (criu/sk-unix.c:475): unix: Shutdown mismatch -2091677421:1 -> -2091677420:0

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Master branch does not have mips support yet, so automated builds for
mips on the master branch fail.

Temporarily split mips cross-build into a separate files until mips
support will be mergded into the master branch.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
This one contains builtins module from which zdtm.py imports.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Adrian Reber <areber@redhat.com>
It seems travis supports now Ubuntu 20.04. Let's run at least one test
also on 20.04 (focal).

Signed-off-by: Adrian Reber <areber@redhat.com>
This commit adds protobuf definitions needed to checkpoint and
restore BPF map files along with the data they contain

Source files added:

* bpfmap-file.proto - Stores the meta-data about BPF maps

* bpfmap-data.proto - Stores the data (key-value pairs) contained
in BPF maps

Source files modified:

* fdinfo.proto - Added BPF map as a new kind of file descriptor.
'message file_entry' can now hold information about BPF map file
descriptors

* Makefile - Now generates build artifacts for bpfmap-file.proto
and bpfmap-data.proto

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit defines constants and includes necessary headers to c/r
BPF maps

Source files modified:

* magic.h - Defining BPFMAP_FILE_MAGIC and BPFMAP_DATA_MAGIC

* image-desc.h - Defining CR_FD_BPFMAP_FILE and CR_FD_BPFMAP_DATA

* image-desc.c - Create new entries for bpfmap-file and bpfmap-data
in CRIU's file descriptor set

* protobuf-desc.h - Defining PB_BPFMAP_FILE and PB_BPFMAP_DATA

* protobuf-desc.c - Including headers for BPF map protobuf images

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
Source files modified:

* Makefile.config - Checks whether libbpf is installed on the system.
If so, we add -lbpf to LIBS_FEATURES, -DCONFIG_HAS_LIBBPF to
FEATURE_DEFINES and set CONFIG_HAS_LIBBPF. This allows us to check for
the presence of libbpf before compiling or executing BPF c/r code and
ZDTM tests.

* Makefile - Set CONFIG_HAS_LIBBPF to clean all files.

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit enables CRIU to:
(a) identify an anonymous inode as being a BPF map
(b) parse information about BPF maps from procfs

Source files modified:

* files.c - Checks anonymous inodes to see whether they are BPF maps.
If so, sets struct fdtype_ops *ops to a structure that knows how to
dump BPF maps

* proc_parse.c - Function parse_fdinfo_pid_s() now checks whether the
current file being processed is a BPF map. If so, it calls a newly
defined function parse_bpfmap() which knows how to parse information
about BPF maps from procfs

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit enables CRIU to dump meta-data about BPF maps files by
prividing the structures and functions needed by other parts of the
code-base.

Source files added:

* bpfmap.c - defines new structures and functions:

    (a) struct fdtype_ops bpfmap_dump_ops:
            sets up the function handler to dump BPF maps

    (b) is_bpfmap_link():
            checks whether an anonymous inode is a BPF map file

    (c) dump_one_bpfmap():
            parses information for a BPF map file from procfs and
			dumps it

* include/bpfmap.h - structure and function declarations

Source files modified:

* Makefile.crtools - generates build artifacts for bpfmap.c

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit enables CRIU to dump data(key-value) pairs stored in BPF
maps

Source files modified:

* bpfmap.c

    - Function dump_one_bpfmap_data() reads the map's keys and
    values into two buffers using bpf_map_lookup_batch() and then
    writes them out to a protobuf image along with the number of
    key-value pairs read

    - Function dump_one_bpfmap() now dumps the data as well before
    returning

* include/bpfmap.h - Includes headers and declares functions needed to
dump BPF map data

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit enables CRIT to decode the contents of a protobuf image
that stores information related to BPF map

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit enables CRIU to restore a process' BPF map file
descriptors.

Source files modified:

* bpfmap.c - Structure and function definitions needed to:
    (a) collect a BPF map's information from its protobuf image
    (b) create and open a BPF map with the same parameters as when
    it was dumped
    (c) add the newly opened BPF map to the process' file descriptor
    list

* include/bpfmap.h - Structure declarations for restoring BPF maps

* files.c - Collects a BPF map's file entry during the restoration
phase

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit restores the data of BPF maps. A hash table (indexed by
the map's id) is used to store data objects for multiple BPF map
files that a process may have opened. Collisions are resolved with
chaining using a linked list.

Source files modified:

* bpfmap.c - Structure and function definitions needed to:
    (a) collect the protobuf image containing BPF map data
    (b) read the BPF map's data from the image and store it in the
    hash table
    (c) restore the map's data using bpf_map_update_batch()

* include/bpfmap.h
    - Defines the size of the hash table and maks to be used while
	indexing into it
	- Structure and function declarations that are used while restoring
	BPF map data

* cr-restore.c - Collects the protobuf image containing BPF map data
during the restoration phase

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This commit adds ZDTM tests for c/r of processes with BPF maps as open
files

Source files added:

* zdtm/static/bpf_hash.c - Tests for c/r of the data and meta-data of
BPF map type BPF_MAP_TYPE_HASH

* zdtm/static/bpf_array.c - Tests for c/r of the data and meta-data
of BPF map type BPF_MAP_TYPE_ARRAY

Source files modified:

* zdtm/static/Makefile - Generating build artifacts for BPF tests

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
Source files modified:

* travis/vagrant.sh - Adding libbpf-devel

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
This change fixes the error:
error: 'security_context_t' is deprecated
[-Werror=deprecated-declarations]

Source files modified:

* lsm.c
* net.c

Please refer to:
SELinuxProject/selinux@9eb9c9327

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
Since commit cdd08cd ("uffd: use userns_call() to execute
ioctl(UFFDIO_API)") UFFD_API ioctl() is wrapped with userns_call() and this
allows runing lazy-pages tests on recent kernels in uns.

Restore testing of lazy-pages in uns in travis.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
The docker hub container registry is not updated as fast as Fedora's
registry at registry.fedoraproject.org. Fedora's registry gets a new
image whenever there is a new version of rawhide, docker hub's rawhide
image can take a couple of weeks because the process is not automated.

Especially when Fedora branches of a new release we see lot's of errors
in CRIU's Fedora rawhide based Travis runs. Switch to Fedora's registry
to always have the newest rawhide images for our tests.

Signed-off-by: Adrian Reber <areber@redhat.com>
Fake queuers needed for sockets which are connected
but has closed peer (see zdtm/static/socket_close_data(01)).
We support this for both STREAM and DRGAM sockets, but
for DGRAM sockets our check in add_fake_unix_queuers
isn't correct - we should add fake queuer only and only if
DGRAM socket has no peer.

This change not fix real visible bug, but remove incorrect and
unnecessary actions during sockets restore which may follow bug
in the future.

This code covered by zdtm/static/socket_close_data(01).c testcases.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
if (!(ui->ue->state == TCP_ESTABLISHED && !ui->peer) &&
ui->ue->type != SOCK_DGRAM)
if (!(ui->ue->type == SOCK_STREAM && ui->ue->state == TCP_ESTABLISHED && !ui->peer) &&
!(ui->ue->type == SOCK_DGRAM && !ui->peer))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is SOCK_SEQPACKET?
do we need to add a fake queuer if there is nothing in a recv queue?

Copy link
Member

@Snorch Snorch Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is SOCK_SEQPACKET?

note: we don't support established non-stream: https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/sk-unix.c#L1828

@@ -2225,8 +2225,8 @@ int add_fake_unix_queuers(void)
list_for_each_entry(ui, &unix_sockets, list) {
if ((ui->ue->uflags & (USK_EXTERN | USK_CALLBACK)) || ui->queuer)
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we check that queuer isn't nil, why do we need to check it again below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: queuer != peer

@Snorch
Copy link
Member

Snorch commented Sep 15, 2020

We probably should add:

Fixes: 254958ce7 ("unix: Add fake queuer for standalone dgram sockets")

@github-actions
Copy link

github-actions bot commented Jan 6, 2021

A friendly reminder that this PR had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet