Don't dump pages which only contain zero bytes #2331

simonis · 2024-01-15T14:55:25Z

I want to propose a new command line option, zero-pages, which, if enabled, detects pages that contain only zero bytes and skips dumping them to the image file. At restore, such pages are automatically replaced by the kernel's zero page. This can be useful for runtimes like Java, which often allocate large memory regions without fully using them (e.g. for the heap). For a simple HelloWorld Java program, this new feature shrinks the image size by about 20%, from 13 MB to 11 MB.
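For illustration, the per-page check that such an option relies on can be sketched as follows. This is a minimal sketch and not the code from this PR: the helper name `page_is_all_zero` and the fixed 4096-byte `SKETCH_PAGE_SIZE` are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; real code would query sysconf(_SC_PAGESIZE). */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical helper (not taken from the PR): returns true if the
 * SKETCH_PAGE_SIZE bytes at 'page' are all zero.
 *
 * If the first byte is zero and every byte equals its successor,
 * then all bytes are zero. The overlapping memcmp() expresses that
 * without a separate all-zero reference buffer.
 */
static bool page_is_all_zero(const unsigned char *page)
{
	return page[0] == 0 &&
	       memcmp(page, page + 1, SKETCH_PAGE_SIZE - 1) == 0;
}
```

The check costs one pass over every candidate page, which is why the feature is proposed as an opt-in option. A word-by-word loop would work just as well as the memcmp() form.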

I've enabled GitHub Actions on my fork and all the tests are green except the Alpine test, which I see failing on upstream as well, and the Code Linter test, where I can't understand what it is objecting to.

I'm a first time contributor to this project so please let me know if I've missed something.

rst0git · 2024-01-15T15:56:27Z

> This can be useful for runtimes like Java, which often allocate large memory regions without fully using them (e.g. for the heap). For a simple HelloWorld Java program, this new feature shrinks the image size by about 20%, from 13 MB to 11 MB.

@simonis It might be more effective to address this problem using compression for memory pages, similar to the approach used in CRaC.

simonis · 2024-01-16T10:12:01Z

> > This can be useful for runtimes like Java, which often allocate large memory regions without fully using them (e.g. for the heap). For a simple HelloWorld Java program, this new feature shrinks the image size by about 20%, from 13 MB to 11 MB.

> @simonis It might be more effective to address this problem using compression for memory pages, similar to the approach used in CRaC.

I think compressing the image, as done in criu-crac, is somewhat orthogonal. It decreases the image size, which is nice if you have to transfer it to another host. But before restore, it decompresses the image again and loads all its content (even the pages containing just zeros) into memory. The zeros are mmapped into the target process, which takes time and also increases the initial RSS of the target process.

With the solution proposed here, less memory has to be mmapped into the target process. The zero pages will be COWed gradually from fresh pages as they are needed and don't have to be read from the image on disk, which is faster.

adrianreber · 2024-01-16T16:06:31Z

I think the code linter wants a \n at the end of the warning.

Without looking too closely, the idea of this PR sounds like a good one.

simonis · 2024-01-16T16:25:32Z

> I think the code linter wants a \n at the end of the warning.

Thanks, I've fixed that now.

> Without looking too closely, the idea of this PR sounds like a good one.

Thanks.

rst0git · 2024-01-19T14:36:39Z

> Code Linter test where I can't understand what it is objecting to

@simonis Would it be possible to run make indent before committing your changes? This should fix the code style problems such as missing spaces. In addition, it would be great if you can add a commit message with a description that explains what changes are introduced and why. The following blog post describes how to write good commit messages: https://cbea.ms/git-commit/

avagin · 2024-01-21T04:04:48Z

criu/mem.c

+		if (should_dump_page(vma->e, at[pfn])) {
+			if (opts.skip_zero_pages) {
+				remote[0].iov_base = (void*)vaddr;
+				nread = process_vm_readv(item->pid->real, local, 1, remote, 1, 0);


I don't like the idea of reading process memory twice. We have to avoid this.

btw: #2292 solves the same problem in a more optimal way.

I agree that it would be better if we can avoid reading process memory twice.

> #2292 solves the same problem in a more optimal way.

I think this is different. In #2292, we exclude pages with zero PFN (PAGE_IS_PFNZERO), while this option skips zero-filled memory (e.g., memory that has been filled with zeros using memset() would be skipped with this option).
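The distinction can be sketched in terms of the classic /proc/<pid>/pagemap entry layout (bit 63 present, bit 62 swapped, bits 0-54 the page frame number). This is a hedged illustration only: it uses the older pagemap format rather than the PAGE_IS_PFNZERO flag, the macro and function names are hypothetical, and `zero_pfn` is assumed to be known to the caller. Note that reading real PFNs from pagemap requires CAP_SYS_ADMIN on current kernels.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a /proc/<pid>/pagemap entry, per the kernel's pagemap documentation. */
#define PME_PRESENT	(1ULL << 63)		/* page is in RAM */
#define PME_SWAPPED	(1ULL << 62)		/* page is in swap */
#define PME_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54: page frame number */

/*
 * Hypothetical helper: true if the entry maps the kernel's shared zero page.
 * 'zero_pfn' is an assumption of this sketch; the caller has to obtain it.
 */
static bool pme_is_kernel_zero_page(uint64_t pme, uint64_t zero_pfn)
{
	return (pme & PME_PRESENT) && (pme & PME_PFN_MASK) == zero_pfn;
}

/*
 * A PFN-based filter: keep pages that are present or swapped, unless
 * they map the kernel zero page. A page cleared with memset() has its
 * own PFN, so this filter keeps it; only a content check can drop it.
 */
static bool pfn_filter_wants_page(uint64_t pme, uint64_t zero_pfn)
{
	if (!(pme & (PME_PRESENT | PME_SWAPPED)))
		return false;
	return !pme_is_kernel_zero_page(pme, zero_pfn);
}
```

In other words, a PFN-based filter and a content-based filter are complementary: the first is cheap but misses pages that were explicitly zeroed, the second catches them at the cost of inspecting page data.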

Is this just a performance issue or a correctness issue? If it's just about performance, I think the benefit might justify the additional overhead; after all, the feature is off by default.

In case this is a correctness problem, do you have a suggestion for how we can avoid it?

PS: and yes, @rst0git is right - this change is about skipping regular pages which are filled with only zero bytes.

> This can be useful for runtimes like Java, which often allocate large memory regions without fully using them (e.g. for the heap). For a simple HelloWorld Java program, this new feature shrinks the image size by about 20%, from 13 MB to 11 MB.

> we exclude pages with zero PFN (PAGE_IS_PFNZERO), while this option skips zero-filled memory

I believe we need a zdtm test, which can reproduce such a zero page without PAGE_IS_PFNZERO but with zero data. Maybe we even want to fix the kernel to report such a page as "PAGE_IS_PFNZERO" instead.

I agree with Andrei that manually checking page content with memcmp is an anti-pattern. Nack from me, at least in its current state.

I also agree. Reading a page a second time and using memcmp() does not sound optimal.

I still like the idea of not including empty pages in the checkpoint, but it sounds difficult.

If the kernel could track it, that would be nice. Not sure the kernel has a better alternative than memcmp() to find a zeroed memory page.

At this point I think it would be nice to see some numbers. How much faster is restoring if something like this PR is applied? Although I don't like the memcmp(), it would only be used if the corresponding command-line option is explicitly selected by the user. Maybe that makes it acceptable. Can the second reading of the page be avoided?

Maybe some post-processing of the checkpoint image would be an alternative. Remove the empty pages after checkpointing and have support during restore to handle pages like this.

> Is this just a performance issue or a correctness issue? If it's just about performance, I think the benefit might justify the additional overhead; after all, the feature is off by default.

I don't understand why we need to read process pages to do this check? Why can't we do that before dumping these pages into the image (page_xfer_dump_pages)?
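A rough sketch of that idea: once the page data is already sitting in a local buffer on its way to the image, the zero check can run over that buffer and no second read of the process is needed. The code below is an assumption-laden illustration, not how page_xfer_dump_pages() is actually structured; `struct page_run`, `collect_nonzero_runs` and the fixed page size are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* A run of consecutive non-zero pages inside the buffer (hypothetical type). */
struct page_run {
	size_t first;	/* index of the first page in the run */
	size_t count;	/* number of pages in the run */
};

static bool page_is_all_zero(const unsigned char *page)
{
	return page[0] == 0 &&
	       memcmp(page, page + 1, SKETCH_PAGE_SIZE - 1) == 0;
}

/*
 * Scans 'npages' pages in 'buf' (data that was already read once) and
 * records the runs of pages that still need to be written. Returns the
 * number of runs stored in 'runs', which must have room for
 * (npages + 1) / 2 entries.
 */
static size_t collect_nonzero_runs(const unsigned char *buf, size_t npages,
				   struct page_run *runs)
{
	size_t nruns = 0;
	bool in_run = false;

	for (size_t i = 0; i < npages; i++) {
		if (page_is_all_zero(buf + i * SKETCH_PAGE_SIZE)) {
			in_run = false;	/* a zero page ends the current run */
			continue;
		}
		if (!in_run) {
			runs[nruns].first = i;
			runs[nruns].count = 0;
			nruns++;
			in_run = true;
		}
		runs[nruns - 1].count++;
	}
	return nruns;
}
```

Grouping the surviving pages into runs keeps the number of write operations low when zero pages are sparse, and the skipped indices are exactly the pages that restore can leave to the kernel's zero page.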

I believe memset(0) is used in specific scenarios where applications/libraries are dealing with sensitive data and/or use custom memory management.

It's not just the "malloc+memset(0)" use case. Java does mmap and pretouch memory so we can have a lot of pages with only zero content (but not the kernel zero page) in some scenarios.

First, I thought it did not work. upd: that was stupid of me not to enable --skip-zero-pages; with the option it works fine, sorry.

Second, if Java puts so much effort into having those zeroed pages in RSS, isn't it a bad idea to restore those pages as if they were "PAGE_IS_PFNZERO" ones? =)

[root@turmoil tmp]# ./malloc-test
Enter any char to stop
------ In another terminal ------
[root@turmoil snorch]# grep 2097164 -A3 -B1 /proc/$(pidof malloc-test)/smaps
7fb87b744000-7fb8fb747000 rw-p 00000000 00:00 0
Size: 2097164 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 2097160 kB
[root@turmoil tmp]# /home/snorch/devel/ms/criu/criu/criu dump --skip-zero-pages -v4 -o dump.log -t $(pidof malloc-test) -j -D /images-dir/
[root@turmoil tmp]# /home/snorch/devel/ms/criu/criu/criu restore -j -D /images-dir/
------ In another terminal ------
[root@turmoil snorch]# grep 2097164 -A3 -B1 /proc/$(pidof malloc-test)/smaps
7fb87b744000-7fb8fb747000 rw-p 00000000 00:00 0
Size: 2097164 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 1048584 kB

upd2: If we want to preserve them in RSS, we can remember those special zero-filled pages in images on dump without saving their data and then on restore put them to RSS by writing zeroes.
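A minimal sketch of that variant, under stated assumptions: the dump side records which pages were zero-filled in a bitmap instead of storing their data, and the restore side writes zeros into exactly those pages so they become resident again. The bitmap representation and both function names are hypothetical and do not correspond to any existing CRIU image format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

static bool page_is_all_zero(const unsigned char *page)
{
	return page[0] == 0 &&
	       memcmp(page, page + 1, SKETCH_PAGE_SIZE - 1) == 0;
}

/*
 * Dump side (hypothetical): sets bit i in 'bitmap' for every zero-filled
 * page i in 'buf'. 'bitmap' must hold (npages + 7) / 8 bytes.
 * Returns the number of zero-filled pages found.
 */
static size_t record_zero_pages(const unsigned char *buf, size_t npages,
				unsigned char *bitmap)
{
	size_t nzero = 0;

	memset(bitmap, 0, (npages + 7) / 8);
	for (size_t i = 0; i < npages; i++) {
		if (page_is_all_zero(buf + i * SKETCH_PAGE_SIZE)) {
			bitmap[i / 8] |= (unsigned char)(1u << (i % 8));
			nzero++;
		}
	}
	return nzero;
}

/*
 * Restore side (hypothetical): writes zeros into every recorded page of
 * 'region'. On a fresh anonymous mapping the write faults the page in,
 * so it counts towards RSS again. Returns the number of pages touched.
 */
static size_t repopulate_zero_pages(unsigned char *region, size_t npages,
				    const unsigned char *bitmap)
{
	size_t ntouched = 0;

	for (size_t i = 0; i < npages; i++) {
		if (bitmap[i / 8] & (1u << (i % 8))) {
			memset(region + i * SKETCH_PAGE_SIZE, 0, SKETCH_PAGE_SIZE);
			ntouched++;
		}
	}
	return ntouched;
}
```

This keeps the image small (one bit per page instead of a full page of data) while preserving the RSS of the original process, at the price of touching every recorded page during restore.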

> Second, if Java puts so much effort into having those zeroed pages in RSS, isn't it a bad idea to restore those pages as if they were "PAGE_IS_PFNZERO" ones? =)

@Snorch, Java (or OpenJDK-based JVMs, to be more specific) was not designed and optimized for cloud/container use cases but rather for large, monolithic application servers. In the old days, pretouching/zeroing memory was a way to pre-allocate physical memory and not potentially get it from swap later. For current cloud/container use cases, the huge memory footprint can be a problem. With checkpointing, a small image is more important, and COWing a PAGE_IS_PFNZERO page is much faster than loading and populating it from disk.

> I don't understand why we need to read process pages to do this check? Why can't we do that before dumping these pages into the image (page_xfer_dump_pages)?

Thanks a lot for your suggestion @avagin. I'm short of time for the next week because of FOSDEM/Jfokus but I'll try to come up with a new version which moves the zero check to page_xfer_dump_pages() afterwards.

@simonis FOSDEM was my favorite conference when I lived close by. btw @mihalicyn is there too; he is one of the CRIU maintainers. He will be happy to help with any questions.

simonis · 2024-01-22T12:57:19Z

> > Code Linter test where I can't understand what it is objecting to

> @simonis Would it be possible to run make indent before committing your changes? This should fix the code style problems such as missing spaces. In addition, it would be great if you can add a commit message with a description that explains what changes are introduced and why. The following blog post describes how to write good commit messages: https://cbea.ms/git-commit/

Done (had to install a newer version of clang-format).

simonis · 2024-01-24T16:48:33Z

> I believe we need a zdtm test, which can reproduce such a zero page without PAGE_IS_PFNZERO but with zero data.

I've now added a new zdtm test which verifies that the --skip-zero-pages option does indeed optimize pages which only contain zero bytes.

Introduces a new command line option '--skip-zero-pages' which detects pages that contain only zero bytes and prevents them from being dumped to the process's image file. It is a potentially expensive operation because it checks every single process page for whether it contains only zeros, but it can significantly decrease the image size and improve the startup time if many such pages exist. It effectively replaces such pages with the kernel's zero page on restore.

Signed-off-by: Volker Simonis <volker.simonis@gmail.com>

rst0git · 2024-01-31T13:02:21Z

test/zdtm.py

@@ -2697,6 +2701,9 @@ def get_cli_args():
    rp.add_argument("--noauto-dedup",
                    help="Manual deduplicate images on iterations",
                    action='store_true')
+    rp.add_argument("--skip-zero-pages",


@simonis Would you be able to also enable testing with existing ZDTM tests using --skip-zero-pages in run-ci-tests.sh?

@simonis It was great meeting you at FOSDEM and your talk was very good!

We can use something like the following patch to run the existing ZDTM tests with --skip-zero-pages: rst0git@45a8ca1

@avagin Thank you for the review! Would it be sufficient to run the following tests?

diff --git a/scripts/ci/run-ci-tests.sh b/scripts/ci/run-ci-tests.sh
index ef7e869e0..8f5e25d03 100755
--- a/scripts/ci/run-ci-tests.sh
+++ b/scripts/ci/run-ci-tests.sh
@@ -268,6 +268,9 @@ make -C test/others/rpc/ run
 ./test/zdtm.py run -t zdtm/transition/maps007 --pre 2 --page-server --dedup
 ./test/zdtm.py run -t zdtm/transition/maps007 --pre 2 --pre-dump-mode read
+# Run tests with --skip-zero-pages
+./test/zdtm.py run --skip-zero-pages -T '.*maps0.*'
+
 ./test/zdtm.py run -t zdtm/transition/pid_reuse --pre 2 # start time based pid reuse detection
 ./test/zdtm.py run -t zdtm/transition/pidfd_store_sk --rpc --pre 2 # pidfd based pid reuse detection

rppt · 2024-02-03T09:48:21Z

On Thu, Feb 1, 2024 at 9:47 PM Andrei Vagin ***@***.***> wrote:

> @avagin commented on this pull request.
>
> In criu/mem.c:
>
> >  		vaddr = vma->e->start + *off + pfn * PAGE_SIZE;
> > +		/*
> > +		 * If should_dump_page() returns true, it means the page is in the dumpees resident memory
> > +		 * (i.e. bit 63 of the page frame number 'at[pfn]' is set) but it is not the zero-page.
> > +		 */
> > +		if (should_dump_page(vma->e, at[pfn])) {
> > +			if (opts.skip_zero_pages) {
> > +				remote[0].iov_base = (void*)vaddr;
> > +				nread = process_vm_readv(item->pid->real, local, 1, remote, 1, 0);
>
> @simonis FOSDEM was my favorite conference when I lived close by. btw @mihalicyn are there too, he is one of criu maintainers. He will be happy to help with any questions.

Both @mihalicyn and @rst0git are in the Containers devroom

…

--
Sincerely yours,
Mike.

github-actions · 2024-03-08T00:38:14Z

A friendly reminder that this PR had no activity for 30 days.

simonis · 2024-03-08T09:14:14Z

Still on my to-do list, so I'm posting this to avoid auto-closing.

simonis force-pushed the skip-zero-pages branch from 9bdc424 to 9fe9d22 on January 16, 2024 16:24

rst0git reviewed Jan 16, 2024: criu/crtools.c (two threads), criu/mem.c, criu/include/cr_options.h, criu/config.c (all outdated and resolved)

simonis force-pushed the skip-zero-pages branch from 9fe9d22 to 4c04936 on January 18, 2024 15:37

rst0git reviewed Jan 19, 2024: criu/mem.c (outdated, resolved)

rst0git reviewed Jan 19, 2024: test/javaTests/test-zero.xml (outdated, resolved)

rst0git reviewed Jan 19, 2024: test/javaTests/test-zero.xml (outdated, resolved)

avagin reviewed Jan 21, 2024

simonis force-pushed the skip-zero-pages branch 2 times, most recently from 4bddca9 to 9d70e5f, on January 24, 2024 15:59

simonis force-pushed the skip-zero-pages branch from 9d70e5f to b23ad22 on January 24, 2024 18:05

rst0git reviewed Jan 31, 2024

github-actions bot added the stale-pr label Mar 8, 2024

avagin added the no-auto-close label and removed the stale-pr label Mar 8, 2024


simonis commented Jan 15, 2024

rst0git commented Jan 15, 2024

simonis commented Jan 16, 2024 (edited)

adrianreber commented Jan 16, 2024

simonis commented Jan 16, 2024

rst0git commented Jan 19, 2024

avagin Jan 21, 2024

rst0git Jan 21, 2024

simonis Jan 22, 2024

Snorch Jan 23, 2024

adrianreber Jan 23, 2024

avagin Jan 25, 2024 (edited)

Snorch Jan 26, 2024 (edited)

simonis Jan 31, 2024

simonis Jan 31, 2024

avagin Feb 1, 2024

simonis commented Jan 22, 2024

simonis commented Jan 24, 2024

rst0git Jan 31, 2024 (edited)

rst0git Feb 6, 2024

rst0git Feb 6, 2024

rppt commented Feb 3, 2024 via email

github-actions bot commented Mar 8, 2024

simonis commented Mar 8, 2024
