Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

port to riscv64 #2234

Open
wants to merge 7 commits into
base: criu-dev
Choose a base branch
from

Conversation

ancientmodern
Copy link
Contributor

@ancientmodern ancientmodern commented Aug 1, 2023

Add riscv64 support to CRIU. Hopefully solve issue #1702

Present status of zdtm tests:

################### 6 TEST(S) FAILED (TOTAL 454/SKIPPED 46) ####################
 * zdtm/static/apparmor_stacking(h)
 * zdtm/static/netns_lock_iptables(h)
 * zdtm/static/socket-tcp-closed-last-ack(uns)
 * zdtm/static/socket-tcp-nfconntrack(h)
 * zdtm/static/socket-tcp-syn-sent(uns)
 * zdtm/transition/maps007(unknown)
##################################### FAIL #####################################

@mihalicyn @felicitia for you reference

WIP: I'll enrich this pull request with more details by tomorrow. And some sections might have been affected when I removed debug info from our code :)

@mihalicyn mihalicyn self-requested a review August 1, 2023 07:54
@mihalicyn
Copy link
Member

mihalicyn commented Aug 1, 2023

If you did this work together with Yixue, you can add Co-authored-by to the commit messages.
https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

Great work! I'll take a look and review this.

@ancientmodern
Copy link
Contributor Author

ancientmodern commented Aug 1, 2023

If you did this work together with Yixue, you can add Co-authored-by to the commit messages. https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

Great work! I'll take a look and review this.

Thank you Alex! I will make the updates with Yixue tomorrow. If you'd like to test it locally, this branch should be functional with the kernel patch below, though with tons of messy commits and debugging stuff 😂

Patch:

---
 arch/riscv/kernel/signal.c | 71 +++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 9aff9d720590..01ccb949d26e 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -16,6 +16,7 @@
 
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
+#include <asm/syscall.h>
 #include <asm/signal.h>
 #include <asm/signal32.h>
 #include <asm/switch_to.h>
@@ -284,35 +285,79 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 
 void arch_do_signal_or_restart(struct pt_regs *regs)
 {
+	unsigned long continue_addr = 0, restart_addr = 0;
+	int retval = 0;
 	struct ksignal ksig;
-
-	if (get_signal(&ksig)) {
-		/* Actually deliver the signal */
-		handle_signal(&ksig, regs);
-		return;
-	}
+	bool syscall = (regs->cause == EXC_SYSCALL);
 
 	/* Did we come from a system call? */
-	if (regs->cause == EXC_SYSCALL) {
+	if (syscall) {
+		continue_addr = regs->epc;
+		restart_addr = continue_addr - 4;
+		retval = regs->a0;
+
 		/* Avoid additional syscall restarting via ret_from_exception */
 		regs->cause = -1UL;
 
-		/* Restart the system call - no handlers present */
-		switch (regs->a0) {
+		/*
+		 * Prepare for system call restart. We do this here so that a
+		 * debugger will see the already changed PC.
+		 */
+		switch (retval) {
 		case -ERESTARTNOHAND:
 		case -ERESTARTSYS:
 		case -ERESTARTNOINTR:
-                        regs->a0 = regs->orig_a0;
-			regs->epc -= 0x4;
+			regs->a0 = regs->orig_a0;
+			regs->epc = restart_addr;
 			break;
 		case -ERESTART_RESTARTBLOCK:
-                        regs->a0 = regs->orig_a0;
+			regs->a0 = regs->orig_a0;
 			regs->a7 = __NR_restart_syscall;
-			regs->epc -= 0x4;
+			regs->epc = restart_addr;
 			break;
 		}
 	}
 
+	// printk("~RISCV~ Before get_signal: a7 = %ld, a0 = %ld, orig_a0 = %ld, cause = %ld\n",
+	//        regs->a7, regs->a0, regs->orig_a0, regs->cause);
+
+	/*
+	 * Get the signal to deliver. When running under ptrace, at this point
+	 * the debugger may change all of our registers.
+	 */
+	if (get_signal(&ksig)) {
+		/*
+		 * Depending on the signal settings, we may need to revert the
+		 * decision to restart the system call, but skip this if a
+		 * debugger has chosen to restart at a different PC.
+		 */
+		if (regs->epc == restart_addr &&
+		    (retval == -ERESTARTNOHAND ||
+		     retval == -ERESTART_RESTARTBLOCK ||
+		     (retval == -ERESTARTSYS &&
+		      !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
+			syscall_set_return_value(current, regs, -EINTR, 0);
+			regs->epc = continue_addr;
+		}
+
+		/* Actually deliver the signal */
+		handle_signal(&ksig, regs);
+		return;
+	}
+
+	// printk("~RISCV~ After get_signal: a7 = %ld, a0 = %ld, orig_a0 = %ld, cause = %ld\n",
+	//        regs->a7, regs->a0, regs->orig_a0, regs->cause);
+
+	/*
+	 * Handle restarting a different system call. As above, if a debugger
+	 * has chosen to restart at a different PC, ignore the restart.
+	 */
+	if (syscall && regs->epc == restart_addr) {
+		if (retval == -ERESTART_RESTARTBLOCK)
+			regs->a7 = __NR_restart_syscall;
+		// user_rewind_single_step(current);
+	}
+
 	/*
 	 * If there is no signal to deliver, we just put the saved
 	 * sigmask back.

Copied from my gitter msg:

Sorry for this very late reply as I forgot to turn on gitter notification :( But I wanted to share an exciting update about the development of the RISC-V version of CRIU. It now only fails 7 out of 454 zdtm tests! Here are the final results of ./zdtm.py run -a --keep-going:

################### 7 TEST(S) FAILED (TOTAL 454/SKIPPED 46) ####################
 * zdtm/static/apparmor_stacking(h)
 * zdtm/static/cmdlinenv00(ns)
 * zdtm/static/netns_lock_iptables(h)
 * zdtm/static/socket-tcp-closed-last-ack(uns)
 * zdtm/static/socket-tcp-nfconntrack(h)
 * zdtm/static/socket-tcp-syn-sent(uns)
 * zdtm/transition/maps007(unknown)
##################################### FAIL #####################################

I think failed tests related to iptables and sockets might be due to some disabled kernel configs. I haven't confirmed this yet, as compiling an Ubuntu kernel locally takes quite a bit of time on my side :(
I've also set up the cross-compile CI for RISC-V. As riscv64 is not an official stable arch in Debian yet (but it will be soon), I had to take a strange approach using Ubuntu ports for riscv64 cross-compiling. It now works with some dirty work in the Dockerfile.
The major problem right now is a critical bug related to ptrace and syscalls in riscv kernel. You can find more details here.
I'm going to finalize and send the patch (changing arch_do_signal_or_restart() to arm64 style) to LKML, but I'm not sure if this is a proper solution and how long it will take. Unfortunately, I cannot think of any workarounds (the orig_a0 approach also relies on kernel change, and it seems not reasonable according to the root cause). As it stands, without this fix, kernels <= 6.4 might only handle naive jobs, such as simple loops.
I'm planning to create a pull request soon, and I'll focus on organizing the commits and changes and fixing any potential issues (including the failed tests and CI setup). I'm really looking forward to your feedback and suggestions!

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.23%. Comparing base (cb39c62) to head (5f79f37).

❗ Current head 5f79f37 differs from pull request most recent head 91d3cfd. Consider uploading reports for the commit 91d3cfd to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##           criu-dev    #2234      +/-   ##
============================================
+ Coverage     70.21%   70.23%   +0.02%     
============================================
  Files           132      131       -1     
  Lines         34372    34370       -2     
============================================
+ Hits          24134    24141       +7     
+ Misses        10238    10229       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ancientmodern
Copy link
Contributor Author

ancientmodern commented Aug 1, 2023

Update: The zdtm/static/cmdlinenv00 test now passes after syncing AT_VECTOR_SIZE to the correct value (64) in the riscv Linux kernel, and zdtm/static/apparmor_stacking should be fixed with #2235 :)

@mihalicyn, I have a small question about running make lint locally. I'm experiencing many more errors with flake8 on my side, even after reverting to the version used by CI (flake8 == 5.0.3). Do you have any idea what might be causing this difference? Thanks!

haorong at haorong:/criu [riscv64-upstream]$ make lint
flake8 --version
5.0.3 (mccabe: 0.7.0, pycodestyle: 2.9.1, pyflakes: 2.5.0, tryceratops: 2.3.2) CPython 3.10.4 on Linux
flake8 --config=scripts/flake8.cfg test/zdtm.py
test/zdtm.py:492:13: TRY200 Use 'raise from' to specify exception cause
test/zdtm.py:550:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:590:17: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:648:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:979:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:981:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:983:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:998:21: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1276:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1489:17: TRY300 Consider moving this statement to an 'else' block
test/zdtm.py:1664:17: TRY201 Simply use 'raise' without specifying exception object again
test/zdtm.py:1687:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1700:17: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1713:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1722:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:1937:21: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2011:17: TRY002 Create your own exception
test/zdtm.py:2040:13: TRY002 Create your own exception
test/zdtm.py:2113:17: TRY201 Simply use 'raise' without specifying exception object again
test/zdtm.py:2230:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2363:13: TRY002 Create your own exception
test/zdtm.py:2363:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2368:13: TRY002 Create your own exception
test/zdtm.py:2368:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2374:13: TRY002 Create your own exception
test/zdtm.py:2374:13: TRY003 Avoid specifying long messages outside the exception class
test/zdtm.py:2628:9: TRY300 Consider moving this statement to an 'else' block
make: *** [Makefile:439: lint] Error 1

ancientmodern and others added 6 commits August 25, 2023 03:11
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
hack3ric added a commit to hack3ric/archriscv-packages that referenced this pull request Sep 23, 2023
Port upstream riscv64 support pull request: checkpoint-restore/criu#2234

Applies loongarch64 support as well so patches don't fail.
felixonmars pushed a commit to felixonmars/archriscv-packages that referenced this pull request Sep 23, 2023
Port upstream riscv64 support pull request: checkpoint-restore/criu#2234

Applies loongarch64 support as well so patches don't fail.
@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@avagin avagin added no-auto-close Don't auto-close as a stale issue and removed stale-pr labels Oct 5, 2023
@ancientmodern
Copy link
Contributor Author

Really sorry for the delay. The past two months turned out to be more "fast-paced" than I had anticipated, and the riscv ubuntu VM I've been using for development got messed up after I installed a buggy kernel :( This led me to somewhat negatively sideline this PR and the corresponding kernel patch. Fortunately, I dug out an old VM snapshot from deep in my hard drive and successfully restored it to a dev-ready state today. I'll return to the PR as soon as my work schedule lightens up. Thanks for your patience and continued support! 🙏

dzhang28 pushed a commit to dzhang28/criu that referenced this pull request Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-auto-close Don't auto-close as a stale issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants