DETECT: Task: 'off' flag corruption for pid #329

Open
80kk opened this issue Apr 3, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@80kk

80kk commented Apr 3, 2024

I just started with LKRG by building it for Ubuntu 22.04 with the 5.15.0-101-generic kernel. So far it seems to be working fine; however, every day I am getting alerts like these:

[Tue Apr  2 15:06:10 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 81271, name runc:[2:INIT]
[Tue Apr  2 15:06:10 2024] LKRG: ALERT: BLOCK: Task: Killing pid 81271, name runc:[2:INIT]

[Wed Apr  3 07:55:04 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 335824, name runc:[2:INIT]
[Wed Apr  3 07:55:04 2024] LKRG: ALERT: BLOCK: Task: Killing pid 335824, name runc:[2:INIT]

The host is running Docker containers for Mailcow. None of the containers were restarted or killed, so it looks more like LKRG prevents a new container from starting?

solardiz added the bug label Apr 4, 2024
@solardiz
Contributor

solardiz commented Apr 4, 2024

Thank you for reporting this! It looks similar to #215, but we thought we had it fixed via #224. So if you're using our latest code, either the issue is completely different or our fix was somehow incomplete or inapplicable to some kernels.

@80kk
Author

80kk commented Apr 5, 2024

@solardiz
I looked at #215 first and indeed thought it had been resolved and that this was something new. I've checked the Docker logs and found only one match:

[Thu Apr  4 06:51:29 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 148205, name runc:[2:INIT]
[Thu Apr  4 06:51:29 2024] LKRG: ALERT: BLOCK: Task: Killing pid 148205, name runc:[2:INIT]
2024-04-04T06:51:30.104+02:00  common.go:121 ▶ ERROR [Job "dovecot_imapsync_runner" (c49c28b67a70)] StdOut: OCI runtime exec failed: exec failed: unable to start container process: read init-p: connection reset by peer: unknown
2024-04-04T06:51:30.104+02:00  common.go:121 ▶ ERROR [Job "dovecot_imapsync_runner" (c49c28b67a70)] Finished in "95.941657ms", failed: true, skipped: false, error: error non-zero exit code: 126

Unfortunately, I can't find anything in the Docker logs for:

[Thu Apr  4 08:39:39 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 173272, name runc:[2:INIT]
[Thu Apr  4 08:39:39 2024] LKRG: ALERT: BLOCK: Task: Killing pid 173272, name runc:[2:INIT]
[Thu Apr  4 09:00:12 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 177964, name runc:[2:INIT]
[Thu Apr  4 09:00:12 2024] LKRG: ALERT: BLOCK: Task: Killing pid 177964, name runc:[2:INIT]
[Thu Apr  4 09:00:12 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 177964, name runc:[2:INIT]
[Thu Apr  4 09:00:12 2024] LKRG: ALERT: BLOCK: Task: Killing pid 177964, name runc:[2:INIT]
[Thu Apr  4 14:29:06 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 255511, name runc:[2:INIT]
[Thu Apr  4 14:29:06 2024] LKRG: ALERT: BLOCK: Task: Killing pid 255511, name runc:[2:INIT]
[Thu Apr  4 14:35:40 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 257061, name runc:[2:INIT]
[Thu Apr  4 14:35:40 2024] LKRG: ALERT: BLOCK: Task: Killing pid 257061, name runc:[2:INIT]
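
To line these alerts up against the Docker job log by time, it may help to reduce the kernel log to one line per detection. The following is only a sketch: `lkrg_alert_summary` is a hypothetical helper name (not part of LKRG), and it assumes the bracketed human-readable timestamp format shown above (as printed by `dmesg -T`), not the syslog format.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "timestamp | pid | name" for each LKRG 'off' flag DETECT alert.
# Assumes the "[Day Mon DD HH:MM:SS YYYY] ..." prefix seen in this thread.
lkrg_alert_summary() {
    sed -n "s/^\[\(.*\)] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid \([0-9]*\), name \(.*\)\$/\1 | \2 | \3/p"
}
```

Usage would be along the lines of `dmesg -T | lkrg_alert_summary`; the BLOCK lines are skipped on purpose, since they repeat the same pid and time.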

Interestingly, dovecot_imapsync_runner does not always fail:

2024-04-04T00:55:30.341+02:00  common.go:125 ▶ NOTICE [Job "dovecot_imapsync_runner" (ffa328e93e49)] Finished in "323.480958ms", failed: false, skipped: false, error: none

@Adam-pi3
Collaborator

Adam-pi3 commented Apr 5, 2024

Can you please try uncommenting //#define P_LKRG_TASK_OFF_DEBUG in src/modules/print_log/p_lkrg_print_log.h?
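
As a minimal sketch of that change: the helper below uncomments the define with sed. `enable_task_off_debug` is a hypothetical name, not something LKRG ships, and the pattern assumes the line appears in the header as `//#define P_LKRG_TASK_OFF_DEBUG` with nothing else on it. The module still has to be rebuilt and reloaded afterwards, as described in the README.

```shell
# Hypothetical helper: uncomment the P_LKRG_TASK_OFF_DEBUG define in the
# header file given as $1. A backup copy is left next to it as "$1.bak".
# Only a line consisting of exactly that commented-out define is changed.
enable_task_off_debug() {
    sed -i.bak \
        's|^[[:space:]]*//[[:space:]]*#define P_LKRG_TASK_OFF_DEBUG[[:space:]]*$|#define P_LKRG_TASK_OFF_DEBUG|' \
        "$1"
}
```

It would be called from the top of the source tree as `enable_task_off_debug src/modules/print_log/p_lkrg_print_log.h`.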

@80kk
Author

80kk commented Apr 7, 2024

> Can you please try uncommenting //#define P_LKRG_TASK_OFF_DEBUG in src/modules/print_log/p_lkrg_print_log.h?

Here is one with debug enabled, this time with a different container:

2024-04-07T21:26:44.037+02:00  common.go:121 ▶ ERROR [Job "sogo_sessions" (2e019553b677)] StdOut: OCI runtime exec failed: runc did not terminate successfully: exit status 137: unknown
2024-04-07T21:26:44.037+02:00  common.go:121 ▶ ERROR [Job "sogo_sessions" (2e019553b677)] Finished in "29.638213ms", failed: true, skipped: false, error: error non-zero exit code: 126

dmesg:

[Sun Apr  7 21:26:42 2024] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 305793, name runc
[Sun Apr  7 21:26:42 2024] CPU: 3 PID: 305780 Comm: runc Tainted: G           OE     5.15.0-101-generic #111-Ubuntu
[Sun Apr  7 21:26:42 2024] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[Sun Apr  7 21:26:42 2024] Call Trace:
[Sun Apr  7 21:26:42 2024]  <TASK>
[Sun Apr  7 21:26:42 2024]  show_stack+0x52/0x5c
[Sun Apr  7 21:26:42 2024]  dump_stack_lvl+0x4a/0x63
[Sun Apr  7 21:26:42 2024]  dump_stack+0x10/0x16
[Sun Apr  7 21:26:42 2024]  p_ed_is_off_off.part.0+0x4a6/0x583 [lkrg]
[Sun Apr  7 21:26:42 2024]  p_set_ed_process_on.cold+0xe/0x1e [lkrg]
[Sun Apr  7 21:26:42 2024]  p_seccomp_ret+0x159/0x250 [lkrg]
[Sun Apr  7 21:26:42 2024]  ? __x64_sys_seccomp+0x18/0x20
[Sun Apr  7 21:26:42 2024]  __kretprobe_trampoline_handler+0xb4/0x140
[Sun Apr  7 21:26:42 2024]  trampoline_handler+0x41/0x60
[Sun Apr  7 21:26:42 2024]  __kretprobe_trampoline+0x2a/0x60
[Sun Apr  7 21:26:42 2024] RIP: 0010:__kretprobe_trampoline+0x0/0x60
[Sun Apr  7 21:26:42 2024] Code: 89 fc e8 e3 d7 01 00 4c 89 f2 4c 89 ee 4c 89 e7 44 0f b6 c0 31 c9 e8 8f 94 3b 00 41 5c 41 5d 41 5e 5d c3 cc cc cc cc cc cc cc <54> 9c 48 83 ec 18 57 56 52 51 50 41 50 41 51 41 52 41 53 53 55 41
[Sun Apr  7 21:26:42 2024] RSP: c390ff48:ffffacb5c390fe48 EFLAGS: 00000246
[Sun Apr  7 21:26:42 2024] RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000
[Sun Apr  7 21:26:42 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffacb5c390fde0
[Sun Apr  7 21:26:42 2024] RBP: ffffacb5c390fe48 R08: fffffffffffffff2 R09: 0000000000000000
[Sun Apr  7 21:26:42 2024] R10: ffffacb5c390fdd0 R11: 0000000000000000 R12: ffffacb5c390ff58
[Sun Apr  7 21:26:42 2024] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Sun Apr  7 21:26:42 2024] WARNING: kernel stack regs at 00000000e52a8d30 in runc:305780 has bad 'bp' value 0000000009276f68
[Sun Apr  7 21:26:42 2024] unwind stack type:1 next_sp:0000000000000000 mask:0x2 graph_idx:0
[Sun Apr  7 21:26:42 2024] 000000004f06a71e: ffffacb5c390fc30 (0xffffacb5c390fc30)
[Sun Apr  7 21:26:42 2024] 0000000067221f5d: ffffffff8f509c36 (show_trace_log_lvl+0x1ff/0x2ea)
[Sun Apr  7 21:26:42 2024] 00000000bf3c6b0a: ffffffff8e88f5da (__kretprobe_trampoline+0x2a/0x60)
[Sun Apr  7 21:26:42 2024] 000000008c639c2a: ffffacb5c390fe28 (0xffffacb5c390fe28)
[Sun Apr  7 21:26:42 2024] 000000006d9eb799: ffffffff8fdbddaa (.LC1+0x61d/0xae9)
[Sun Apr  7 21:26:42 2024] 0000000087139b99: 00000000c390fbd8 (0xc390fbd8)
[Sun Apr  7 21:26:42 2024] 00000000ff2d623c: 0000000000000002 (0x2)
[Sun Apr  7 21:26:42 2024] 0000000007c2e747: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 000000007795d9f1: ffffacb5c390c000 (0xffffacb5c390c000)
[Sun Apr  7 21:26:42 2024] 00000000e598bc3a: ffffacb5c3910000 (0xffffacb5c3910000)
[Sun Apr  7 21:26:42 2024] 00000000d860ab4d: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000008825e493: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 000000005128cfd8: ffffacb5c390c000 (0xffffacb5c390c000)
[Sun Apr  7 21:26:42 2024] 00000000dd16564d: ffffacb5c3910000 (0xffffacb5c3910000)
[Sun Apr  7 21:26:42 2024] 000000007c6ae32d: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000d74b392f: 0000000000000002 (0x2)
[Sun Apr  7 21:26:42 2024] 00000000623cf7d0: ffff9bfa4f04c8c0 (0xffff9bfa4f04c8c0)
[Sun Apr  7 21:26:42 2024] 00000000efb7970d: 0000010100000000 (0x10100000000)
[Sun Apr  7 21:26:42 2024] 00000000527918f0: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000003841a11f: ffffacb5c390fb48 (0xffffacb5c390fb48)
[Sun Apr  7 21:26:42 2024] 000000007e6f9865: ffffffff8e88f5b0 (elfcorehdr_read+0x40/0x40)
[Sun Apr  7 21:26:42 2024] 000000003c14e538: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000a10accc9: ffffacb5c390fda8 (0xffffacb5c390fda8)
[Sun Apr  7 21:26:42 2024] 00000000b5f830da: cd17869a0a6c1700 (0xcd17869a0a6c1700)
[Sun Apr  7 21:26:42 2024] 000000008bcee23f: 0000000000000046 (0x46)
[Sun Apr  7 21:26:42 2024] 0000000052f0fe85: ffff9bfa4f04c8c0 (0xffff9bfa4f04c8c0)
[Sun Apr  7 21:26:42 2024] 00000000d4a04093: ffffffff8fe327c7 (.LC2+0x181e/0x19ae)
[Sun Apr  7 21:26:42 2024] 0000000059ce5e94: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000005d2dfbe0: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 0000000087055647: ffffacb5c390fc50 (0xffffacb5c390fc50)
[Sun Apr  7 21:26:42 2024] 00000000cc31e566: ffffffff8f509de0 (show_stack+0x52/0x5c)
[Sun Apr  7 21:26:42 2024] 00000000e817e8a3: ffffffff8fe327c7 (.LC2+0x181e/0x19ae)
[Sun Apr  7 21:26:42 2024] 00000000993a2e2a: ffff9bfa6a22002c (0xffff9bfa6a22002c)
[Sun Apr  7 21:26:42 2024] 00000000f6e4ab3d: ffffacb5c390fc70 (0xffffacb5c390fc70)
[Sun Apr  7 21:26:42 2024] 0000000090ecf8aa: ffffffff8f55117a (dump_stack_lvl+0x4a/0x63)
[Sun Apr  7 21:26:42 2024] 000000008e087d06: ffff9bfa6a220000 (0xffff9bfa6a220000)
[Sun Apr  7 21:26:42 2024] 00000000b4d56958: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 0000000045a3b219: ffffacb5c390fc80 (0xffffacb5c390fc80)
[Sun Apr  7 21:26:42 2024] 00000000fd6f4d4a: ffffffff8f5511a3 (dump_stack+0x10/0x16)
[Sun Apr  7 21:26:42 2024] 000000002c3c0446: ffffacb5c390fcd8 (0xffffacb5c390fcd8)
[Sun Apr  7 21:26:42 2024] 00000000cdb91caa: ffffffffc0accdc8 (p_ed_is_off_off.part.0+0x4a6/0x583 [lkrg])
[Sun Apr  7 21:26:42 2024] 00000000a843ec5e: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 0000000027cbede7: ffffacb5c390fcd8 (0xffffacb5c390fcd8)
[Sun Apr  7 21:26:42 2024] 0000000020bfbbb9: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000007fb9bf79: ffff9bf94e8a48e8 (0xffff9bf94e8a48e8)
[Sun Apr  7 21:26:42 2024] 000000003f18b326: ffff9bfa4f04c8c0 (0xffff9bfa4f04c8c0)
[Sun Apr  7 21:26:42 2024] 00000000d5607dd6: ffff9bf94e8a48c0 (0xffff9bf94e8a48c0)
[Sun Apr  7 21:26:42 2024] 00000000264ebd4f: ffff9bfa6a220000 (0xffff9bfa6a220000)
[Sun Apr  7 21:26:42 2024] 00000000f2ff304b: ffffacb5c390fcf8 (0xffffacb5c390fcf8)
[Sun Apr  7 21:26:42 2024] 0000000022d85c76: ffffffffc0accf51 (p_set_ed_process_on.cold+0xe/0x1e [lkrg])
[Sun Apr  7 21:26:42 2024] 00000000865bbe16: ffff9bfa6a220000 (0xffff9bfa6a220000)
[Sun Apr  7 21:26:42 2024] 00000000c412f148: ffff9bfa6a240000 (0xffff9bfa6a240000)
[Sun Apr  7 21:26:42 2024] 000000008629e034: ffffacb5c390fd40 (0xffffacb5c390fd40)
[Sun Apr  7 21:26:42 2024] 00000000c439429b: ffffffffc0ac2f29 (p_seccomp_ret+0x159/0x250 [lkrg])
[Sun Apr  7 21:26:42 2024] 0000000077795c46: 0000000000000286 (0x286)
[Sun Apr  7 21:26:42 2024] 000000005ee44eda: fffffffffffffff2 (0xfffffffffffffff2)
[Sun Apr  7 21:26:42 2024] 00000000ae9aa695: ffff9bfa4f1cc710 (0xffff9bfa4f1cc710)
[Sun Apr  7 21:26:42 2024] 0000000050ac6d01: ffff9bfa4f1cc710 (0xffff9bfa4f1cc710)
[Sun Apr  7 21:26:42 2024] 00000000644c7d73: ffffffff8e9e1bc8 (__x64_sys_seccomp+0x18/0x20)
[Sun Apr  7 21:26:42 2024] 0000000037b59127: ffffacb5c390fe40 (0xffffacb5c390fe40)
[Sun Apr  7 21:26:42 2024] 000000003a951b06: ffffacb5c390fda8 (0xffffacb5c390fda8)
[Sun Apr  7 21:26:42 2024] 0000000030836a8e: ffffacb5c390fd80 (0xffffacb5c390fd80)
[Sun Apr  7 21:26:42 2024] 00000000fb878233: ffffffff8e9cf174 (__kretprobe_trampoline_handler+0xb4/0x140)
[Sun Apr  7 21:26:42 2024] 000000002f4f262e: ffffffff907dac40 (kprobe_exceptions_nb+0x20/0x20)
[Sun Apr  7 21:26:42 2024] 000000004f23a05c: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000886d85fb: ffffacb5c390fda8 (0xffffacb5c390fda8)
[Sun Apr  7 21:26:42 2024] 00000000deaad7c0: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000270b6b3c: ffffacb5c390fd98 (0xffffacb5c390fd98)
[Sun Apr  7 21:26:42 2024] 0000000090781241: ffffffff8e88fde1 (trampoline_handler+0x41/0x60)
[Sun Apr  7 21:26:42 2024] 000000004233f907: ffffacb5c390ff58 (0xffffacb5c390ff58)
[Sun Apr  7 21:26:42 2024] 0000000035adfde5: ffffacb5c390fda9 (0xffffacb5c390fda9)
[Sun Apr  7 21:26:42 2024] 0000000068a2a2a6: ffffffff8e88f5da (__kretprobe_trampoline+0x2a/0x60)
[Sun Apr  7 21:26:42 2024] 00000000e52a8d30: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000c2717a66: ffffacb5c390ff58 (0xffffacb5c390ff58)
[Sun Apr  7 21:26:42 2024] 00000000976ab0bb: ffffacb5c390fe48 (0xffffacb5c390fe48)
[Sun Apr  7 21:26:42 2024] 000000001b34ea33: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000000e2c725f: ffffacb5c390fdd0 (0xffffacb5c390fdd0)
[Sun Apr  7 21:26:42 2024] 000000008bf572c7: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 0000000017d99de9: fffffffffffffff2 (0xfffffffffffffff2)
[Sun Apr  7 21:26:42 2024] 000000004ed19174: fffffffffffffff2 (0xfffffffffffffff2)
[Sun Apr  7 21:26:42 2024] 00000000cacd4cbf: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 0000000028b6bd38: ffffacb5c390fde0 (0xffffacb5c390fde0)
[Sun Apr  7 21:26:42 2024] 00000000d919959d: ffffffffffffffff (0xffffffffffffffff)
[Sun Apr  7 21:26:42 2024] 000000009295a296: ffffffff8e88f5b0 (elfcorehdr_read+0x40/0x40)
[Sun Apr  7 21:26:42 2024] 000000009e351dd1: 0000000000000010 (0x10)
[Sun Apr  7 21:26:42 2024] 00000000da958d98: 0000000000000246 (0x246)
[Sun Apr  7 21:26:42 2024] 00000000145ea44c: ffffacb5c390fe48 (0xffffacb5c390fe48)
[Sun Apr  7 21:26:42 2024] 0000000009276f68: ffffacb5c390ff48 (0xffffacb5c390ff48)
[Sun Apr  7 21:26:42 2024] 000000005fe4a743: ffffffff8f5baa9c (do_syscall_64+0x5c/0xc0)
[Sun Apr  7 21:26:42 2024] 000000004b580919: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000dae81c94: ffffacb5c390ff58 (0xffffacb5c390ff58)
[Sun Apr  7 21:26:42 2024] 000000004e9a97e8: ffffacb5c390fef0 (0xffffacb5c390fef0)
[Sun Apr  7 21:26:42 2024] 00000000c2e1f82f: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000004d894f8f: ffffffffffffffea (0xffffffffffffffea)
[Sun Apr  7 21:26:42 2024] 00000000c447b10c: ffffffffffffffea (0xffffffffffffffea)
[Sun Apr  7 21:26:42 2024] 00000000476efdb6: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 000000009fadf62f: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 00000000ecbd5eb4: ffffacb5c390fed8 (0xffffacb5c390fed8)
[Sun Apr  7 21:26:42 2024] 000000005650ebc2: ffffffff8e96eca7 (exit_to_user_mode_prepare+0x37/0xb0)
[Sun Apr  7 21:26:42 2024] 000000008774b022: ffffacb5c390ff58 (0xffffacb5c390ff58)
[Sun Apr  7 21:26:42 2024] 00000000861e7f35: ffffacb5c390fef0 (0xffffacb5c390fef0)
[Sun Apr  7 21:26:42 2024] 00000000b9d9a719: ffffffff8f5bef45 (syscall_exit_to_user_mode+0x35/0x50)
[Sun Apr  7 21:26:42 2024] 00000000bed9d6b2: ffffffff8e9e1bc8 (__x64_sys_seccomp+0x18/0x20)
[Sun Apr  7 21:26:42 2024] 000000003db8927f: ffffacb5c390ff48 (0xffffacb5c390ff48)
[Sun Apr  7 21:26:42 2024] 00000000ad20b5d5: ffffffff8f5baaa9 (do_syscall_64+0x69/0xc0)
[Sun Apr  7 21:26:42 2024] 0000000017998500: ffffffff8f5befd7 (irqentry_exit_to_user_mode+0x17/0x20)
[Sun Apr  7 21:26:42 2024] 0000000050088124: ffffacb5c390ff18 (0xffffacb5c390ff18)
[Sun Apr  7 21:26:42 2024] 00000000149cfd92: ffffffff8f5beffd (irqentry_exit+0x1d/0x30)
[Sun Apr  7 21:26:42 2024] 000000002d711f93: ffffacb5c390ff48 (0xffffacb5c390ff48)
[Sun Apr  7 21:26:42 2024] 000000007ad24a78: ffffffff8f5be9e9 (exc_page_fault+0x89/0x170)
[Sun Apr  7 21:26:42 2024] 0000000083f49c3c: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 00000000dcffbe5b: ffffffff8f6000da (entry_SYSCALL_64_after_hwframe+0x62/0xcc)
[Sun Apr  7 21:26:42 2024] 000000001e1f3a34: 000000000000000a (0xa)
[Sun Apr  7 21:26:42 2024] 00000000582fed39: 000000c0000061a0 (0xc0000061a0)
[Sun Apr  7 21:26:42 2024] 00000000aecbb080: 0000000000099596 (0x99596)
[Sun Apr  7 21:26:42 2024] 000000000d3f177b: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 00000000caff80f0: 000000c000056d58 (0xc000056d58)
[Sun Apr  7 21:26:42 2024] 00000000748d5157: 000000c000056dc8 (0xc000056dc8)
[Sun Apr  7 21:26:42 2024] 00000000fff312c2: 0000000000000246 (0x246)
[Sun Apr  7 21:26:42 2024] 0000000011f57f8d: 0000000000000004 (0x4)
[Sun Apr  7 21:26:42 2024] 00000000a2ac5c54: 000000c000057000 (0xc000057000)
[Sun Apr  7 21:26:42 2024] 0000000071f3dd57: 000000c000056dc8 (0xc000056dc8)
[Sun Apr  7 21:26:42 2024] 00000000f7587a2c: ffffffffffffffda (0xffffffffffffffda)
[Sun Apr  7 21:26:42 2024] 00000000c28b704d: 00007efc8bf6288d (0x7efc8bf6288d)
[Sun Apr  7 21:26:42 2024] 000000003f71b2bc: 0000000000000000 ...
[Sun Apr  7 21:26:42 2024] 0000000033aaee75: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 00000000b44d136c: 0000000000000001 (0x1)
[Sun Apr  7 21:26:42 2024] 00000000009e1c66: 000000000000013d (0x13d)
[Sun Apr  7 21:26:42 2024] 00000000e9a60638: 00007efc8bf6288d (0x7efc8bf6288d)
[Sun Apr  7 21:26:42 2024] 0000000057b9c273: 0000000000000033 (0x33)
[Sun Apr  7 21:26:42 2024] 00000000cb17aaa7: 0000000000000246 (0x246)
[Sun Apr  7 21:26:42 2024] 00000000d2b4f2ba: 00007fff3b3d1828 (0x7fff3b3d1828)
[Sun Apr  7 21:26:42 2024] 000000000bfc461f: 000000000000002b (0x2b)
[Sun Apr  7 21:26:42 2024]  ? do_syscall_64+0x5c/0xc0
[Sun Apr  7 21:26:42 2024]  ? exit_to_user_mode_prepare+0x37/0xb0
[Sun Apr  7 21:26:42 2024]  ? syscall_exit_to_user_mode+0x35/0x50
[Sun Apr  7 21:26:42 2024]  ? __x64_sys_seccomp+0x18/0x20
[Sun Apr  7 21:26:42 2024]  ? do_syscall_64+0x69/0xc0
[Sun Apr  7 21:26:42 2024]  ? irqentry_exit_to_user_mode+0x17/0x20
[Sun Apr  7 21:26:42 2024]  ? irqentry_exit+0x1d/0x30
[Sun Apr  7 21:26:42 2024]  ? exc_page_fault+0x89/0x170
[Sun Apr  7 21:26:42 2024]  ? entry_SYSCALL_64_after_hwframe+0x62/0xcc
[Sun Apr  7 21:26:42 2024]  </TASK>
[Sun Apr  7 21:26:42 2024] LKRG: ALERT: BLOCK: Task: Killing pid 305793, name runc

@Adam-pi3
Collaborator

Adam-pi3 commented Apr 8, 2024

Thanks @80kk, could you also set log_level to 4 while running the build compiled with P_LKRG_TASK_OFF_DEBUG?

@80kk
Author

80kk commented Apr 8, 2024

> Thanks @80kk , could you also enable log_level to level 4 under P_LKRG_TASK_OFF_DEBUG compilation?

How can I do this? The only log_level occurrence I found in this file is in:

// Signature in logs...
#define P_LKRG_SIGNATURE "LKRG: "

#define P_LOG_MIN   0
#define P_LOG_ALERT 0
#define P_LOG_ALIVE 1
#define P_LOG_FAULT 2
#define P_LOG_ISSUE 3
#define P_LOG_WATCH 4
#define P_LOG_DEBUG 5
#define P_LOG_FLOOD 6
#define P_LOG_MAX   6

#define P_LOG_STATE (0x10 | P_LOG_ALIVE)
#define P_LOG_DYING (0x20 | P_LOG_ALIVE)
#define P_LOG_FATAL (0x30 | P_LOG_FAULT)

#define p_print_log(p_level, p_fmt, p_args...)                                             \
({                                                                                         \
   int p_print_ret = 0;                                                                    \
                                                                                           \
   if (p_level == P_LOG_ALERT)                                                             \
      p_print_ret = printk(KERN_CRIT    P_LKRG_SIGNATURE "ALERT: " p_fmt "\n", ## p_args); \
   else if (P_CTRL(p_log_level) >= (p_level & 7))                                          \
   switch (p_level) {                                                                      \
   case P_LOG_ALIVE:                                                                       \

@solardiz
Contributor

solardiz commented Apr 8, 2024

@80kk You don't need to patch anything to adjust log_level - we have a sysctl and a module parameter of that name, so please use one of those. This is documented in README. Thank you!
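
For illustration only, a sketch of the runtime route: it assumes the sysctl is exposed as `lkrg.log_level`, that LKRG is already loaded, and that the command runs as root. `lkrg_set_log_level` is a hypothetical wrapper name.

```shell
# Hypothetical helper: set LKRG's log level at runtime via sysctl.
# Must run as root with the LKRG module loaded; $1 is the desired level.
lkrg_set_log_level() {
    sysctl -w "lkrg.log_level=$1"
}

# The module parameter of the same name would be passed at load time
# instead, for example (assuming the module is installed as "lkrg"):
#   modprobe lkrg log_level=4
```

Calling `lkrg_set_log_level 4` raises verbosity to the WATCH level from the header excerpt above without rebuilding anything.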

@80kk
Author

80kk commented Apr 8, 2024

> @80kk You don't need to patch anything to adjust log_level - we have a sysctl and a module parameter of that name, so please use one of those. This is documented in README. Thank you!

Thanks. I misunderstood @Adam-pi3 's request.

@80kk
Author

80kk commented Apr 9, 2024

Here is the call trace with log_level set to 4:

Apr  8 21:35:44 mail kernel: [159823.064647] LKRG: WATCH: Inserting pid 682842
Apr  8 21:35:44 mail kernel: [159823.065470] LKRG: WATCH: Updating pid 682843
Apr  8 21:35:44 mail kernel: [159823.065474] LKRG: WATCH: Inserting pid 682843
Apr  8 21:35:44 mail kernel: [159823.065495] LKRG: WATCH: Updating pid 676256
Apr  8 21:35:44 mail kernel: [159823.065531] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 682843, name runc:[2:INIT]
Apr  8 21:35:44 mail kernel: [159823.065562] LKRG: WATCH: 'off' flag[0x0] (normalization via 0x1a15583f3ed23d1)
Apr  8 21:35:44 mail kernel: [159823.065564] LKRG: WATCH: OFF debug: normalization[0x1a15583f3ed23d1] cookie[0xa987d5b5b109859f]
Apr  8 21:35:44 mail kernel: [159823.065566] LKRG: WATCH: Process[682843 | runc:[2:INIT]] Parent[682808 | runc] has TSYNC[0] and [1] entries:
Apr  8 21:35:44 mail kernel: [159823.065569] LKRG: WATCH:  => caller[p_seccomp_ret (TSYNC child)] action[OFF] old_off[0x1a15583f3ed23d1] debug_val[1]
Apr  8 21:35:44 mail kernel: [159823.065572] CPU: 2 PID: 682836 Comm: runc:[2:INIT] Tainted: G           OE     5.15.0-101-generic #111-Ubuntu
Apr  8 21:35:44 mail kernel: [159823.065576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Apr  8 21:35:44 mail kernel: [159823.065578] Call Trace:
Apr  8 21:35:44 mail kernel: [159823.065580]  <TASK>
Apr  8 21:35:44 mail kernel: [159823.065582]  show_stack+0x52/0x5c
Apr  8 21:35:44 mail kernel: [159823.065594]  dump_stack_lvl+0x4a/0x63
Apr  8 21:35:44 mail kernel: [159823.065599]  dump_stack+0x10/0x16
Apr  8 21:35:44 mail kernel: [159823.065603]  p_ed_is_off_off.part.0+0x4a6/0x583 [lkrg]
Apr  8 21:35:44 mail kernel: [159823.065622]  p_set_ed_process_on.cold+0xe/0x1e [lkrg]
Apr  8 21:35:44 mail kernel: [159823.065635]  p_seccomp_ret+0x159/0x250 [lkrg]
Apr  8 21:35:44 mail kernel: [159823.065648]  ? __x64_sys_seccomp+0x18/0x20
Apr  8 21:35:44 mail kernel: [159823.065652]  __kretprobe_trampoline_handler+0xb4/0x140
Apr  8 21:35:44 mail kernel: [159823.065656]  trampoline_handler+0x41/0x60
Apr  8 21:35:44 mail kernel: [159823.065659]  __kretprobe_trampoline+0x2a/0x60
Apr  8 21:35:44 mail kernel: [159823.065661] RIP: 0010:__kretprobe_trampoline+0x0/0x60
Apr  8 21:35:44 mail kernel: [159823.065664] Code: 89 fc e8 e3 d7 01 00 4c 89 f2 4c 89 ee 4c 89 e7 44 0f b6 c0 31 c9 e8 8f 94 3b 00 41 5c 41 5d 41 5e 5d c3 cc cc cc cc cc cc cc <54> 9c 48 83 ec 18 57 56 52 51 50 41 50 41 51 41 52 41 53 53 55 41
Apr  8 21:35:44 mail kernel: [159823.065667] RSP: c39cff48:ffffacb5c39cfeb8 EFLAGS: 00000246
Apr  8 21:35:44 mail kernel: [159823.065670] RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000
Apr  8 21:35:44 mail kernel: [159823.065672] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffacb5c39cfe50
Apr  8 21:35:44 mail kernel: [159823.065674] RBP: ffffacb5c39cfeb8 R08: fffffffffffffff2 R09: 0000000000000000
Apr  8 21:35:44 mail kernel: [159823.065675] R10: ffffacb5c39cfe40 R11: 0000000000000000 R12: ffffacb5c39cff58
Apr  8 21:35:44 mail kernel: [159823.065677] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Apr  8 21:35:44 mail kernel: [159823.065680]  ? do_syscall_64+0x5c/0xc0
Apr  8 21:35:44 mail kernel: [159823.065684]  ? do_user_addr_fault+0x1e7/0x670
Apr  8 21:35:44 mail kernel: [159823.065688]  ? exit_to_user_mode_prepare+0x37/0xb0
Apr  8 21:35:44 mail kernel: [159823.065694]  ? irqentry_exit_to_user_mode+0x17/0x20
Apr  8 21:35:44 mail kernel: [159823.065698]  ? irqentry_exit+0x1d/0x30
Apr  8 21:35:44 mail kernel: [159823.065702]  ? exc_page_fault+0x89/0x170
Apr  8 21:35:44 mail kernel: [159823.065705]  ? entry_SYSCALL_64_after_hwframe+0x62/0xcc
Apr  8 21:35:44 mail kernel: [159823.065710]  </TASK>
Apr  8 21:35:44 mail kernel: [159823.065711] LKRG: ALERT: BLOCK: Task: Killing pid 682843, name runc:[2:INIT]
Apr  8 21:35:44 mail kernel: [159823.065800] LKRG: WATCH: Removing pid 682836
Apr  8 21:35:44 mail kernel: [159823.065893] LKRG: WATCH: Removing pid 682839
Apr  8 21:35:44 mail kernel: [159823.065914] LKRG: WATCH: Removing pid 682838

and Docker container log:

2024-04-08T21:35:44.084+02:00  common.go:121 ▶ ERROR [Job "dovecot_imapsync_runner" (290122202a13)] StdOut: OCI runtime exec failed: exec failed: unable to start container process: read init-p: connection reset by peer: unknown
2024-04-08T21:35:44.085+02:00  common.go:121 ▶ ERROR [Job "dovecot_imapsync_runner" (290122202a13)] Finished in "79.847734ms", failed: true, skipped: false, error: error non-zero exit code: 126

@80kk
Author

80kk commented Apr 9, 2024

I don't know if it matters, but as you have probably already noticed, this is a VM running on Proxmox. The underlying hardware is a Dell PowerEdge R320.

@Adam-pi3
Collaborator

Sorry for the late reply. I tried to reproduce your issue under VMware:

Distributor ID:	Ubuntu
Description:	Ubuntu 24.04 LTS
Release:	24.04
Codename:	noble

but under kernel 6.8.0-31-generic. I installed Docker, LXD, and Docker Compose, and ran mailcow following these instructions:
https://docs.mailcow.email/getstarted/install/#initialize-mailcow

Everything works fine. Is there anything specific needed to reproduce it?

@80kk
Author

80kk commented May 15, 2024

Well, as I wrote in my first post:

> Ubuntu 22.04 with 5.15.0-101-generic kernel

Ubuntu 24.04 had not been released at that time. The hypervisor is Proxmox, but I don't think that is a factor. I will probably upgrade to 24.04 this weekend and update the ticket.

@80kk
Author

80kk commented May 19, 2024

Unfortunately, there is no official way to upgrade to 24.04 until 24.04.1 is released. If you think this issue is resolved in the 6.8 kernel, feel free to close this ticket.

@solardiz
Contributor

Even if the issue is resolved or otherwise avoided in 6.8, we may still care to fix it for older kernels. LKRG supports a wide range of kernel versions.

@Adam-pi3
Collaborator

I spent some time running the same test on:

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

under kernel 5.15.0-101-generic, and I do not see the issue. I left all the containers running overnight and no false positives were detected:

$ docker compose up -d
[+] Running 32/32
 ✔ Network mailcowdockerized_mailcow-network              Created                                                            0.4s 
 ✔ Volume "mailcowdockerized_vmail-vol-1"                 Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_vmail-index-vol-1"           Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_mysql-vol-1"                 Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_mysql-socket-vol-1"          Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_sogo-web-vol-1"              Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_clamd-db-vol-1"              Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_sogo-userdata-backup-vol-1"  Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_postfix-vol-1"               Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_crypt-vol-1"                 Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_redis-vol-1"                 Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_solr-vol-1"                  Created                                                            0.0s 
 ✔ Volume "mailcowdockerized_rspamd-vol-1"                Created                                                            0.0s 
 ✔ Container mailcowdockerized-netfilter-mailcow-1        Started                                                            0.1s 
 ✔ Container mailcowdockerized-memcached-mailcow-1        Started                                                            0.1s 
 ✔ Container mailcowdockerized-dockerapi-mailcow-1        Started                                                            0.1s 
 ✔ Container mailcowdockerized-unbound-mailcow-1          Healthy                                                            0.1s 
 ✔ Container mailcowdockerized-sogo-mailcow-1             Started                                                            0.1s 
 ✔ Container mailcowdockerized-olefy-mailcow-1            Started                                                            0.1s 
 ✔ Container mailcowdockerized-clamd-mailcow-1            Started                                                            0.0s 
 ✔ Container mailcowdockerized-redis-mailcow-1            Started                                                            0.1s 
 ✔ Container mailcowdockerized-mysql-mailcow-1            Started                                                            0.0s 
 ✔ Container mailcowdockerized-solr-mailcow-1             Started                                                            0.1s 
 ✔ Container mailcowdockerized-dovecot-mailcow-1          Started                                                            0.0s 
 ✔ Container mailcowdockerized-postfix-mailcow-1          Started                                                            0.0s 
 ✔ Container mailcowdockerized-ofelia-mailcow-1           Started                                                            0.0s 
 ✔ Container mailcowdockerized-rspamd-mailcow-1           Started                                                            0.0s 
 ✔ Container mailcowdockerized-php-fpm-mailcow-1          Started                                                            0.0s 
 ✔ Container mailcowdockerized-nginx-mailcow-1            Started                                                            0.0s 
 ✔ Container mailcowdockerized-acme-mailcow-1             Started                                                            0.0s 
 ✔ Container mailcowdockerized-watchdog-mailcow-1         Started                                                            0.0s 
 ✔ Container mailcowdockerized-ipv6nat-mailcow-1          Started             

In the kernel logs I can see that LKRG runs fine:

[Sat May 18 23:34:52 2024] lkrg: loading out-of-tree module taints kernel.
[Sat May 18 23:34:52 2024] lkrg: module verification failed: signature and/or required key missing - tainting kernel
[Sat May 18 23:34:52 2024] LKRG: ALIVE: Loading LKRG
[Sat May 18 23:34:52 2024] Freezing user space processes ... (elapsed 0.004 seconds) done.
[Sat May 18 23:34:52 2024] OOM killer disabled.
[Sat May 18 23:34:53 2024] LKRG: ALIVE: LKRG initialized successfully
[Sat May 18 23:34:53 2024] OOM killer enabled.
[Sat May 18 23:34:53 2024] Restarting tasks ... done.

There are no other LKRG-related log messages. However, I have a question, @80kk: do you see anything similar to these messages in your logs?

LKRG: ISSUE: [kretprobe] register_kretprobe() for <ovl_dentry_is_whiteout> failed! [err=-2]
LKRG: ISSUE: Can't hook 'ovl_dentry_is_whiteout'. This is expected when OverlayFS is not used.
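
One possible way to check, sketched under assumptions: `lkrg_issue_lines` is a hypothetical helper name, and it only filters whatever kernel log text is piped into it. Going by the level check in the header excerpt earlier in this thread, ISSUE messages are level 3, so they would presumably only have been printed if log_level was at least 3 when they were emitted.

```shell
# Hypothetical helper: keep only LKRG ISSUE-level lines from stdin.
# -F treats the pattern as a fixed string rather than a regex.
lkrg_issue_lines() {
    grep -F 'LKRG: ISSUE:'
}
```

For example, `dmesg | lkrg_issue_lines` would list any hook registration failures still in the kernel ring buffer.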
