BUG at mm/slub.c (slab_alloc_node) 5.14.11-hardened1 #64

icasdri · 2021-10-20T06:32:06Z

I'm hitting a BUG_ON in slab_alloc_node on linux-hardened 5.14.11-hardened1:

[54157.197925] ------------[ cut here ]------------
[54157.197930] kernel BUG at mm/slub.c:3035!
[54157.197939] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[54157.197944] CPU: 1 PID: 26863 Comm: updatedb Kdump: loaded Tainted: G        W         5.14.11-hardened1-2-hardened-debug-debug #1
[54157.197950] RIP: 0010:__kmalloc_node+0x427/0x460
[54157.197959] Code: 8b 78 08 8b 44 24 04 8d 4a 01 45 89 e1 41 b8 00 10 00 00 4c 89 fa 50 49 d3 e0 4c 89 e9 e8 01 65 f9 ff 5a e9 7e fe ff ff 0f 0b <0f> 0b 49 8b 46 08 f0 48 83 28 01 0f 85 91 fc ff ff 49 8b 46 08 4c
[54157.197963] RSP: 0018:ffffc9000288b870 EFLAGS: 00010286
[54157.197967] RAX: ffff8881451e9b0c RBX: 0000000000000dc0 RCX: ffff8881451e9b0c
[54157.197969] RDX: 00000000000000c0 RSI: 0000000000000000 RDI: 0000000000000000
[54157.197972] RBP: ffff888100041800 R08: 0101010101010101 R09: 0000000080140014
[54157.197974] R10: 000000000014a4d2 R11: ffffc9cac9c98b89 R12: 0000000000000dc0
[54157.197976] R13: 00000000000000a0 R14: 0000000000000000 R15: ffffffff812ea149
[54157.197979] FS:  00006602d747e600(0000) GS:ffff888257240000(0000) knlGS:0000000000000000
[54157.197982] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54157.197984] CR2: 000010840c6e2000 CR3: 000000020c0d8001 CR4: 00000000001706e0
[54157.197987] Call Trace:
[54157.197991]  memcg_alloc_page_obj_cgroups+0x39/0x90
[54157.197997]  allocate_slab+0xdf/0x4c0
[54157.198004]  ___slab_alloc+0x3f3/0x5c0
[54157.198009]  ? __d_alloc+0x22/0x1e0
[54157.198014]  ? __d_alloc+0x22/0x1e0
[54157.198016]  __slab_alloc.constprop.0+0x52/0x90
[54157.198022]  ? __d_alloc+0x22/0x1e0
[54157.198025]  kmem_cache_alloc+0x367/0x3b0
[54157.198029]  __d_alloc+0x22/0x1e0
[54157.198032]  d_alloc+0x1b/0xa0
[54157.198036]  d_alloc_parallel+0x60/0x550
[54157.198042]  __lookup_slow+0x5c/0x140
[54157.198047]  walk_component+0x141/0x1b0
[54157.198052]  path_lookupat+0x5f/0x190
[54157.198056]  filename_lookup+0xc7/0x1d0
[54157.198063]  vfs_statx+0x86/0x140
[54157.198069]  __do_sys_newfstatat+0x47/0x80
[54157.198076]  do_syscall_64+0x66/0x90
[54157.198084]  ? do_syscall_64+0xe/0x90
[54157.198089]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[54157.198095] RIP: 0033:0x6602d739fd8e
[54157.198098] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 a9
[54157.198101] RSP: 002b:00007629721423b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[54157.198105] RAX: ffffffffffffffda RBX: 000006f7f47ee5f0 RCX: 00006602d739fd8e
[54157.198107] RDX: 0000762972142430 RSI: 000006f7f4841849 RDI: 00000000ffffff9c
[54157.198110] RBP: 000006f7f4841849 R08: 0000000000000003 R09: 000006f7ba2db740
[54157.198112] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000002
[54157.198114] R13: 0000000000000003 R14: 0000762972142610 R15: 0000000000000003

The task here was updatedb, which from my understanding crawls the entire filesystem for indexing purposes (so it might be thrashing the slab allocator with all the path lookups).

slub.c:3035 is the BUG_ON in the following snippet near the end of slab_alloc_node:

if (has_sanitize_verify(s) && object) {
    /* KASAN hasn't unpoisoned the object yet (this is done in the
     * post-alloc hook), so let's do it temporarily.
     */
    kasan_unpoison_object_data(s, object);
    BUG_ON(memchr_inv(object, 0, s->object_size));  // <---- slub.c:3035
    if (s->ctor)
        s->ctor(object);
    kasan_poison_object_data(s, object);
} else {
    init = slab_want_init_on_alloc(gfpflags, s);
}
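For readers unfamiliar with memchr_inv: in the kernel it returns a pointer to the first byte that differs from the given value, or NULL if the whole range matches, so the BUG_ON fires as soon as any byte of the object is non-zero. A minimal userspace sketch of that check follows; the name first_nonzero is hypothetical and this is not the kernel implementation.

```c
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's memchr_inv(ptr, 0, len):
 * returns a pointer to the first byte that is not zero, or NULL
 * when the whole range is zero. The name first_nonzero is
 * hypothetical and exists only for this illustration.
 */
static const unsigned char *first_nonzero(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0)
            return &p[i];   /* this is the case that trips the BUG_ON */
    }
    return NULL;            /* object is fully zeroed */
}
```

In the snippet above, a non-NULL result for the range [object, object + s->object_size) is what triggers the BUG.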

I will try to repro in a VM so I can post coredump + debuginfo.


commit 8b59b0a upstream.

arm32 uses software to simulate the instruction replaced by kprobe. some instructions may be simulated by constructing assembly functions. therefore, before executing instruction simulation, it is necessary to construct assembly function execution environment in C language through binding registers. after kasan is enabled, the register binding relationship will be destroyed, resulting in instruction simulation errors and causing kernel panic.

the kprobe emulate instruction function is distributed in three files: actions-common.c actions-arm.c actions-thumb.c, so disable KASAN when compiling these files.

for example, use kprobe insert on cap_capable+20 after kasan enabled, the cap_capable assembly code is as follows:

<cap_capable>:
e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000 mov r5, r0
e280006c add r0, r0, #108 ; 0x6c
e1a04001 mov r4, r1
e1a06002 mov r6, r2
e59fa090 ldr sl, [pc, #144] ;
ebfc7bf8 bl c03aa4b4 <__asan_load4>
e595706c ldr r7, [r5, #108] ; 0x6c
e2859014 add r9, r5, #20
......

The emulate_ldr assembly code after enabling kasan is as follows:

c06f1384 <emulate_ldr>:
e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr}
e282803c add r8, r2, #60 ; 0x3c
e1a05000 mov r5, r0
e7e37855 ubfx r7, r5, #16, #4
e1a00008 mov r0, r8
e1a09001 mov r9, r1
e1a04002 mov r4, r2
ebf35462 bl c03c6530 <__asan_load4>
e357000f cmp r7, #15
e7e36655 ubfx r6, r5, #12, #4
e205a00f and sl, r5, #15
0a000001 beq c06f13bc <emulate_ldr+0x38>
e0840107 add r0, r4, r7, lsl #2
ebf3545c bl c03c6530 <__asan_load4>
e084010a add r0, r4, sl, lsl #2
ebf3545a bl c03c6530 <__asan_load4>
e2890010 add r0, r9, #16
ebf35458 bl c03c6530 <__asan_load4>
e5990010 ldr r0, [r9, #16]
e12fff30 blx r0
e356000f cmp r6, #15
1a000014 bne c06f1430 <emulate_ldr+0xac>
e1a06000 mov r6, r0
e2840040 add r0, r4, #64 ; 0x40
......

when running in emulate_ldr to simulate the ldr instruction, panic occurred, and the log is as follows:

Unable to handle kernel NULL pointer dereference at virtual address 00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c
r10: 00000000 r9 : c30897f4 r8 : ecd63cd4
r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98
r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 32c5387d Table: 2d546400 DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from (security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from (copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)

Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <huangshaobo6@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anthraxx · 2022-03-25T18:15:12Z

Are you still seeing this problem? This should only happen if the object contained bytes that were not cleared, which indicates memory corruption.
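One way such corruption could arise is a write through a stale pointer after the object was freed. The following is a toy userspace sketch of that scenario, assuming objects are zeroed on free and verified on allocation as the BUG_ON implies. The toy_cache type and the toy_alloc/toy_free functions are hypothetical and do not reflect the real slab allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 64

/* Hypothetical one-object "cache", for illustration only. */
struct toy_cache {
    unsigned char obj[OBJ_SIZE];
    bool in_use;
};

/* Free path: zero the object so the next allocation can verify it. */
static void toy_free(struct toy_cache *c)
{
    memset(c->obj, 0, OBJ_SIZE);
    c->in_use = false;
}

/*
 * Alloc path: hand out the object only if every byte is still zero.
 * Returns NULL in the situation where the kernel check would BUG.
 */
static void *toy_alloc(struct toy_cache *c)
{
    for (size_t i = 0; i < OBJ_SIZE; i++) {
        if (c->obj[i] != 0)
            return NULL;    /* something wrote to a free object */
    }
    c->in_use = true;
    return c->obj;
}
```

If a caller keeps its pointer after toy_free and writes through it, the next toy_alloc returns NULL, which corresponds to the failed verification reported in this issue. The sketch says nothing about which code path performed the bad write.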

anthraxx · 2022-06-01T19:12:19Z

@icasdri are you still experiencing this issue?

[ Upstream commit d6352da ]

'__net_initdata' becomes a no-op with CONFIG_NET_NS=y, but when this option is disabled it becomes '__initdata', which means the data can be freed after the initialization phase. This annotation is obviously incorrect for the devlink net device notifier block which is still registered after the initialization phase [1].

Fix this crash by removing the '__net_initdata' annotation.

[1]
general protection fault, probably for non-canonical address 0xcccccccccccccccc: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 117 Comm: (udev-worker) Not tainted 6.4.0-rc1-custom-gdf0acdc59b09 #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:notifier_call_chain+0x58/0xc0
[...]
Call Trace:
 <TASK>
 dev_set_mac_address+0x85/0x120
 dev_set_mac_address_user+0x30/0x50
 do_setlink+0x219/0x1270
 rtnl_setlink+0xf7/0x1a0
 rtnetlink_rcv_msg+0x142/0x390
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x188/0x270
 netlink_sendmsg+0x214/0x470
 __sys_sendto+0x12f/0x1a0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x38/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: e93c937 ("devlink: change per-devlink netdev notifier to static one")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/netdev/600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com/
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230515162925.1144416-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 6d41d4f ]

BUG: KASAN: slab-use-after-free in xfrm_policy_inexact_list_reinsert+0xb6/0x430
Read of size 1 at addr ffff8881051f3bf8 by task ip/668

CPU: 2 PID: 668 Comm: ip Not tainted 6.5.0-rc5-00182-g25aa0bebba72-dirty #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x72/0xa0
 print_report+0xd0/0x620
 kasan_report+0xb6/0xf0
 xfrm_policy_inexact_list_reinsert+0xb6/0x430
 xfrm_policy_inexact_insert_node.constprop.0+0x537/0x800
 xfrm_policy_inexact_alloc_chain+0x23f/0x320
 xfrm_policy_inexact_insert+0x6b/0x590
 xfrm_policy_insert+0x3b1/0x480
 xfrm_add_policy+0x23c/0x3c0
 xfrm_user_rcv_msg+0x2d0/0x510
 netlink_rcv_skb+0x10d/0x2d0
 xfrm_netlink_rcv+0x49/0x60
 netlink_unicast+0x3fe/0x540
 netlink_sendmsg+0x528/0x970
 sock_sendmsg+0x14a/0x160
 ____sys_sendmsg+0x4fc/0x580
 ___sys_sendmsg+0xef/0x160
 __sys_sendmsg+0xf7/0x1b0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x73/0xdd

The root cause is:

cpu 0                        cpu1
xfrm_dump_policy
xfrm_policy_walk
list_move_tail
                             xfrm_add_policy
                             ... ...
                             xfrm_policy_inexact_list_reinsert
                             list_for_each_entry_reverse
                                 if (!policy->bydst_reinsert)
                                 //read non-existent policy
xfrm_dump_policy_done
xfrm_policy_walk_done
list_del(&walk->walk.all);

If dump_one_policy() returns err (triggered by netlink socket), xfrm_policy_walk() will move walk initialized by socket to list net->xfrm.policy_all. so this socket becomes visible in the global policy list. The head *walk can be traversed when users add policies with different prefixlen and trigger xfrm_policy node merge.

The issue can also be triggered by policy list traversal while rehashing and flushing policies.

It can be fixed by skip such "policies" with walk.dead set to 1.

Fixes: 9cf545e ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Fixes: 12a169e ("ipsec: Put dumpers on the dump list")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 6d41d4f upstream.

BUG: KASAN: slab-use-after-free in xfrm_policy_inexact_list_reinsert+0xb6/0x430
Read of size 1 at addr ffff8881051f3bf8 by task ip/668

CPU: 2 PID: 668 Comm: ip Not tainted 6.5.0-rc5-00182-g25aa0bebba72-dirty #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x72/0xa0
 print_report+0xd0/0x620
 kasan_report+0xb6/0xf0
 xfrm_policy_inexact_list_reinsert+0xb6/0x430
 xfrm_policy_inexact_insert_node.constprop.0+0x537/0x800
 xfrm_policy_inexact_alloc_chain+0x23f/0x320
 xfrm_policy_inexact_insert+0x6b/0x590
 xfrm_policy_insert+0x3b1/0x480
 xfrm_add_policy+0x23c/0x3c0
 xfrm_user_rcv_msg+0x2d0/0x510
 netlink_rcv_skb+0x10d/0x2d0
 xfrm_netlink_rcv+0x49/0x60
 netlink_unicast+0x3fe/0x540
 netlink_sendmsg+0x528/0x970
 sock_sendmsg+0x14a/0x160
 ____sys_sendmsg+0x4fc/0x580
 ___sys_sendmsg+0xef/0x160
 __sys_sendmsg+0xf7/0x1b0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x73/0xdd

The root cause is:

cpu 0                        cpu1
xfrm_dump_policy
xfrm_policy_walk
list_move_tail
                             xfrm_add_policy
                             ... ...
                             xfrm_policy_inexact_list_reinsert
                             list_for_each_entry_reverse
                                 if (!policy->bydst_reinsert)
                                 //read non-existent policy
xfrm_dump_policy_done
xfrm_policy_walk_done
list_del(&walk->walk.all);

If dump_one_policy() returns err (triggered by netlink socket), xfrm_policy_walk() will move walk initialized by socket to list net->xfrm.policy_all. so this socket becomes visible in the global policy list. The head *walk can be traversed when users add policies with different prefixlen and trigger xfrm_policy node merge.

The issue can also be triggered by policy list traversal while rehashing and flushing policies.

It can be fixed by skip such "policies" with walk.dead set to 1.

Fixes: 9cf545e ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Fixes: 12a169e ("ipsec: Put dumpers on the dump list")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
