Randomly kernel panic when using qemu-nbd #431

Open
kvaps opened this issue May 21, 2018 · 7 comments

kvaps commented May 21, 2018

Hi, we are currently testing sheepdog with the qemu-nbd driver.

  1. We created a new VDI with 2 replicas.
  2. Then we attached it, created a filesystem, and mounted it:
qemu-nbd -c /dev/nbd0 -f raw sheepdog:test
mkfs.ext4 /dev/nbd0
mount /dev/nbd0 /mnt
  3. Then we ran a fio test on it and rebooted one of the other nodes that holds this VDI.
  4. After a while we got the following kernel panic:
May 21 11:22:59 m1c5 kernel: TCP: request_sock_TCP: Possible SYN flooding on port 7000. Sending cookies.  Check SNMP counters.
May 21 11:23:17 m1c5 kernel: block nbd0: Connection timed out
May 21 11:24:34 m1c5 kernel: sheep[10951]: segfault at b0 ip 0000000000419a67 sp 00007fc43ef7b4f0 error 4 in sheep[400000+5a000]
May 21 11:24:34 m1c5 kernel: sheep[10462]: segfault at b0 ip 0000000000419a67 sp 00007fc2c13ea4f0 error 4 in sheep[400000+5a000]
May 21 11:25:09 m1c5 kernel: INFO: task kworker/u16:1:14375 blocked for more than 120 seconds.
May 21 11:25:09 m1c5 kernel:       Tainted: G           OE   4.13.0-36-generic #40~16.04.1-Ubuntu
May 21 11:25:09 m1c5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 11:25:09 m1c5 kernel: kworker/u16:1   D    0 14375      2 0x00000000
May 21 11:25:09 m1c5 kernel: Workqueue: writeback wb_workfn (flush-43:0)
May 21 11:25:09 m1c5 kernel: Call Trace:
May 21 11:25:09 m1c5 kernel:  __schedule+0x3d6/0x8b0
May 21 11:25:09 m1c5 kernel:  schedule+0x36/0x80
May 21 11:25:09 m1c5 kernel:  io_schedule+0x16/0x40
May 21 11:25:09 m1c5 kernel:  wbt_wait+0x2ad/0x3a0
May 21 11:25:09 m1c5 kernel:  ? wait_woken+0x80/0x80
May 21 11:25:09 m1c5 kernel:  blk_mq_make_request+0x105/0x5f0
May 21 11:25:09 m1c5 kernel:  generic_make_request+0x12a/0x300
May 21 11:25:09 m1c5 kernel:  submit_bio+0x73/0x150
May 21 11:25:09 m1c5 kernel:  ? submit_bio+0x73/0x150
May 21 11:25:09 m1c5 kernel:  ? __test_set_page_writeback+0x191/0x2f0
May 21 11:25:09 m1c5 kernel:  ext4_io_submit+0x4c/0x60
May 21 11:25:09 m1c5 kernel:  ext4_bio_write_page+0x251/0x4e0
May 21 11:25:09 m1c5 kernel:  mpage_submit_page+0x58/0x70
May 21 11:25:09 m1c5 kernel:  mpage_map_and_submit_buffers+0x156/0x290
May 21 11:25:09 m1c5 kernel:  ext4_writepages+0x873/0xe40
May 21 11:25:09 m1c5 kernel:  ? mlx4_comm_cmd+0x281/0x350 [mlx4_core]
May 21 11:25:09 m1c5 kernel:  ? dma_pool_free+0xa3/0xd0
May 21 11:25:09 m1c5 kernel:  do_writepages+0x1f/0x70
May 21 11:25:09 m1c5 kernel:  ? do_writepages+0x1f/0x70
May 21 11:25:09 m1c5 kernel:  __writeback_single_inode+0x45/0x330
May 21 11:25:09 m1c5 kernel:  writeback_sb_inodes+0x26a/0x600
May 21 11:25:09 m1c5 kernel:  __writeback_inodes_wb+0x92/0xc0
May 21 11:25:09 m1c5 kernel:  wb_writeback+0x274/0x330
May 21 11:25:09 m1c5 kernel:  wb_workfn+0xb4/0x3b0
May 21 11:25:09 m1c5 kernel:  ? wb_workfn+0xb4/0x3b0
May 21 11:25:09 m1c5 kernel:  ? __schedule+0x3de/0x8b0
May 21 11:25:09 m1c5 kernel:  process_one_work+0x15b/0x410
May 21 11:25:09 m1c5 kernel:  worker_thread+0x4b/0x460
May 21 11:25:09 m1c5 kernel:  kthread+0x10c/0x140
May 21 11:25:09 m1c5 kernel:  ? process_one_work+0x410/0x410
May 21 11:25:09 m1c5 kernel:  ? kthread_create_on_node+0x70/0x70
May 21 11:25:09 m1c5 kernel:  ret_from_fork+0x35/0x40
May 21 11:25:09 m1c5 kernel: INFO: task jbd2/nbd0-8:1446 blocked for more than 120 seconds.
May 21 11:25:09 m1c5 kernel:       Tainted: G           OE   4.13.0-36-generic #40~16.04.1-Ubuntu
May 21 11:25:09 m1c5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 11:25:09 m1c5 kernel: jbd2/nbd0-8     D    0  1446      2 0x00000000
May 21 11:25:09 m1c5 kernel: Call Trace:
May 21 11:25:09 m1c5 kernel:  __schedule+0x3d6/0x8b0
May 21 11:25:09 m1c5 kernel:  schedule+0x36/0x80
May 21 11:25:09 m1c5 kernel:  io_schedule+0x16/0x40
May 21 11:25:09 m1c5 kernel:  wait_on_page_bit_common+0xf8/0x190
May 21 11:25:09 m1c5 kernel:  ? page_cache_tree_insert+0xb0/0xb0
May 21 11:25:09 m1c5 kernel:  __filemap_fdatawait_range+0x114/0x180
May 21 11:25:09 m1c5 kernel:  filemap_fdatawait_keep_errors+0x27/0x50
May 21 11:25:09 m1c5 kernel:  jbd2_journal_commit_transaction+0x6b7/0x1720
May 21 11:25:09 m1c5 kernel:  ? dequeue_task_fair+0x4c1/0x670
May 21 11:25:09 m1c5 kernel:  kjournald2+0xd2/0x260
May 21 11:25:09 m1c5 kernel:  ? kjournald2+0xd2/0x260
May 21 11:25:09 m1c5 kernel:  ? wait_woken+0x80/0x80
May 21 11:25:09 m1c5 kernel:  kthread+0x10c/0x140
May 21 11:25:09 m1c5 kernel:  ? commit_timeout+0x10/0x10
May 21 11:25:09 m1c5 kernel:  ? kthread_create_on_node+0x70/0x70
May 21 11:25:09 m1c5 kernel:  ret_from_fork+0x35/0x40
May 21 11:25:24 m1c5 kernel: block nbd0: shutting down sockets
May 21 11:25:24 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75972352
May 21 11:25:24 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36251369472 size 8388608 starting block 9496576)
May 21 11:25:24 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496320
May 21 11:25:24 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496321
May 21 11:25:24 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496322
May 21 11:25:24 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496323
May 21 11:25:24 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496324
May 21 11:25:25 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496325
May 21 11:25:25 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496326
May 21 11:25:25 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496327
May 21 11:25:25 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496328
May 21 11:25:25 m1c5 kernel: Buffer I/O error on device nbd0, logical block 9496329
May 21 11:25:25 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:25 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75973120
May 21 11:25:25 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:25 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75973376
May 21 11:25:25 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:25 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75973632
May 21 11:25:25 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:25 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75973888
May 21 11:25:25 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:26 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75974144
May 21 11:25:26 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:26 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75974400
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36251369472 size 8388608 starting block 9496832)
May 21 11:25:26 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:26 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75974656
May 21 11:25:26 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:26 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75974912
May 21 11:25:26 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:26 m1c5 kernel: print_req_error: I/O error, dev nbd0, sector 75975168
May 21 11:25:26 m1c5 kernel: block nbd0: Connection timed out
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36251369472 size 8388608 starting block 9497600)
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36259758080 size 6295552 starting block 9497856)
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36259758080 size 6295552 starting block 9498112)
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36259758080 size 6295552 starting block 9498368)
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36259758080 size 6295552 starting block 9498624)
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36259758080 size 6295552 starting block 9498880)
May 21 11:25:26 m1c5 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:322: I/O error 10 writing to inode 12 (offset 36251369472 size 8388608 starting block 9497088)
May 21 11:25:26 m1c5 kernel: EXT4-fs error (device nbd0): __ext4_get_inode_loc:4570: inode #12: block 1057: comm fio: unable to read itable block
May 21 11:25:26 m1c5 kernel: ------------[ cut here ]------------
May 21 11:25:26 m1c5 kernel: kernel BUG at /build/linux-hwe-4GXcua/linux-hwe-4.13.0/fs/buffer.c:3096!
May 21 11:25:26 m1c5 kernel: invalid opcode: 0000 [#1] SMP NOPTI
May 21 11:25:31 m1c5 kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 br_netfilter veth beegfs(OE) dummy xt_conntrack nf_conntrack_netlink xt_nat xt_tcpudp xt_recent ip_set nfnetlink xt_addrtype ip_vs ipt_MASQUERADE nf_nat_masquerade_i
May 21 11:25:31 m1c5 kernel:  intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul gpio_ich ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf shpchp lpc_i
May 21 11:25:31 m1c5 kernel: CPU: 5 PID: 6976 Comm: fio Tainted: G           OE   4.13.0-36-generic #40~16.04.1-Ubuntu
May 21 11:25:31 m1c5 kernel: Hardware name: HP ProLiant m710p Server Cartridge/, BIOS H06 04/06/2016
May 21 11:25:31 m1c5 kernel: task: ffff922f6e870000 task.stack: ffffa62f628d4000
May 21 11:25:31 m1c5 kernel: RIP: 0010:submit_bh_wbc+0x15f/0x180
May 21 11:25:31 m1c5 kernel: RSP: 0018:ffffa62f628d7a28 EFLAGS: 00010246
May 21 11:25:31 m1c5 kernel: RAX: 0000000000660005 RBX: ffff92314712d340 RCX: 0000000000000000
May 21 11:25:31 m1c5 kernel: RDX: ffff92314712d340 RSI: 0000000000020800 RDI: 0000000000000001
May 21 11:25:31 m1c5 kernel: RBP: ffffa62f628d7a58 R08: 0000000000000000 R09: 0000000000000000
May 21 11:25:31 m1c5 kernel: R10: 000000000000028b R11: 000000000000025f R12: 0000000000020800
May 21 11:25:31 m1c5 kernel: R13: ffff922f6e940400 R14: 000000000438dac8 R15: ffff923179ef7000
May 21 11:25:31 m1c5 kernel: FS:  00007f3a25389a80(0000) GS:ffff9231ff540000(0000) knlGS:0000000000000000
May 21 11:25:31 m1c5 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 21 11:25:31 m1c5 kernel: CR2: 000000c8202d0000 CR3: 000000065c22c004 CR4: 00000000003606e0
May 21 11:25:31 m1c5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 11:25:31 m1c5 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 21 11:25:31 m1c5 kernel: Call Trace:
May 21 11:25:31 m1c5 kernel:  __sync_dirty_buffer+0x72/0xf0
May 21 11:25:31 m1c5 kernel:  ext4_commit_super+0x204/0x2a0
May 21 11:25:31 m1c5 kernel:  __ext4_error_inode+0xb8/0x190
May 21 11:25:31 m1c5 kernel:  __ext4_get_inode_loc+0x1dd/0x3d0
May 21 11:25:31 m1c5 kernel:  ext4_reserve_inode_write+0x52/0xc0
May 21 11:25:31 m1c5 kernel:  ? ext4_dirty_inode+0x48/0x70
May 21 11:25:31 m1c5 kernel:  ext4_mark_inode_dirty+0x53/0x1d0
May 21 11:25:31 m1c5 kernel:  ? __ext4_journal_start_sb+0x6d/0x120
May 21 11:25:31 m1c5 kernel:  ext4_dirty_inode+0x48/0x70
May 21 11:25:31 m1c5 kernel:  __mark_inode_dirty+0x181/0x3b0
May 21 11:25:31 m1c5 kernel:  generic_update_time+0x7b/0xd0
May 21 11:25:31 m1c5 kernel:  ? current_time+0x38/0x70
May 21 11:25:31 m1c5 kernel:  file_update_time+0xbe/0x110
May 21 11:25:31 m1c5 kernel:  __generic_file_write_iter+0x9d/0x1f0
May 21 11:25:31 m1c5 kernel:  ext4_file_write_iter+0x28e/0x3f0
May 21 11:25:31 m1c5 kernel:  new_sync_write+0xe5/0x140
May 21 11:25:31 m1c5 kernel:  __vfs_write+0x29/0x40
May 21 11:25:31 m1c5 kernel:  vfs_write+0xb8/0x1b0
May 21 11:25:31 m1c5 kernel:  SyS_write+0x55/0xc0
May 21 11:25:31 m1c5 kernel:  entry_SYSCALL_64_fastpath+0x24/0xab
May 21 11:25:31 m1c5 kernel: RIP: 0033:0x7f3a111b94bd
May 21 11:25:31 m1c5 kernel: RSP: 002b:00007ffef044a3e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
May 21 11:25:31 m1c5 kernel: RAX: ffffffffffffffda RBX: 000000057cbdf000 RCX: 00007f3a111b94bd
May 21 11:25:31 m1c5 kernel: RDX: 0000000000001000 RSI: 00000000012bdc00 RDI: 0000000000000003
May 21 11:25:31 m1c5 kernel: RBP: 00007f39ff8e6000 R08: 0000000000000008 R09: 9e37fffffffc0001
May 21 11:25:31 m1c5 kernel: R10: 00000000447996f2 R11: 0000000000000293 R12: 00000000012bdc00
May 21 11:25:31 m1c5 kernel: R13: 0000000e00000000 R14: 0000000000001000 R15: 00007f3a0d52c4f0
May 21 11:25:31 m1c5 kernel: Code: e8 80 cc 20 80 e6 80 44 0f 45 e8 45 09 f5 45 89 6c 24 14 e8 b4 be 19 00 48 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b <0f> 0b f0 80 63 01 f7 e9 02 ff ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 
May 21 11:25:31 m1c5 kernel: RIP: submit_bh_wbc+0x15f/0x180 RSP: ffffa62f628d7a28
May 21 11:25:31 m1c5 kernel: ---[ end trace 61f8bdd2bb79d08b ]---
May 21 11:25:31 m1c5 kernel: Kernel panic - not syncing: Fatal exception

kvaps commented May 21, 2018

This setup sometimes works and sometimes does not.
Is this a sheepdog bug or an nbd module bug?

This is my kernel:

# uname -a
Linux m1c5 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

sheepdog version 1.0.1
qemu-nbd version 0.0.1
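
One way to start narrowing down which component failed first is to look at the order of events in the kernel log. In the log above, the first `block nbd0: Connection timed out` (11:23:17) comes before the `sheep` segfaults (11:24:34), which in turn come before the hung-task warnings and the `kernel BUG` line. The sketch below prints the first occurrence of each of those event types in file order. The function name `first_events` and the idea of saving the log to a file first are assumptions made for illustration, and ordering alone does not establish the root cause.

```shell
#!/bin/sh
# Hypothetical helper (not part of sheepdog or qemu): print the first
# occurrence of each key event type from a saved kernel log, in the
# order the events appear in the file.
# Assumption: the log was saved to a file beforehand,
# e.g. with `journalctl -k > kernel.log` or `dmesg > kernel.log`.
first_events() {
    awk '
        /segfault at/                     && !seen["segfault"]++
        /nbd[0-9]+: Connection timed out/ && !seen["nbd_timeout"]++
        /blocked for more than/           && !seen["hung_task"]++
        /kernel BUG at/                   && !seen["bug"]++
    ' "$1"
}

# Usage (file name is an example):
#   first_events kernel.log
```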

@kvaps kvaps changed the title Kernel panic when using qemu-nbd Randomly kernel panic when using qemu-nbd May 21, 2018
@kvaps kvaps closed this as completed Jun 4, 2018

Jeansen commented Aug 28, 2020

I am having similar issues in another context with qemu-nbd. Did you find the root cause for this? Any solutions or workarounds?


kvaps commented Aug 28, 2020

Wow, that was so long ago; I have no idea why I closed the issue :-/

@kvaps kvaps reopened this Aug 28, 2020

Jeansen commented Aug 30, 2020

So, this problem still exists for you, too?


kvaps commented Aug 30, 2020

Not exactly, I'm not using sheepdog anymore :)


Jeansen commented Aug 31, 2020

I see. Well, I do not use sheepdog either, but I have the same problem with nbd now and then and have not been able to track it down yet.


kvaps commented Aug 31, 2020

In my view, NBD is a very raw technology; I would suggest never using it remotely.
As for your question, try disabling the NMI watchdog in the kernel. If I remember correctly, that helped in some situations.
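
For anyone who wants to try that suggestion, here is a minimal sketch of checking and disabling the NMI watchdog on Linux. The helper name `nmi_watchdog_state` is made up for illustration, and its optional path argument exists only so the function can be exercised without touching `/proc`. The NMI watchdog is the kernel's hard-lockup detector, so whether turning it off changes anything for the panic reported in this issue is not confirmed by the thread.

```shell
#!/bin/sh
# Hypothetical helper: report the NMI watchdog state without changing it.
# The optional first argument overrides the file to read (assumption for
# testability); by default the standard procfs entry is used.
nmi_watchdog_state() {
    file=${1:-/proc/sys/kernel/nmi_watchdog}
    if [ ! -r "$file" ]; then
        echo unknown
        return 0
    fi
    case "$(cat "$file")" in
        0) echo disabled ;;
        1) echo enabled ;;
        *) echo unknown ;;
    esac
}

# To disable the watchdog until the next reboot (run as root):
#   sysctl -w kernel.nmi_watchdog=0
# To disable it at boot instead, add `nmi_watchdog=0` to the kernel
# command line.
```

Calling `nmi_watchdog_state` before and after the `sysctl` command is a quick way to confirm the setting took effect.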
