
drbd with rdma gets stuck when shutting down a resource #58

Open
wzrdtales opened this issue Jun 12, 2023 · 6 comments

wzrdtales commented Jun 12, 2023

update:

wait_event(rdma_transport->cm_count_wait, !atomic_read(&rdma_transport->cm_count));

this is the line where it gets stuck.
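
For context, a minimal sketch of the get/put pairing a wait_event() like this relies on, assuming only the cm_count and cm_count_wait fields from the line above; the struct and function names below are hypothetical placeholders, not the actual drbd_transport_rdma code:

#include <linux/atomic.h>
#include <linux/wait.h>

/* Hypothetical stand-in; only cm_count and cm_count_wait are taken
 * from the line quoted above. */
struct demo_rdma_transport {
	atomic_t cm_count;
	wait_queue_head_t cm_count_wait;
};

/* Each RDMA CM id that is created takes a reference ... */
static void demo_cm_get(struct demo_rdma_transport *t)
{
	atomic_inc(&t->cm_count);
}

/* ... and must drop it again when the CM id goes away. The wake_up()
 * is what lets the sleeping wait_event() re-check its condition. */
static void demo_cm_put(struct demo_rdma_transport *t)
{
	if (atomic_dec_and_test(&t->cm_count))
		wake_up(&t->cm_count_wait);
}

/* Teardown: sleeps until cm_count reaches zero. If any get is never
 * matched by a put (or the put skips the wake_up), this sleeps
 * forever, which is consistent with the hang reported here. */
static void demo_cm_drain(struct demo_rdma_transport *t)
{
	wait_event(t->cm_count_wait, !atomic_read(&t->cm_count));
}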

drbd with rdma gets stuck when disconnecting a resource in sync

here are the logs we could retrieve:

Jun 12 15:24:16 os-hv-am-4 kernel: [  232.569733] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.571421] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain exit code 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.571439] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Split-Brain detected but unresolved, dropping connection!
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.572513] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm split-brain
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574101] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm split-brain exit code 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574129] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Aborting cluster-wide state change 3531582257 (20ms) rv = -27
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574141] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connecting -> Disconnecting )
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574164] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: sock_recvmsg returned -4
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574242] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574260] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting sender thread (from drbd_r_vm-57da0 [3519])
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.576227] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: pp_use = 1307, expected 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.576229] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Connection closed
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.576239] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.577812] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected exit code 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.577830] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Disconnecting -> StandAlone )
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.577834] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating receiver thread
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.140302] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( StandAlone -> Unconnected )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.140350] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting receiver thread (from drbd_w_vm-57da0 [3500])
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.140513] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Unconnected -> Connecting )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.148270] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Handshake to peer 1 successful: Agreed network protocol version 121
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.148273] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Feature flags enabled on protocol level: 0x3f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.148316] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Peer authenticated using 20 bytes HMAC
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.157481] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Preparing cluster-wide state change 105203894 (0->1 499/145)
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193492] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: drbd_sync_handshake:
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193497] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: self 78ECCE9CE229714C:20F264F5A750B1D5:23B3AB54E1225FE6:940F040BBBAE35D4 bits:1264914 flags:22
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193500] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: peer 5B8C3B4DEBC7DD0A:20F264F5A750B1D4:9AD186D68FBE0520:0000000000000000 bits:1264658 flags:1025
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193502] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: uuid_compare()=split-brain-auto-recover by rule=bitmap-both
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193512] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195114] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain exit code 0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195125] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Split-Brain detected, manually solved. Sync from this node
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195139] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: State change 105203894: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195142] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Committing cluster-wide state change 105203894 (36ms)
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195155] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195157] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: pdsk( Outdated -> Inconsistent ) repl( Off -> WFBitMapS )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.200092] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 494(1), total 494; compression: 100.0%
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.205201] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 494(1), total 494; compression: 100.0%
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.205214] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm before-resync-source
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.206764] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm before-resync-source exit code 0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.206775] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: repl( WFBitMapS -> SyncSource )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.206824] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Began resync as SyncSource (will sync 5059656 KB [1264914 bits set]).
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207282] ------------[ cut here ]------------
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207293] WARNING: CPU: 114 PID: 0 at kernel/softirq.c:168 __local_bh_enable_ip+0x3a/0x60
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207294] Modules linked in: drbd_transport_rdma(OE) nvmet_rdma nvmet drbd(OE) lru_cache nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter aufs overlay nls_iso8859_1 ipmi_ssif amd64_edac_mod rpcrdma edac_mce_amd sunrpc rdma_ucm kvm_amd ib_iser kvm rdma_cm iw_cm crct10dif_pclmul libiscsi ib_umad ghash_clmulni_intel scsi_transport_iscsi ib_ipoib aesni_intel ib_cm crypto_simd cryptd glue_helper dell_smbios dcdbas mgag200 drm_vram_helper dell_wmi_descriptor wmi_bmof ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid bridge stp llc bonding sch_fq_codel ramoops reed_solomon drm efi_pstore ip_tables x_tables autofs4 raid10 raid456
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207334]  async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx4_ib mlx4_en mlx5_ib ib_uverbs raid1 ib_core crc32_pclmul mlx5_core mlx4_core tg3 ahci nvme pci_hyperv_intf libahci tls nvme_core mlxfw i2c_piix4 wmi
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207348] CPU: 114 PID: 0 Comm: swapper/114 Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207349] Hardware name: Dell Inc. PowerEdge R6515/0R4CNN, BIOS 2.7.3 03/31/2022
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207353] RIP: 0010:__local_bh_enable_ip+0x3a/0x60
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207355] Code: a9 00 00 0f 00 75 23 83 ee 01 f7 de 65 01 35 05 af 37 7e 65 8b 05 fe ae 37 7e a9 00 ff 1f 00 74 0d 5d 65 ff 0d ef ae 37 7e c3 <0f> 0b eb d9 65 66 8b 05 da e9 38 7e 66 85 c0 74 e6 e8 60 ff ff ff
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207357] RSP: 0018:ffffab6e41e18c48 EFLAGS: 00010006
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207358] RAX: 0000000080010200 RBX: ffff8a5c2380c800 RCX: 0000000000000000
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207359] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffffffffc0d0ab62
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207360] RBP: ffffab6e41e18c48 R08: ffff8a5c18528b68 R09: ffff8a5c18528b90
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207361] R10: 0000000000000000 R11: ffff8a5bfd7e0060 R12: 0000000000000001
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207362] R13: ffff8a5c2380c858 R14: 0000000000000000 R15: ffff8a5b7ebee010
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207364] FS:  0000000000000000(0000) GS:ffff8a5c3f080000(0000) knlGS:0000000000000000
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207366] CR2: 000055e261248048 CR3: 000000bd5ca66000 CR4: 0000000000340ee0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207367] Call Trace:
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207368]  <IRQ>
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207374]  _raw_spin_unlock_bh+0x1e/0x20
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207389]  update_peer_seq+0x62/0x80 [drbd]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207401]  got_BlockAck+0x63/0x2c0 [drbd]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207412]  drbd_control_data_ready+0x8b/0x1e0 [drbd]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207417]  dtr_control_data_ready.isra.0+0x9b/0x100 [drbd_transport_rdma]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207420]  dtr_rx_cq_event_handler+0x2a9/0x640 [drbd_transport_rdma]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207432]  mlx5_ib_cq_comp+0x26/0x30 [mlx5_ib]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207457]  mlx5_eq_comp_int+0xa1/0x180 [mlx5_core]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207461]  notifier_call_chain+0x55/0x80
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207464]  atomic_notifier_call_chain+0x1a/0x20
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207494]  mlx5_irq_int_handler+0x15/0x20 [mlx5_core]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207496]  __handle_irq_event_percpu+0x42/0x180
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207499]  handle_irq_event_percpu+0x33/0x80
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207501]  handle_irq_event+0x3b/0x60
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207503]  handle_edge_irq+0x93/0x1c0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207505]  do_IRQ+0x55/0xf0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207507]  common_interrupt+0xf/0xf
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207508]  </IRQ>
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207511] RIP: 0010:cpuidle_enter_state+0xc5/0x450
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207513] Code: ff e8 1f f7 83 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 92 0c 8a ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207514] RSP: 0018:ffffab6e4068fe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207516] RAX: ffff8a5c3f0afe80 RBX: ffffffff83769620 RCX: 000000000000001f
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207516] RDX: 0000000000000000 RSI: 0000000038feef44 RDI: 0000000000000000
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207517] RBP: ffffab6e4068fe78 R08: 0000003c958fd955 R09: 0000003cc5b70acd
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207518] R10: ffff8a5c3f0aeb80 R11: ffff8a5c3f0aeb60 R12: ffff8a5bfea05800
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207519] R13: 0000000000000002 R14: 0000000000000002 R15: ffff8a5bfea05800
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207523]  ? cpuidle_enter_state+0xa1/0x450
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207525]  cpuidle_enter+0x2e/0x40
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207527]  call_cpuidle+0x23/0x40
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207529]  do_idle+0x1dd/0x270
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207532]  cpu_startup_entry+0x20/0x30
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207534]  start_secondary+0x167/0x1c0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207537]  secondary_startup_64+0xa4/0xb0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207539] ---[ end trace 4c646c84e9801deb ]---
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872034] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Preparing remote state change 1094725292
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872097] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Committing remote state change 1094725292 (primary_nodes=1)
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872103] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872105] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: repl( SyncSource -> Off )
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872185] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Sending resync reply failed
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872415] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872992] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting sender thread (from drbd_r_vm-57da0 [3539])
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875133] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 rdma:os-hv-am-3: WARN: not properly disconnected, state = 6
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875779] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: pp_use = 2609, expected 0
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875781] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Connection closed
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875793] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877358] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected exit code 0
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877372] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( TearDown -> Unconnected )
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877382] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Restarting receiver thread
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877386] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Unconnected -> Connecting )
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.185970] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connecting -> Disconnecting )
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186035] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Failed to initiate connection, err=-512
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186062] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186066] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting sender thread (from drbd_r_vm-57da0 [3539])
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186473] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: pp_use = 2610, expected 0
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186475] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Connection closed
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186484] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.188149] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected exit code 0
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.188176] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Disconnecting -> StandAlone )
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.188180] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating receiver thread
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.882721] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Preparing cluster-wide state change 2022799093 (0->-1 3/2)
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.882724] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Committing cluster-wide state change 2022799093 (0ms)
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.882728] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: role( Primary -> Secondary )
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.883097] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615298] INFO: task drbdsetup:3646 blocked for more than 120 seconds.
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615345]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615424] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615427] Call Trace:
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615438]  __schedule+0x2e3/0x740
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615441]  schedule+0x42/0xb0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615450]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615455]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615473]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615487]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615501]  del_connection+0x95/0x150 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615503]  ? _cond_resched+0x19/0x30
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615515]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615518]  ? remove_wait_queue+0x47/0x50
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615523]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615526]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615528]  ? _cond_resched+0x19/0x30
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615529]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615531]  netlink_rcv_skb+0x50/0x120
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615533]  genl_rcv+0x29/0x40
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615535]  netlink_unicast+0x1a8/0x250
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615537]  netlink_sendmsg+0x240/0x480
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615540]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615545]  sock_sendmsg+0x65/0x70
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615547]  sock_write_iter+0x93/0xf0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615551]  new_sync_write+0x125/0x1c0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615554]  __vfs_write+0x29/0x40
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615556]  vfs_write+0xb9/0x1a0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615558]  ksys_write+0x67/0xe0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615560]  __x64_sys_write+0x1a/0x20
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615564]  do_syscall_64+0x57/0x190
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615566]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615569] RIP: 0033:0x7f08fa258077
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615575] Code: Bad RIP value.
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615576] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615579] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615580] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615581] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615582] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615583] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000
Jun 12 15:28:43 os-hv-am-4 kernel: [  499.911721] sysrq: Emergency Sync
Jun 12 15:28:43 os-hv-am-4 kernel: [  499.913650] Emergency Sync complete
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443007] INFO: task drbdsetup:3646 blocked for more than 241 seconds.
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443056]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443134] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443137] Call Trace:
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443147]  __schedule+0x2e3/0x740
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443150]  schedule+0x42/0xb0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443159]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443164]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443182]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443196]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443209]  del_connection+0x95/0x150 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443212]  ? _cond_resched+0x19/0x30
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443224]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443227]  ? remove_wait_queue+0x47/0x50
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443232]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443235]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443237]  ? _cond_resched+0x19/0x30
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443238]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443240]  netlink_rcv_skb+0x50/0x120
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443242]  genl_rcv+0x29/0x40
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443244]  netlink_unicast+0x1a8/0x250
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443246]  netlink_sendmsg+0x240/0x480
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443249]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443254]  sock_sendmsg+0x65/0x70
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443256]  sock_write_iter+0x93/0xf0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443260]  new_sync_write+0x125/0x1c0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443263]  __vfs_write+0x29/0x40
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443265]  vfs_write+0xb9/0x1a0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443267]  ksys_write+0x67/0xe0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443269]  __x64_sys_write+0x1a/0x20
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443272]  do_syscall_64+0x57/0x190
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443275]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443277] RIP: 0033:0x7f08fa258077
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443285] Code: Bad RIP value.
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443287] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443289] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443290] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443291] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443292] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443293] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267490] INFO: task drbdsetup:3646 blocked for more than 362 seconds.
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267539]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267617] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267621] Call Trace:
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267632]  __schedule+0x2e3/0x740
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267635]  schedule+0x42/0xb0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267644]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267648]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267667]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267680]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267694]  del_connection+0x95/0x150 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267697]  ? _cond_resched+0x19/0x30
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267710]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267712]  ? remove_wait_queue+0x47/0x50
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267717]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267720]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267722]  ? _cond_resched+0x19/0x30
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267724]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267725]  netlink_rcv_skb+0x50/0x120
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267728]  genl_rcv+0x29/0x40
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267729]  netlink_unicast+0x1a8/0x250
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267731]  netlink_sendmsg+0x240/0x480
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267735]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267739]  sock_sendmsg+0x65/0x70
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267742]  sock_write_iter+0x93/0xf0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267746]  new_sync_write+0x125/0x1c0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267749]  __vfs_write+0x29/0x40
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267750]  vfs_write+0xb9/0x1a0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267752]  ksys_write+0x67/0xe0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267754]  __x64_sys_write+0x1a/0x20
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267757]  do_syscall_64+0x57/0x190
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267760]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267763] RIP: 0033:0x7f08fa258077
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267767] Code: Bad RIP value.
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267769] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267771] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267772] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267773] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267774] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267775] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.203227] SGI XFS with ACLs, security attributes, realtime, no debug enabled
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.212617] JFS: nTxBlock = 8192, nTxLock = 65536
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.233034] ntfs: driver 2.1.32 [Flags: R/O MODULE].
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.256163] QNX4 filesystem 0.2.3 registered.
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.294546] Btrfs loaded, crc32c=crc32c-intel
Jun 12 15:32:46 os-hv-am-4 kernel: [  742.073017] device-mapper: table: 253:4: linear: Device lookup failed
Jun 12 15:32:46 os-hv-am-4 kernel: [  742.073492] device-mapper: ioctl: error adding target to table
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093603] INFO: task drbdsetup:3646 blocked for more than 483 seconds.
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093654]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093731] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093735] Call Trace:
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093748]  __schedule+0x2e3/0x740
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093751]  schedule+0x42/0xb0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093761]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093767]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093815]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093829]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093843]  del_connection+0x95/0x150 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093845]  ? _cond_resched+0x19/0x30
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093858]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093860]  ? remove_wait_queue+0x47/0x50
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093865]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093868]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093870]  ? _cond_resched+0x19/0x30
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093872]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093874]  netlink_rcv_skb+0x50/0x120
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093876]  genl_rcv+0x29/0x40
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093878]  netlink_unicast+0x1a8/0x250
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093880]  netlink_sendmsg+0x240/0x480
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093883]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093888]  sock_sendmsg+0x65/0x70
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093891]  sock_write_iter+0x93/0xf0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093894]  new_sync_write+0x125/0x1c0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093897]  __vfs_write+0x29/0x40
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093899]  vfs_write+0xb9/0x1a0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093901]  ksys_write+0x67/0xe0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093903]  __x64_sys_write+0x1a/0x20
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093907]  do_syscall_64+0x57/0x190
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093910]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093913] RIP: 0033:0x7f08fa258077
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093918] Code: Bad RIP value.
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093920] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093922] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093923] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093924] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093925] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093926] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000

drbd version: 9.2.2

to reproduce:

Set up an rdma-replicated disk: set up the first node, set up the second node, connect them, disconnect them, then try to shut either of the two down. It will be stuck forever, and only a hard reboot will release it.

@wzrdtales (Author)

edit: just an up and down is enough to get it stuck, so this seems to be a problem with the rdma transport in general.

@wzrdtales (Author)

tested the other releases; this bug has existed since the first release, 9.2.0, and was never fixed, not even in 9.2.4.

@wzrdtales (Author)

As a test, I decremented the counter right before this wait:

wait_event(rdma_transport->cm_count_wait, !atomic_read(&rdma_transport->cm_count));

This fixes the issue, but I haven't yet been able to debug why the decrement is otherwise never performed. It really never happens; we had a server sitting in that state for a week.
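
Roughly, that test-wise workaround amounts to the following sketch (only rdma_transport->cm_count and cm_count_wait come from the quoted line; this is an illustration, not the actual patch):

/* Workaround as described above (sketch): drop the leaked reference
 * immediately before waiting so the condition can become true.
 * This masks the leak rather than fixing it -- if more than one
 * reference has leaked, a single decrement is still not enough,
 * which would fit the later observation of higher cm_count values. */
atomic_dec(&rdma_transport->cm_count);
wait_event(rdma_transport->cm_count_wait,
	   !atomic_read(&rdma_transport->cm_count));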

@wzrdtales changed the title from "drbd with rdma gets stuck when disconnecting a resource in sync" to "drbd with rdma gets stuck when shutting down a resource" on Jun 12, 2023
@wzrdtales (Author)

maybe @rck can help out from here?

@wzrdtales (Author)

When a connection is made, it will again get stuck at the same place, just with higher cm_count values. When a connection is cut unexpectedly (we simulated a crash), we end up in the same situation. These resources are never freed; it just stays stuck.

@wzrdtales (Author)

Further debugging shows that there are deeper problems with the rdma transport: the whole system eventually locks up after a while, blocking any further file reads.
