
zpool clear heroically dies restoring the pool, but doesn't save your previously queued IO #16175

Open
rincebrain opened this issue May 7, 2024 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@rincebrain
Contributor

System information

Type Version/Name
Distribution Name Debian
Distribution Version 11
Kernel Version 5.10.0-28-amd64
Architecture x86_64
OpenZFS Version 04bae5e

Describe the problem you're observing

If I have a USB hard drive that I created a pool on, using the name /dev/disk/by-id/mydrive-part1, pointing to /dev/sda1, unplug it, and replug it, ZFS will say it's SUSPENDED because the disk is gone.

Great. The device came back as sdb, since sda was still being held open when the disk reappeared. I zpool clear that device (by guid, since the device's name won't work, which is a separate complaint), and after one or two iterations of zpool clear pool or zpool clear pool guid it marks the pool and device ONLINE, hurray.

...except the clear command never returns, IO from before the hotplug is hung forever, the old device name is still present (e.g. something is holding references to it still), and the stacktrace is in zio_resume.

New writes to the pool appear to succeed, but old IO is stuck forever. zfs_ioc_clear is blocked on zio_resume, which is blocked on zio_wait, and the txg_sync thread and cp say they're blocked on:

[  605.497469] INFO: task txg_sync:3137 blocked for more than 241 seconds.
[  605.498342]       Tainted: P           OE     5.10.0-28-amd64 #1 Debian 5.10.209-2
[  605.499169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  605.499995] task:txg_sync        state:D stack:    0 pid: 3137 ppid:     2 flags:0x00004000
[  605.500846] Call Trace:
[  605.501692]  __schedule+0x282/0x870
[  605.502530]  schedule+0x46/0xb0
[  605.503371]  schedule_timeout+0x8b/0x150
[  605.504213]  ? __next_timer_interrupt+0x110/0x110
[  605.505049]  io_schedule_timeout+0x4c/0x80
[  605.505905]  __cv_timedwait_common+0x12f/0x170 [spl]
[  605.506761]  ? add_wait_queue_exclusive+0x70/0x70
[  605.507623]  __cv_timedwait_io+0x15/0x20 [spl]
[  605.508542]  zio_wait+0x13c/0x2d0 [zfs]
[  605.509455]  dsl_pool_sync+0xf5/0x530 [zfs]
[  605.510354]  spa_sync+0x55a/0x1000 [zfs]
[  605.511175]  ? mutex_lock+0xe/0x30
[  605.512041]  ? spa_txg_history_init_io+0x105/0x110 [zfs]
[  605.512917]  txg_sync_thread+0x212/0x3b0 [zfs]
[  605.513789]  ? txg_quiesce_thread+0x330/0x330 [zfs]
[  605.514607]  thread_generic_wrapper+0x6c/0x80 [spl]
[  605.515409]  ? spl_taskq_fini+0x80/0x80 [spl]
[  605.516199]  kthread+0x118/0x140
[  605.516954]  ? __kthread_bind_mask+0x60/0x60
[  605.517718]  ret_from_fork+0x1f/0x30
[  605.518492] INFO: task cp:3254 blocked for more than 241 seconds.
[  605.519266]       Tainted: P           OE     5.10.0-28-amd64 #1 Debian 5.10.209-2
[  605.520061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  605.520870] task:cp              state:D stack:    0 pid: 3254 ppid:  3253 flags:0x00004004
[  605.521702] Call Trace:
[  605.522523]  __schedule+0x282/0x870
[  605.523339]  ? __schedule+0x28a/0x870
[  605.524151]  schedule+0x46/0xb0
[  605.524958]  io_schedule+0x42/0x70
[  605.525766]  cv_wait_common+0xac/0x130 [spl]
[  605.526561]  ? add_wait_queue_exclusive+0x70/0x70
[  605.527424]  txg_wait_synced_impl+0xcd/0x120 [zfs]
[  605.528289]  txg_wait_synced+0xc/0x40 [zfs]
[  605.529159]  dmu_tx_wait+0x209/0x400 [zfs]
[  605.530022]  dmu_tx_assign+0x16b/0x4e0 [zfs]
[  605.530880]  zfs_write+0x4be/0xd10 [zfs]
[  605.531687]  ? xas_load+0x5/0x80
[  605.532475]  ? current_time+0x2b/0xf0
[  605.533251]  ? touch_atime+0x33/0x140
[  605.534069]  zpl_iter_write+0x103/0x170 [zfs]
[  605.534839]  new_sync_write+0x119/0x1b0
[  605.535590]  vfs_write+0x1ce/0x260
[  605.536337]  ksys_write+0x5f/0xe0
[  605.537065]  do_syscall_64+0x30/0x80
[  605.537783]  entry_SYSCALL_64_after_hwframe+0x62/0xc7

So I assume it's some dance where nothing told the old zios that their reference to the disk they're trying to write to is now stale. (zpool reopen doesn't save you.)

Describe how to reproduce the problem

Above.
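For concreteness, the steps from the description above can be sketched as a shell transcript. The pool name, file paths, and vdev guid placeholder here are hypothetical, and two of the steps are physical unplug/replug actions, so this is a sketch of the sequence rather than something to run verbatim:

```shell
# Sketch of the reproduction from the description above. Pool name,
# paths, and <vdev-guid> are hypothetical placeholders.

# 1. Create a pool on the USB drive via its stable by-id name
#    (which currently resolves to /dev/sda1).
zpool create testpool /dev/disk/by-id/mydrive-part1

# 2. Start IO to the pool, then physically unplug the drive while
#    the writes are in flight.
cp /some/large/file /testpool/ &

# (physically unplug the drive here)
zpool status testpool          # pool now reports SUSPENDED

# 3. Replug the drive. Because the old /dev/sda is still held open,
#    the kernel assigns the disk a new name, e.g. /dev/sdb.

# 4. Clear the errors by guid, since clearing by device name fails.
#    After one or two iterations the pool and vdev show ONLINE...
zpool clear testpool
zpool clear testpool <vdev-guid>

# ...but the clear never returns, and the IO queued before the
# hotplug (the cp above) hangs forever in txg_wait_synced.
```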

Include any warning/errors/backtraces from the system logs

Above.

@rincebrain rincebrain added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 7, 2024
@GregorKopka
Contributor

As a dirty hack: what about making zpool clear generally re-queue all in-flight ZIO for the pool?

@rincebrain
Contributor Author

rincebrain commented May 30, 2024

It turns out to be very messy and buggy to do that, when I tried it. (Also, I think some IOs aren't necessarily safe to requeue - you'd basically need to reissue the entire txg's worth of IOs to the device. e.g. if you had more than one queue's worth of IOs to issue, and the device went away while they were in the write cache, whether they actually landed depends on how "away" the device went...and you might no longer have them in queue to issue, since nominally it claimed they went out.)

It's also somewhat difficult to test, since Linux's behavior on hot-removing a device in use is not...always consistent, and you're not always at the same point in the IO pipeline when you do it.

@GregorKopka
Copy link
Contributor

GregorKopka commented Jun 1, 2024 via email
