Give zfs/zpool commands a callback mechanism to tell you they're not coming back #16192

Open
rincebrain opened this issue May 11, 2024 · 1 comment
Labels
Type: Feature Feature request or new feature

Comments

@rincebrain
Contributor

Describe the feature you would like to see added to OpenZFS

Sometimes, bugs happen, and we trip an ASSERT/VERIFY when running a zfs command. The command will then likely never return, since its thread is now in an infinite wait in the kernel.

It would be useful to have, say, a /dev/zfserror and a dedicated thread, alive while we're running commands, whose sole role is to receive broadcasts from the kernel module if we trip an assert/verify and never come back, so that we can print "oops you might be doomed" from any commands running when it happens.

(I'm not married to that mechanism. Using a socket and having zed broadcast on it when it gets a certain event, or something similar, would be fine. The main goal is to make users who don't know anything about the internals more aware of the difference between "this import is taking a while" and "this import panicked and you should probably not wait on it". That should hopefully raise the rate of people reporting the problems they encounter, versus "this broke, I don't know why" and having to ask someone knowledgeable just to learn where they need to check.)
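
To make the watcher-thread idea concrete, here is a minimal userspace sketch in Python. It is not how the zfs/zpool commands work today: the `/dev/zfserror` path is the hypothetical device from the proposal, and `watch_for_kernel_errors` and `start_watcher` are names invented for illustration. It assumes the module would write one line per failed assertion.

```python
import sys
import threading

# Hypothetical device from the proposal; it does not exist in OpenZFS today.
ZFS_ERROR_DEVICE = "/dev/zfserror"


def watch_for_kernel_errors(path, out=sys.stderr):
    """Read `path` line by line and relay each line as a warning."""
    try:
        with open(path, "r") as dev:
            for line in dev:
                print(f"warning: ZFS kernel module reported: {line.rstrip()}", file=out)
                print("warning: this command may never return", file=out)
    except OSError:
        # Device missing or unreadable: stay silent rather than break the command.
        pass


def start_watcher(path=ZFS_ERROR_DEVICE, out=sys.stderr):
    """Start the watcher as a daemon thread so it never delays process exit."""
    thread = threading.Thread(
        target=watch_for_kernel_errors, args=(path, out), daemon=True
    )
    thread.start()
    return thread
```

A command would call `start_watcher()` once before issuing its blocking ioctl. Because the thread is a daemon, a command that finishes normally exits without waiting on it.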

How will this feature improve OpenZFS?

Not requiring a round trip to a developer for people unfamiliar with this failure mode to distinguish between "this import is taking minutes/hours" and "this import has encountered a stuck kernel thread, and will never come back".

Additional context

@rincebrain rincebrain added the Type: Feature Feature request or new feature label May 11, 2024
@rrevans
Contributor

rrevans commented May 13, 2024

Ideas:

  1. SPL could have dbgerr like dbgmsg except always readable. That would report failures similar to how dbgmsg reports debug messages. Commands watch this file and output any messages emitted from the kernel to stderr so users can see a panic has happened somewhere in SPL/ZFS.
  2. Maintain a panic bit state for the module and export that through /proc. Commands would watch it and warn the user that the kernel has emitted an assertion error.
  3. Attribute assertion failures to the pool, zfs, or zvol and arrange for the corresponding blocking call to fail with EIO or such when assertions happen instead of reporting through a side-channel.
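
As a rough illustration of idea 2, the sketch below polls a panic flag while a command is blocked and warns once if the flag is set. Everything specific here is an assumption: the `/proc/spl/kstat/zfs/panicked` path is made up (SPL exports no such file today), as are the helper names and the "nonzero means panicked" format. A real implementation would live inside the zfs/zpool commands themselves; this version wraps a command from the outside only to keep the example self-contained.

```python
import subprocess
import sys

# Hypothetical path; SPL does not export such a flag today.
PANIC_FLAG_PATH = "/proc/spl/kstat/zfs/panicked"


def module_has_panicked(path=PANIC_FLAG_PATH):
    """Return True if the (assumed) flag file exists and reads as nonzero."""
    try:
        with open(path) as flag:
            return flag.read().strip() not in ("", "0")
    except OSError:
        return False


def run_with_panic_watch(argv, flag_path=PANIC_FLAG_PATH, poll_seconds=1.0,
                         out=sys.stderr):
    """Run argv, polling the flag while waiting; warn once if it is set.

    Returns (exit_code, warned).
    """
    warned = False
    proc = subprocess.Popen(argv)
    while True:
        try:
            return proc.wait(timeout=poll_seconds), warned
        except subprocess.TimeoutExpired:
            if not warned and module_has_panicked(flag_path):
                print("warning: the ZFS kernel module hit an assertion; "
                      "this command may never return", file=out)
                warned = True
```

For example, `run_with_panic_watch(["zpool", "import", "tank"])` would keep waiting on the import but tell the user when the flag flips, which is the "taking a while" versus "never coming back" distinction the issue asks for.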
