Unreachable drive causes multiple failures/hangs #6076

Open
austinwbest opened this issue Oct 9, 2023 · 11 comments

Comments

@austinwbest

Is there an existing issue for this?

  • I have searched the existing open and closed issues

Current Behavior

When a drive can't be reached, things simply start breaking. I have multiple mounts, and one of them is an SSHFS mount. For whatever reason, Windows is sometimes unable to connect to the SSHFS mount, and when that happens the app does not handle it well at all.

  • Updates stop working
  • Check Health stops working
  • Disk Space in the UI hangs forever
  • Message cleanup stops working

These are the ones I have noticed, but I'm not sure what else is tied to those threads and could also stop working. This happens in all of the apps; I have grabbed some screenshots from Radarr and Sonarr. It seems the SSHFS mount became unreachable 8 days ago, for whatever reason Windows has.

If I restart Sonarr, any tasks that get in before the health check will run, and if they are fast enough they will complete. But once the health check gets stuck, that's it for some things.

Radarr: [three screenshots]

Sonarr: [one screenshot]

Expected Behavior

Time out the drive connection when it can't pull the data, instead of hanging indefinitely. Maybe show it as a red icon in Disk Space, make it a health error, etc., but allow the app to keep functioning.
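To make the request concrete, here is a minimal Python sketch of "time out instead of hanging" for a disk-space query. It is not Sonarr's code (Sonarr is .NET), and the names `disk_space_with_timeout` and `_disk_pool`, the 5-second default, and the pool size are all hypothetical. The idea is to run the possibly hanging OS call on a worker thread and stop waiting after a deadline, returning an error string that could drive a red Disk Space entry or a health warning.

```python
import concurrent.futures
import shutil

# Hypothetical shared pool for disk queries. A query that never returns keeps
# one worker busy, but the caller is no longer blocked by it.
_disk_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def disk_space_with_timeout(path, timeout_s=5.0, query=shutil.disk_usage):
    """Return (path, usage, error) without waiting longer than timeout_s.

    Exactly one of usage/error is None. `query` defaults to shutil.disk_usage
    and can be replaced (e.g. in tests) with any callable taking a path.
    """
    future = _disk_pool.submit(query, path)
    try:
        return path, future.result(timeout=timeout_s), None
    except concurrent.futures.TimeoutError:
        # The worker may still be stuck inside the OS call; we only stop waiting.
        return path, None, f"timed out after {timeout_s}s"
    except OSError as exc:
        # Unreachable or missing paths usually surface as OSError.
        return path, None, str(exc)
```

Note the trade-off this sketch does not solve: the caller recovers, but the worker thread stuck on the dead mount is not freed until the OS call returns.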

Steps To Reproduce

I guess: create an SSHFS mount and wait for it to stop working (this is Windows, so I can't say what causes it to stop; it just randomly does). It usually throws an I/O thread operation error when I try to open it manually, and it requires a reboot to fix. However, if you don't look at your apps all the time, it is hard to catch.

Environment

- OS: Windows 11
- Sonarr: 3.0.10.1567
- Docker Install: No
- Using Reverse Proxy: No
- Browser: Chrome
- Database: SQLite

What branch are you running?

Main

Trace Logs?

I can restart and grab them, but as far as I have found they don't show anything about the drives. It sends the request:

Http|Req: 9 [GET] /api/v3/diskspace (from ::1 Mozilla....

However, no Http|Res ever comes back for it; it hangs indefinitely in the UI.

Anything else?

I get that this is some stupid Windows SSHFS connection issue, but after talking to Q, he thought it should be handled a little more gracefully as well. He didn't have a solution, so I am adding this here for thoughts and discussion as needed.

@austinwbest
Author

As a side note, I won't reboot the machine, since that will fix it (until Windows freaks out again), in case something else is needed or someone wants to test something while it is in this broken SSHFS state.

@markus101
Member

Probably the most valuable piece of information would be which health check is hanging, which the trace logs should have.

@austinwbest
Author

OK, I can kill it, clear the logs, and start it. How long do you want me to let it run before grabbing the trace?

@markus101
Member

A couple minutes should be good.

@austinwbest
Author

Here are the first 5 minutes after that last startup. This was an end task from process manager followed by a restart.
sonarr.trace.txt

And it is still hung from the restart a couple of hours ago:
[two screenshots]

markus101 added a commit that referenced this issue Oct 10, 2023
@markus101
Member

Unfortunately it doesn't look like we log anything related to which health check is being executed, but I've added some additional logging in v4.

@austinwbest
Author

OK, I'll wait for it to make it into Radarr and watch for it to break there, so I can report back what is failing and you'll know as well. I'm not on v4 at the moment, but since the health checks are nearly identical, knowing why it happens in one should help with the others too. I rebooted the machine and everything is working again (for now), so hopefully the update lands before Windows does whatever it is doing and breaks again.

@austinwbest
Author

austinwbest commented Dec 5, 2023

It's finally happening again. This is a search across the whole trace about 3 minutes after a restart, and it just hangs. It never recovers from the mount check, and now I have around 600 items sitting in the queue because it hasn't processed anything for the last few weeks or so.

As an afterthought: maybe run queue processing on a thread where something can't completely block it like this.

Keep in mind this is from Radarr, because I don't run Sonarr v4, but it is also stuck for the same amount of time on Application Update and Health Check.

So it seems health, app updates, process downloads are all blocking

Edit: I just checked manually, and it is indeed the SSHFS mount that Windows won't connect to, for whatever reason, again. It took about 10 minutes to finally throw, but this is what Windows does:
[screenshot]

Here is a health service search:
[screenshot]

@austinwbest
Author

I had to force stop/restart a few times so the tasks that could run before the health check locked things up could finish. Now that the only one left is health, here is a trace log for the first 3 minutes or so.

Same as shown above: MountCheck just never comes back and hangs indefinitely.

radarr.trace.txt

@markus101
Member

"So it seems health, app updates, process downloads are all blocking"

Health is not blocking.
App updating is blocking. While health is failing to complete, app updating will eventually become the next task to run, and at that point it blocks everything.
Processing downloads itself is not blocking; the actual import portion of it is, but only to other tasks that require disk access.

A bandage here would be to split checking for updates from installing updates, but it's a bandage because there would still be a hung thread and Sonarr would be operating at -1 threads.

Preventing the thread from hanging during the health check would allow the task queue to execute, at least to some degree, but it would likely get hung up again when trying to import, unless the issue lies only in getting the mounts, which is what the health check is doing.
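The point that a timeout alone still leaves a hung thread (Sonarr "operating at -1 threads") can be illustrated with a generic thread pool. This is a hypothetical Python sketch, not Sonarr's scheduler: `run_with_timeout` and `demo` are made-up names, and an unset `threading.Event` stands in for a mount that never answers. With one worker, the timed-out health check still occupies the worker, so the next task (an import) cannot start either; with a spare worker, the queue keeps moving, but with one thread fewer.

```python
import concurrent.futures
import threading


def run_with_timeout(pool, fn, timeout_s):
    """Wait up to timeout_s for fn on the pool; return 'timeout' instead of hanging."""
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timeout"


def demo(workers, timeout_s=0.2):
    """Submit a hung 'health check', then a quick 'import', to a fixed-size pool."""
    mount_answers = threading.Event()  # stays unset until the end: the stuck mount

    def hung_health_check():
        mount_answers.wait()
        return "health ok"

    def quick_import():
        return "import ok"

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
    health = run_with_timeout(pool, hung_health_check, timeout_s)
    imported = run_with_timeout(pool, quick_import, timeout_s)
    mount_answers.set()  # let the stuck worker finish so shutdown can complete
    pool.shutdown(wait=True)
    return health, imported


print(demo(workers=1))  # ('timeout', 'timeout'): the hung worker starves the queue
print(demo(workers=2))  # ('timeout', 'import ok'): the queue runs, one thread down
```

In other words, the timeout makes the caller responsive, but capacity is only restored once the blocked OS call actually returns.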

Not sure what the best solution here is at the moment, but thanks for the information.

@austinwbest
Author

@mynameisbogdan made a test exe for me to dig a little deeper; he might be able to share it as well, just for more information.
