logcabin abort after running out of file descriptors #198
Comments
Fun. I hadn't really worried about the number of open file descriptors in the past, and I can't rule out a leak somewhere. My first worry was that maybe the SegmentedLog had gotten into some pathological situation and generated a bunch of small segments, but it's only got 20, so that's not it:
The other good news is that your state machine has never tried to take a snapshot, so that can't be it:
It might be a leak. It might be sockets, epoll fds, timerfds, signalfds, or other non-obvious stuff. It's hard to say with the current information. If you can repro this (it sounds like you can), try to list out what's in
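To tell a leak of sockets apart from a leak of epoll fds, timerfds, or signalfds, one option is to count a process's open descriptors by type. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/fd` is readable for the target process; the function names `classify_fd_target` and `count_open_fds` are hypothetical helpers written for illustration and are not part of logcabin.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map the link target of a /proc/<pid>/fd entry to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        # Typical targets look like anon_inode:[eventpoll], anon_inode:[timerfd],
        # or anon_inode:[signalfd]; keep the inner name as the category.
        return target[len("anon_inode:"):].strip("[]")
    # Anything else is treated as a regular path (log segments, devices, ...).
    return "file"


def count_open_fds(pid: int) -> Counter:
    """Count the open file descriptors of `pid`, grouped by category."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling `count_open_fds(pid)` with the pid of the running daemon at intervals while the network is blocked would show which category, if any, keeps growing.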
@nhardt, what's the latest on this one?
Nothing new. It is likely still occurring, but as the impact is minimal (compared with other pressing issues), I have not tracked it down.
Scale has a test that repeatedly blocks network traffic for 65 seconds at a time. Under these circumstances, logcabind can run out of file descriptors and abort. During this time, there is a normal number of read/write attempts. I haven't yet managed to repro the issue with easy steps or in a unit test, and it's possible this is working by design, but I thought I'd post in case it triggers any ideas about which direction might need some exploration.