RosbridgeProtocol instance clean-up hangs when client disconnects under specific conditions. #891

Open
ramlab-jose opened this issue Dec 5, 2023 · 0 comments

Description
Under specific conditions that I have not yet been able to pin down, a client disconnection is not handled gracefully: the server keeps trying to forward messages to that client's (closed) websocket and spams errors. This also leaks resources and eventually locks up the rosbridge process.

The problem seems to come from the last part of the Protocol.incoming function. After adding extensive logging, it appears that this code blocks the IncomingQueue.run loop, so protocol.finish is never triggered for the affected client.

As mentioned, I have yet to find a minimal way to reproduce the problem, but we frequently encounter it when clients connect and disconnect in rapid succession while the rosbridge instance is under load.

I was able to "fix" the problem by changing how the remaining message kept in self.buffer is handled, but I would like some input on why this buffering exists and how we can fix it properly.
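For illustration, the suspected failure mode can be modeled with a toy consumer loop. This is only a sketch and not the rosbridge code: ToyProtocol, run_queue, demo, the bail_on_partial flag and the closing escape hatch are all hypothetical names, and the assumption that a truncated message left in the buffer is what stalls the loop has not been confirmed against the real Protocol.incoming.

```python
import json
import queue
import threading
import time


class ToyProtocol:
    """Hypothetical stand-in for a per-client protocol object (not rosbridge code)."""

    def __init__(self, bail_on_partial):
        self.buffer = ""
        self.bail_on_partial = bail_on_partial
        self.handled = []
        self.finished = threading.Event()
        self.closing = threading.Event()  # demo-only escape hatch for the stuck variant

    def incoming(self, data=""):
        self.buffer += data
        decoder = json.JSONDecoder()
        while self.buffer and not self.closing.is_set():
            try:
                obj, end = decoder.raw_decode(self.buffer)
            except json.JSONDecodeError:
                if self.bail_on_partial:
                    return  # keep the fragment and hand control back to the queue loop
                time.sleep(0.01)  # assumed bug: keep retrying a fragment that never completes
                continue
            self.handled.append(obj)
            self.buffer = self.buffer[end:].lstrip()

    def finish(self):
        self.finished.set()


def run_queue(protocol, q):
    """Toy consumer loop: finish() is only reached once incoming() returns."""
    while True:
        msg = q.get()
        if msg is None:  # disconnect sentinel
            break
        protocol.incoming(msg)
    protocol.finish()


def demo(bail_on_partial, wait=0.3):
    protocol = ToyProtocol(bail_on_partial)
    q = queue.Queue()
    worker = threading.Thread(target=run_queue, args=(protocol, q), daemon=True)
    worker.start()
    # One complete message followed by one cut off by the disconnect.
    q.put('{"op": "subscribe"}{"op": "call_ser')
    q.put(None)
    finished_in_time = protocol.finished.wait(wait)
    protocol.closing.set()  # release the stuck variant so the thread can exit
    worker.join(1.0)
    return finished_in_time
```

In this model, demo(True) reports that finish ran within the wait window, while demo(False) reports that it did not, because the consumer never gets back to the queue to see the disconnect sentinel.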

Thanks in advance. I believe this could also explain some of the other issues that went stale in the past.

  • Library Version: latest ros2 branch
  • ROS Version: Humble
  • Platform / OS: Ubuntu 22.04 (docker)

Steps To Reproduce
I have yet to find a reliable way of reproducing the problem, but from my experience the following conditions seem to trigger the problem:

  • 10 clients that rapidly connect, subscribe to a few topics, call a long-running service and then disconnect. From the debugging I have done, the service call does not seem to be strictly necessary, but it may use up enough resources to help trigger the buffering of the websocket.
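
The client behavior described above could be scripted roughly as follows. This is a sketch under stated assumptions: the topic and service names are made up, the transport is a recording placeholder rather than a real websocket, and nothing here is known to reproduce the hang. The message shapes follow the rosbridge protocol's subscribe and call_service operations.

```python
import json
import threading


def client_messages(client_id, topics, service):
    """Build the JSON messages one client sends before disconnecting."""
    msgs = [
        {"op": "subscribe", "id": f"sub:{client_id}:{i}", "topic": topic, "type": msg_type}
        for i, (topic, msg_type) in enumerate(topics)
    ]
    msgs.append({"op": "call_service", "id": f"call:{client_id}", "service": service, "args": {}})
    return [json.dumps(m) for m in msgs]


class RecordingTransport:
    """Placeholder transport (hypothetical): records messages instead of opening a websocket."""

    def __init__(self):
        self.sent = []
        self.closed = False

    def send(self, text):
        self.sent.append(text)

    def close(self):
        self.closed = True


def run_client(client_id, transport, topics, service):
    for text in client_messages(client_id, topics, service):
        transport.send(text)
    # Disconnect right away, without waiting for the service response.
    transport.close()


def run_burst(n_clients, topics, service, make_transport=RecordingTransport):
    """Start n_clients concurrently; each subscribes, calls the service and disconnects."""
    transports = [make_transport() for _ in range(n_clients)]
    threads = [
        threading.Thread(target=run_client, args=(i, t, topics, service))
        for i, t in enumerate(transports)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return transports
```

To run this against a live rosbridge instance, make_transport would need to return an object backed by a real websocket connection that exposes the same send and close methods; which client library to use is left open here.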

Expected Behavior
A client disconnecting should always result in the respective RosbridgeProtocol instance (and respective Capabilities) being cleaned up.

Actual Behavior
Sometimes the clean-up (.finish) seems to hang, and resources remain in use against a closed websocket.
