Ruby orphaned client when server returns error response on disconnect #433
Comments
I suspect this is happening because the client only shuts down the dispatcher (the object that deals with messages being sent) if it gets a successful faye/lib/faye/protocol/client.rb Line 157 in a99192e
This means any other messages that were in flight when the client disconnected could be retried, and this might include polling requests. However, the client should not reconnect after calling faye/lib/faye/protocol/client.rb Line 146 in a99192e
faye/lib/faye/protocol/client.rb Lines 324 to 325 in a99192e
The bigger question is why you're doing this yourself. The Faye client is supposed to deal with going offline transparently, and managing this process yourself is likely to confuse it. Is there a reason the client's own reconnection handling is not working for you, so that you had to implement this yourself?
Thanks for the reply.

We deploy the Faye client on small Raspberry Pi computers that are placed in remote locations, where the internet connection is not always great. What we noticed is that sometimes the internet connection is lost, the DHCP lease expires, and then the Raspberry Pi comes back online. The Pi gets a new IP address and is connected to the internet again (we can reverse SSH to the Pi and see it up and running). Faye, on the other hand, doesn't seem to pick up this change and stays on the "old connection": it loses its connection to the server and does not recover. The only real solution we have in place now is a restart of our software (

Reading your explanation, I still feel there is a bug: when the client cannot disconnect from the server and the server no longer knows the client, the client lives on where one would expect it to die.

Regards
It just occurred to me that it might help if I explain the client architecture a little, so you can find out where the problem is. The client is essentially made of three objects:
In light of that let's examine the disconnect method:

    def disconnect
      return unless @state == CONNECTED
      @state = DISCONNECTED

      info('Disconnecting ?', @dispatcher.client_id)
      promise = Publication.new

      send_message({
        'channel'  => Channel::DISCONNECT,
        'clientId' => @dispatcher.client_id
      }, {}) do |response|
        if response['successful']
          @dispatcher.close
          promise.set_deferred_status(:succeeded)
        else
          promise.set_deferred_status(:failed, Error.parse(response['error']))
        end
      end

      info('Clearing channel listeners for ?', @dispatcher.client_id)
      @channels = Channel::Set.new

      promise
    end
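To make the consequence of that `if response['successful']` branch concrete, here is a minimal toy model. It does not use Faye's real classes: `ToyDispatcher`, `disconnect_close_on_success` and `disconnect_always_close` are hypothetical names invented for this sketch, and the "always close" variant is only an illustration of the alternative, not a proposed patch.

```ruby
# Toy model only. None of these names exist in Faye; they are assumptions
# made for illustration.
class ToyDispatcher
  def initialize(in_flight)
    @in_flight = in_flight.dup
    @closed = false
  end

  def closed?
    @closed
  end

  # Closing drops every message that was still waiting to be retried.
  def close
    @closed = true
    @in_flight.clear
  end

  # Messages that could still be retried once the network comes back.
  def retryable
    closed? ? [] : @in_flight.dup
  end
end

# Mirrors the branch in the disconnect method quoted above: the dispatcher
# is closed only when the server reports a successful disconnect.
def disconnect_close_on_success(dispatcher, response)
  dispatcher.close if response['successful']
  dispatcher.retryable
end

# Hypothetical variant: close the dispatcher whatever the server answers.
def disconnect_always_close(dispatcher, response)
  dispatcher.close
  dispatcher.retryable
end

# An unsuccessful disconnect response; the error text is a placeholder.
failed  = { 'successful' => false, 'error' => 'placeholder error' }
pending = ['/meta/connect', '/meta/subscribe']

left_open = disconnect_close_on_success(ToyDispatcher.new(pending), failed)
closed    = disconnect_always_close(ToyDispatcher.new(pending), failed)

puts "close on success only: #{left_open.inspect}"
puts "always close:          #{closed.inspect}"
```

In this model, a failed disconnect leaves both pending messages eligible for retry in the first variant, while the second variant leaves nothing behind. Whether the real dispatcher behaves this way after a failed disconnect is exactly what would need to be confirmed against the Faye source.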
When your client goes offline, there's a bunch of messages in the dispatcher in flight and waiting to be retried. You will see

Eventually you come back online. The dispatcher's messages will eventually be retried (in unpredictable order) and this time they'll reach the server. If the server recognises the client (i.e. its session has not expired), then both will get normal successful responses and the

If the server does not recognise the client, its advize method will attach

The client handles that advice only if

It seems to me that messages that are in the dispatcher might be retried once you come back online, but if they're unsuccessful (i.e. the session timed out and the server doesn't recognise the

Does the above make sense? I realise it might not explain why you're seeing a bug but I thought it might help to explain what I think should be going on in this situation.
I should ask: what timeout setting are you using on the server? The checks you're doing with ping/pong are essentially the same as what

As it is, your application and the Faye client are both trying to do the same thing but without any co-ordination, which might explain the weird effects you're seeing.
Some further thoughts: the unsuccessful

However, once a client's session expires (this happens if the server does not hear from the client for
Hello,

Thanks for your explanation, this clears up some bits.

Server timings

    @faye = Faye::RackAdapter.new({
      mount: "/#{Printer.config.faye[:path]}",
      timeout: 20,
      ping: 5,
    })

Client

    Faye::Client.new(Printer.config.faye[:url], interval: 2, retry: 2, timeout: 10)

I'm going to step back a little and explain why we started with our own implementation of ping/pong and the manual disconnect (maybe this helps explain how we got this far down the rabbit hole).

Setup

So we have one Faye server instance running on our cloud server. The server receives jobs and forwards them to the correct channel. Every Raspberry Pi in the field has a Faye client that connects to multiple channels (depending on its environment). When it receives a job from the Faye server it executes the job and logs a response locally. The Raspberry Pis are deployed for 2-3 months in the field and are always on.

Initial problems

For some completely unknown reason we notice that the Faye client doesn't receive any jobs from the Faye server. We can trigger manual jobs; they are received on the Faye server and transmitted to the client, but the client never receives them. In our client we don't see any indication that it lost its connection to the server. Our initial idea was that the client has a flaky internet connection, so it's normal that it sometimes doesn't receive a job. But investigating the network we found that the network connection is not always the issue. For example, we have run a continuous ping on the Raspberry Pi and during a

Our measures

We noticed that everything was OK again when we restarted our Ruby software. We then implemented the following stuff:
Conclusion

I'll enable full Faye logging on a client; maybe this will help pin down our initial problem.
Can you send me your email address so I can send the logs to it? They are quite big and contain some sensitive data. You can email me at jan[@]playpass[.]be (remove the brackets, of course).

Regards,
I don't have an awful lot of time to work on issues at the moment and I suspect that staring at huge log files without access to your system won't get me very far. The ping/pong mechanism you describe under 'Our measures' is exactly what the client already does using

The best thing for you to do is find out what goes on in the transport layer when the Pi goes offline -- does

What's happening in the
Are messages replayed in unpredictable order? Is that a behavior of the protocol?
Hello,

I have the following setup: the Faye server runs in the cloud and a Raspberry Pi runs the Faye::Client. For some odd reason the client loses its server connection (for example, unplug the ethernet cable, plug it in on another subnet => the Pi gets a new address but the code is still using the old connection). Anyway, shit hits the fan, but that's not my issue.

What I do now is detect this (using ping/pong), then disconnect the client with client.disconnect, create a new client (on the new API) and subscribe to the desired channels.

Now this is the log output from when I'm trying to create a new client and disconnect the old one.

Notice the last line: the disconnect attempt from the client fails. Continuing with my horror story, for some reason the server restarts, and next I see this:

And suddenly I have 2 clients: a newly created client that I actively use and an orphaned client (which is not even referenced in a Ruby variable, I might add). When I send a message from the server -> client, BOTH clients receive it.

What I don't understand is that the client stays alive even after the server said "Hey I don't know you at all"; for me this should mean that the client should die and close.

I know that this flow might seem very exceptional, but it happens a lot. To sum up:
Faye::Client#disconnect
@faye = Faye::Client.new
)@faye
) Client 1 is a ghost client but receives the messages and executes the callbacks

I'm also wondering how long it takes until the server deletes a client that it has not seen in the past x minutes. Is there a cleanup, and how do clients handle this?
Regards,