Detect lost connection when writing #582

Open
Mirodin opened this issue Feb 13, 2022 · 8 comments

@Mirodin

Mirodin commented Feb 13, 2022

I am pretty sure I am missing something obvious. How do I know that the connection was lost when writing data to a TCP stream? I would like to start reconnecting when a write fails. But with this code, nil gets printed regardless of whether the server was shut down in the meantime or the write succeeded.

local socket = uv.new_tcp()
socket:connect("127.0.0.1", 80)

function send() 
  local current = coroutine.running()
  socket:write("foo", function(err) coroutine.resume(current, err) end)
  print(coroutine.yield())
end

send() -- gets called every 5 seconds
@SinisterRectus
Member

SinisterRectus commented Feb 13, 2022

The code to run after connecting should be in the connect callback. See here.

You probably also want to check the return values of the write to make sure it is successful.

Looking at the echo example in the docs, there is no call to connect. I wonder how that happened. There is an example that uses it here.

@Mirodin
Author

Mirodin commented Feb 13, 2022

My example above was not the best, to be honest. Connecting works just fine and my code yields there as well. Communication with the server is established and working.

The problematic part is the write function, which gets called when sending data. Since this is a long-lived TCP connection, I regularly ping the server. For an automatic reconnect, this ping needs to know somehow whether the connection has been lost.

Therefore I need to check whether the write succeeded. But the err parameter is nil even if I terminate the server between pings. As a result, my program does not know it lost the connection and does not reconnect.

@truemedian
Member

If the server that the socket is connected to closes, all subsequent write callbacks should indicate this with err == "ECONNRESET".

I can confirm this behavior on my machine using luv's echo server example and the snippet you provided above.

@Mirodin
Author

Mirodin commented Feb 14, 2022

Interesting.

When I try the example and shut down the luv-based server, I get your result as well. But for some reason, with an external server one more write still succeeds after the server has already been shut down. The second write after that then terminates my program without any error message.

I modified the luv example a little bit so it shows the same behaviour I am experiencing. Maybe that helps settle the dust a little bit. For testing I started the default Python HTTP server python -m http.server and then ran this client code:

Lua: 5.3.5
Luv: 1.43.0-0 from Luarocks

local uv = require("luv")

local client = uv.new_tcp()
uv.tcp_connect(client, "127.0.0.1", 8000, function (err)
  assert(not err, err)

  uv.read_start(client, function (err, chunk)
    print("received at client")
    print("chunk", chunk)
    print("err", err)
    assert(not err, err)
  end)

end)

local t = uv.new_timer()
t:start(3000, 3000, function()
	client:write("hello", function(err)
		print("tick")
		print("err", err)
	end)
end)

uv.run()

This is the terminal output:

> lua ./client.lua 
tick
err	nil
tick
err	nil
# terminating the server with Ctrl-C (no error)
received at client
chunk	nil
err	nil
# no error even though the server is already down
tick
err	nil
# client program terminates at next tick
# return code 141

@truemedian
Member

Return code 141 (128 + 13) indicates that the write did fail, but it raised a SIGPIPE (signal 13) that was unhandled.

You can mask the signal (only works on non-Windows systems) with

local sig = uv.new_signal()
uv.signal_start(sig, 'sigpipe')
uv.unref(sig)

@Mirodin
Author

Mirodin commented Feb 14, 2022

Maybe I do not understand your solution but the idea was that my client(-class) would notice a lost connection when sending a ping or any other message and then automatically reconnect. The program using that client to talk to the server would not know about any of this. Right now the client does not know that sending failed and does not reconnect and resend.

So the process looks like this now:

  1. client sends message 1 (success)
  2. server dies/disconnects/lost/...
  3. client sends message 2 (supposedly successful even though it is not)
  4. client tries to send message 3
  5. program terminates with exit code 141

What I need the process to look like:

  1. client sends message1 (success)
  2. server dies/disconnects/lost/...
  3. client tries to send message 2 (fail)
  4. client reconnects
  5. client resends message 2
  6. ...
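
The desired flow above can be sketched roughly as follows. This is only an illustration under several assumptions, not an established luv pattern: the in-process demo server, the `pending`, `on_lost` and `connect` names, and the rule that any reply from the server counts as an acknowledgment are all hypothetical. Because a write can report success even though the peer is already gone, the sketch keeps the message pending until the server answers, rather than trusting the write callback alone. A real client would also want a retry delay and the SIGPIPE handling discussed in this thread.

```lua
local uv = require("luv")

-- Demo server (assumption, for illustration only): drops the first connection
-- right away, then acknowledges whatever it receives on the second one.
local accepted, received = 0, ""
local server = uv.new_tcp()
assert(server:bind("127.0.0.1", 0)) -- port 0: let the OS pick a free port
local port = server:getsockname().port
assert(server:listen(128, function(err)
  assert(not err, err)
  local peer = uv.new_tcp()
  server:accept(peer)
  accepted = accepted + 1
  if accepted == 1 then
    peer:close() -- simulate "server dies"
    return
  end
  peer:read_start(function(read_err, chunk)
    if read_err or not chunk then
      -- client went away: shut the demo server down
      peer:close()
      server:close()
      return
    end
    received = received .. chunk
    peer:write("ack")
  end)
end))

-- Hypothetical client state: the message stays pending until it is acknowledged.
local pending = "message 2"
local reconnects = 0
local connect -- forward declaration, defined below

-- Called whenever a socket is found to be unusable; closes it once and
-- reconnects if there is still something to deliver.
local function on_lost(socket)
  if socket:is_closing() then return end -- already handled for this socket
  socket:close()
  if pending then
    reconnects = reconnects + 1
    connect()
  end
end

function connect()
  local socket = uv.new_tcp()
  socket:connect("127.0.0.1", port, function(err)
    if err then return on_lost(socket) end
    socket:read_start(function(read_err, chunk)
      -- an error, or EOF (nil chunk without error), both mean the link is down
      if read_err or not chunk then return on_lost(socket) end
      pending = nil -- any reply counts as the acknowledgment in this sketch
      socket:close()
    end)
    -- (re)send the pending message on every fresh connection
    socket:write(pending, function(write_err)
      if write_err then on_lost(socket) end
    end)
  end)
end

connect()
uv.run()
print("server received:", received, "reconnects:", reconnects)
```

The first connection is dropped by the demo server, the client notices this in its read callback, reconnects, and sends the same message again on the new socket.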

@truemedian
Member

Exactly. What you need to do is tell the operating system that you know about SIGPIPE and plan to handle it yourself. If you include the snippet I posted above, the process should no longer be killed by SIGPIPE and the write should fail with EPIPE instead, which allows libuv to report an error to your write callback.
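
A minimal sketch of how the two pieces might fit together, assuming a non-Windows system. The `send` helper and the small in-process server are hypothetical and exist only to make the example self-contained. `uv.write` can fail in two ways: immediately, by returning nil plus an error message and error name, or later, through the `err` argument of the callback. The helper funnels both into one place.

```lua
local uv = require("luv")

-- Watch SIGPIPE so that its default action (terminating the process) no longer
-- applies. The handle is unref'd so it does not keep the event loop alive.
local sigpipe = uv.new_signal()
uv.signal_start(sigpipe, "sigpipe", function() end)
uv.unref(sigpipe)

-- Hypothetical helper: on_done(err) is called exactly once, with nil on
-- success or an error name such as "EPIPE" on failure.
local function send(stream, data, on_done)
  local req, _, err_name = uv.write(stream, data, on_done)
  if not req then
    on_done(err_name) -- immediate failure: the callback will not be called
  end
end

-- Demo server (assumption, for illustration only): accepts one connection and
-- shuts down after the first read event.
local server = uv.new_tcp()
assert(server:bind("127.0.0.1", 0)) -- port 0: let the OS pick a free port
local port = server:getsockname().port
assert(server:listen(128, function(err)
  assert(not err, err)
  local peer = uv.new_tcp()
  server:accept(peer)
  peer:read_start(function()
    peer:close()
    server:close()
  end)
end))

local write_err = "callback not called"
local client = uv.new_tcp()
client:connect("127.0.0.1", port, function(err)
  assert(not err, err)
  send(client, "ping", function(e)
    write_err = e -- nil here, because the demo peer is still alive
    client:close()
  end)
end)

uv.run()
print("write error:", write_err)
```

In a long-running client, the `on_done` callback is where a non-nil error would trigger the reconnect logic.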

@Mirodin
Author

Mirodin commented Feb 15, 2022

Ah, now I get it. This does stop my program from terminating. But the problem of one message getting lost persists, since EPIPE is only reported from the second failed write onwards. I guess I need to extend the read callback to treat a nil error together with a nil chunk as a lost connection.
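
A rough sketch of that idea, for illustration only: the read callback treats either an error or a nil chunk without an error (end of stream) as a lost connection. The in-process server and the `connection_lost` flag are assumptions made so the example can run on its own; the server simply closes the connection as soon as it is accepted.

```lua
local uv = require("luv")

-- Demo server (assumption, for illustration only): accepts one connection and
-- closes it immediately, standing in for "the server went away".
local server = uv.new_tcp()
assert(server:bind("127.0.0.1", 0)) -- port 0: let the OS pick a free port
local port = server:getsockname().port
assert(server:listen(128, function(err)
  assert(not err, err)
  local peer = uv.new_tcp()
  server:accept(peer)
  peer:close()
  server:close()
end))

local connection_lost = false

local client = uv.new_tcp()
client:connect("127.0.0.1", port, function(err)
  assert(not err, err)
  client:read_start(function(read_err, chunk)
    if read_err or chunk == nil then
      -- read_err: the connection failed.
      -- chunk == nil without an error: EOF, the peer closed its side.
      -- Either way, treat the link as down; a reconnect could start here.
      connection_lost = true
      client:close()
      return
    end
    print("received", chunk)
  end)
end)

uv.run()
print("connection lost:", connection_lost)
```

This detects the disconnect as soon as the end of stream arrives, without waiting for a later write to fail.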

Thanks a lot, I really appreciate it.
