App doesn't reconnect on lost connection #36

Open
pa-sowa opened this issue Aug 7, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@pa-sowa
Contributor

pa-sowa commented Aug 7, 2022

When I suspend and later resume my computer, the app loses its connection to the MQTT server and doesn't attempt to reconnect. I need to restart the app manually.
I see this error:
ERROR [client] Connect comms goroutine - error triggered read tcp 192.168.1.52:46248->192.168.1.45:1883: read: connection reset by peer

@W-Floyd
Owner

W-Floyd commented Aug 9, 2022

I can reproduce the error message by putting my computer to sleep, but the app does reconnect somehow. Perhaps I need to leave it asleep longer. Were there any more messages? How long did you leave it after resuming? There should be a 30-second keep-alive; perhaps we can try reducing that.

@pa-sowa
Contributor Author

pa-sowa commented Aug 10, 2022

It happens every time I put the computer to sleep for a longer time (e.g. a few hours). It doesn't reconnect, either within 30 seconds or after I wait 30 minutes. There are no other messages in the log.

@W-Floyd
Owner

W-Floyd commented Aug 10, 2022

Okay, thanks for the info. I'll see what I can do; I started a new job last week, so time is short.

@W-Floyd
Owner

W-Floyd commented Oct 6, 2022

Sorry for not taking more of a look at this - I expect it might just be a matter of checking for a connection occasionally, and reconnecting if it goes down. If anyone wants to take a stab at it and make a PR, I'd be happy to see it.

Frankly, I haven't set this software up again since moving for my job, though I will in future. Until it bothers me, it'll probably keep being pushed down the line 😢

@dev-foo-bar
Contributor

A possible workaround: if you can make the app exit with an error code when the connection is lost, we can use systemd to do the magic (a simple restart).

@W-Floyd added the bug (Something isn't working) label Nov 3, 2022
@phallows
Contributor

phallows commented Nov 9, 2022

Looking into this a bit more, I can see a couple of possibilities. It might be failing on "initial connection", which would be related to SetConnectRetry (default false). Or it might be failing because it is writing during a suspend, which would be related to SetWriteTimeout (default no timeout).
I'm going to deploy these settings to machines for a few days and see if it makes a noticeable difference.
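
As a rough illustration of the settings mentioned above, assuming the app uses the Eclipse Paho Go client (`github.com/eclipse/paho.mqtt.golang`), which is where the `SetConnectRetry` and `SetWriteTimeout` option names come from. The `newOptions` helper and all durations are illustrative assumptions, not the values actually deployed; the broker address is the one from the error log in this issue.

```go
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// newOptions builds client options with the two settings discussed above,
// plus explicit reconnect behaviour. All durations are illustrative.
func newOptions(broker string) *mqtt.ClientOptions {
	opts := mqtt.NewClientOptions()
	opts.AddBroker(broker)

	// Keep retrying the initial connection instead of giving up (default false).
	opts.SetConnectRetry(true)
	opts.SetConnectRetryInterval(10 * time.Second)

	// Fail a blocked write instead of waiting forever (default: no timeout).
	opts.SetWriteTimeout(5 * time.Second)

	// Reconnect after an established connection drops.
	opts.SetAutoReconnect(true)
	opts.SetMaxReconnectInterval(1 * time.Minute)
	opts.SetKeepAlive(30 * time.Second)

	// Log connection state changes so a stuck reconnect is visible.
	opts.SetConnectionLostHandler(func(_ mqtt.Client, err error) {
		fmt.Println("connection lost:", err)
	})
	opts.SetOnConnectHandler(func(_ mqtt.Client) {
		fmt.Println("connected")
	})
	return opts
}

func main() {
	client := mqtt.NewClient(newOptions("tcp://192.168.1.45:1883"))

	// With connect retry enabled, the token does not complete until the
	// connection is up, so wait with a timeout rather than indefinitely.
	token := client.Connect()
	if !token.WaitTimeout(30 * time.Second) {
		fmt.Println("still retrying the initial connection")
	} else if err := token.Error(); err != nil {
		fmt.Println("connect failed:", err)
	}
}
```

Whether these options address the suspend/resume case is exactly what the trial described above is meant to find out; the sketch only shows where they would be set.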

@W-Floyd
Owner

W-Floyd commented Dec 6, 2022

How did this go, @phallows?

@phallows
Contributor

phallows commented Dec 7, 2022

It seems to be working quite well, but I've got an issue with an unrelated part of my process that makes it hard to verify. I think it's probably worth a PR so that others can test; I'll submit that as soon as I can.
