
mqtt reconnect timeout #74

Closed
nicx opened this issue Jul 23, 2023 · 11 comments

@nicx

nicx commented Jul 23, 2023

Hey, I have been using the module for a few days now. In general it works well :)

My problem: every night, I stop all my Docker containers (including the MQTT server container and the HA Docker container) for backup purposes. After the backup they are restarted. This takes 10 to 15 minutes.

The p1p2serial module does not reconnect at all, and all sensors are "unknown" in HA. After the command "D1" via telnet, everything works fine again.

Is there any reconnection logic implemented in the software? If yes: is there a maximum timeout? Could this be configured in any way?

@Arnold-n
Owner

Glad it works usually! And I think it should reconnect automatically even in your situation. It reconnects here automatically even after a longer MQTT server timeout.

As long as WiFi remains available, the ESP tries to reconnect to the MQTT server every 5 seconds; if that doesn't succeed within 150 seconds, the ESP restarts (=D0), after which it continues its reconnection attempts every 5 seconds and again restarts every 150 seconds. So there is no maximum timeout. Even without WiFi, the portal that starts after an ESP restart times out after 180 seconds to allow a WiFi reconnection attempt. So I do not yet understand why it does not reconnect on your side. (At the moment, the maximum timeout (or reattempt time) can only be changed in the configuration header file, which requires recompilation.)

Can you log and check what the ESP reports over telnet during and after the MQTT server outage? This may require restarting your telnet session several times, as the telnet session ends when the ESP restarts. Are you sure it does not connect, or does it connect but not publish new MQTT info?
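To make the retry/restart timing concrete, here is a small Python model of the schedule. It is a sketch, not the firmware's code, and it assumes: the connection drops at t = 0, the server is back at t = outage_s, an attempt happens every 5 s, the ESP restarts when 150 s pass without success, and a restart takes no time. WiFi loss and the 180 s portal are ignored. `reconnect_timeline` is a hypothetical name.

```python
# Hypothetical model of the reconnect schedule described above; not the ESP firmware.
RETRY_INTERVAL_S = 5   # one MQTT reconnect attempt every 5 seconds
RESTART_AFTER_S = 150  # restart the ESP (D0) after 150 seconds without success


def reconnect_timeline(outage_s, retry_s=RETRY_INTERVAL_S, restart_s=RESTART_AFTER_S):
    """Return (reconnect_time_s, restarts) for an MQTT server outage of outage_s seconds.

    Assumes the connection is lost at t = 0, the server is reachable again at
    t = outage_s, and an ESP restart takes no time (boot time is ignored).
    """
    t = 0
    since_restart = 0
    restarts = 0
    while True:
        t += retry_s
        since_restart += retry_s
        if t >= outage_s:          # server is back: this attempt succeeds
            return t, restarts
        if since_restart >= restart_s:
            restarts += 1          # restart timer expired: ESP restarts, timer resets
            since_restart = 0


for outage in (1, 151, 600):
    print(outage, reconnect_timeline(outage))
```

Under these assumptions, a 1 s outage reconnects on the first attempt without any restart, while outages longer than 150 s always include at least one restart.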

@nicx
Author

nicx commented Jul 24, 2023

@Arnold-n thanks for your explanation, and you are right: the MQTT connection does reconnect, I just checked it after last night. I can see some values being published, but not all of them (e.g. the C- and S-values are no longer published). I am wondering why all sensors are unknown in HA... even the ones that are still published (T-values). As already written, after a reboot everything works again without any extra configuration.

[Screenshots: 2023-07-24 07:47:24 and 2023-07-24 07:47:37]

Via telnet I can only see these messages:

P1P2/S/009 * [ESP] Uptime 47470
P1P2/S/009 * [ESP] Uptime 47480
P1P2/S/009 * [ESP] Uptime 47490
P1P2/S/009 * [ESP] Uptime 47500
P1P2/S/009 * [ESP] Uptime 47510
P1P2/S/009 * [ESP] Uptime 47520
P1P2/S/009 * [ESP] Uptime 47530
P1P2/S/009 * [ESP] Uptime 47540

anything I could do to see more? ;)

@Arnold-n
Owner

Thanks for your fast feedback. I think two different issues are involved:
-you mention you also restart HA, so I guess HA requires the sensor initialization messages, which are "retained" messages on the MQTT server, but these are lost because the MQTT server was also restarted. These messages are currently not re-transmitted by the ESP after an MQTT reconnection, and
-the C and S values (any unchanged value) are only published after a change, and these values change infrequently.
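The second point can be illustrated with a minimal "publish only on change" cache. This is a hypothetical Python sketch, not the ESP's actual throttling code; the class name and the topic `P1P2/C/example` are made up. It shows why a value that never changes is not sent again after the broker has forgotten it, until something clears the cache (as an ESP restart effectively does).

```python
# Hypothetical sketch of "publish only on change"; not the P1P2 firmware code.
class ChangeOnlyPublisher:
    def __init__(self, send):
        self.send = send        # callable(topic, payload) doing the actual MQTT publish
        self.last = {}          # topic -> last payload that was published

    def update(self, topic, payload):
        """Publish payload only if it differs from what was last published on topic."""
        if self.last.get(topic) != payload:
            self.send(topic, payload)
            self.last[topic] = payload

    def reset(self):
        """Forget everything, so the next update of every topic is published again."""
        self.last.clear()


sent = []
pub = ChangeOnlyPublisher(lambda topic, payload: sent.append((topic, payload)))
pub.update("P1P2/C/example", "1")   # first value: published
pub.update("P1P2/C/example", "1")   # unchanged: suppressed, even if the broker restarted
pub.reset()                         # e.g. after an ESP restart
pub.update("P1P2/C/example", "1")   # published again
print(sent)
```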

But if the ESP restarted at least once during the MQTT server unavailability, both HA init messages and all C/S values should be retransmitted; are you sure your MQTT server is unavailable for more than 150 seconds?

A quick solution would be to (add an option to) restart the ESP after each MQTT reconnect (similar to your D1 restart), or only after an MQTT reconnect that took longer than a certain period (which is effectively already implemented as the 150 s restart timer). Alternatively, the ESP could resend all HA sensor initialization messages and all data values after each MQTT reconnect. I am not sure retransmitting a lot of data after each MQTT reconnect is the best solution, as it is a burden for less reliable networks (an MQTT reconnect signals either a server restart or a network problem).

@nicx
Copy link
Author

nicx commented Jul 24, 2023

@Arnold-n I just checked the logs of my NAS in detail. You are right again: the MQTT server is only unavailable for a few seconds, and the HA instance for some seconds more :)

[24.07.2023 02:08:16][mosquitto][info] Stopping mosquitto... 
[24.07.2023 02:08:17][mosquitto][info] done! (took 1 seconds)
[24.07.2023 02:08:17][mosquitto][info] Should NOT backup external volumes, sanitizing them...
[24.07.2023 02:08:17][mosquitto][info] Calculated volumes to back up: /mnt/user/appdata/mosquitto
[24.07.2023 02:08:17][mosquitto][info] Backing up mosquitto...
[24.07.2023 02:08:17][mosquitto][info] Backup created without issues
[24.07.2023 02:08:17][mosquitto][warning] Skipping verification for this container because its not wanted!
[24.07.2023 02:08:17][mosquitto][info] Starting mosquitto... (try #1)
[24.07.2023 02:08:20][zigbee2mqtt][info] Stopping zigbee2mqtt... 
[24.07.2023 02:08:27][zigbee2mqtt][info] done! (took 7 seconds)
[24.07.2023 02:08:27][zigbee2mqtt][info] Should NOT backup external volumes, sanitizing them...
[24.07.2023 02:08:27][zigbee2mqtt][info] Calculated volumes to back up: /mnt/user/appdata/zigbee2mqtt
[24.07.2023 02:08:27][zigbee2mqtt][info] Backing up zigbee2mqtt...
[24.07.2023 02:08:28][zigbee2mqtt][info] Backup created without issues
[24.07.2023 02:08:28][zigbee2mqtt][warning] Skipping verification for this container because its not wanted!
[24.07.2023 02:08:28][zigbee2mqtt][info] Starting zigbee2mqtt... (try #1)
[...]
[24.07.2023 02:08:58][homeassistant][info] Stopping homeassistant... 
[24.07.2023 02:09:09][homeassistant][info] done! (took 11 seconds)
[24.07.2023 02:09:09][homeassistant][info] Should NOT backup external volumes, sanitizing them...
[24.07.2023 02:09:09][homeassistant][info] Calculated volumes to back up: /mnt/user/appdata/homeassistant
[24.07.2023 02:09:09][homeassistant][info] Backing up homeassistant...
[24.07.2023 02:09:09][homeassistant][info] Backup created without issues
[24.07.2023 02:09:09][homeassistant][warning] Skipping verification for this container because its not wanted!
[24.07.2023 02:09:09][homeassistant][info] Starting homeassistant... (try #1)

So as far as I understand you, I need to restart the ESP manually (or via an automation in HA which sends the "D1" command via telnet) as long as the ESP doesn't resend its init messages after an MQTT reconnect. I will try that as a workaround.

Is it possible to send just the init messages, without all the other data messages, after a reconnect? Or could I have the server retain the init messages?

@Arnold-n
Owner

Arnold-n commented Jul 24, 2023

Indeed, and I can add it as an option for a next release.

Based on this topic, storing retained messages on the server across a restart is possible via the persistence option.

Resending the init messages after a reconnect is possible, but it is also a lot of data, so the data throttling used after a restart would be needed as well. That can be done, but takes a bit more work than just a restart.
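Throttling a full resend essentially means spreading the messages over time instead of sending them in one burst. A minimal Python sketch of that idea, assuming a fixed number of messages per "tick" (the name `throttled_batches` and the batch size are hypothetical, not the firmware's parameters):

```python
# Hypothetical sketch of throttled resending; batch size and names are assumptions.
from collections import deque


def throttled_batches(messages, per_tick):
    """Yield the messages in batches of at most per_tick, one batch per tick.

    The caller publishes one batch and then waits (e.g. one loop iteration or
    one second) before asking for the next, spreading a full resend over time
    instead of sending everything at once after a reconnect.
    """
    if per_tick < 1:
        raise ValueError("per_tick must be at least 1")
    queue = deque(messages)
    while queue:
        yield [queue.popleft() for _ in range(min(per_tick, len(queue)))]


print(list(throttled_batches(range(7), 3)))
```

A caller would publish each batch and then sleep or yield to the main loop before taking the next one, so the network never sees more than `per_tick` messages per interval.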

@nicx
Author

nicx commented Jul 24, 2023

@Arnold-n OK, I created a short Python script as a workaround:

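#!/usr/bin/python3

# Send "D1" (restart) to the P1P2 ESP over telnet.
# Note: telnetlib was deprecated in Python 3.11 and removed in Python 3.13.
import telnetlib

HOST = "192.168.1.9"
PORT = 23          # telnet port of the ESP
TIMEOUT = 5        # connection timeout in seconds
READ_TIMEOUT = 15  # the ESP prints an "Uptime" line every 10 seconds

tn = telnetlib.Telnet(HOST, PORT, TIMEOUT)
tn.read_until(b"Uptime", READ_TIMEOUT)  # wait for output so the session is up, but never forever
tn.write(b"D1\n")                      # D1 restarts the ESP
tn.close()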

In addition, I created an automation in HA which calls that script, triggered by the HA restart itself. I will see if it works as expected tonight and give feedback :)

Anyway, it would be great if you implemented a solution directly in the firmware ;)
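Since `telnetlib` is gone from Python 3.13, the same workaround can be written with a plain TCP socket from the standard library. This is a sketch under the assumption that the ESP's telnet port accepts raw text without the client doing telnet option negotiation; `send_command` is a hypothetical name.

```python
import socket


def send_command(host, port=23, command=b"D1\n", wait_for=b"Uptime", timeout=15.0):
    """Connect over plain TCP, wait until `wait_for` appears in the output, then send `command`.

    Raises TimeoutError (socket.timeout) if no data arrives for `timeout`
    seconds, and ConnectionError if the peer closes the connection first.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        received = b""
        while wait_for not in received:
            chunk = sock.recv(1024)
            if not chunk:
                raise ConnectionError("connection closed before %r was seen" % wait_for)
            received += chunk
        sock.sendall(command)
```

Usage, with the ESP address from the script above: `send_command("192.168.1.9")`.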

@fuecy

fuecy commented Jul 24, 2023

Just my 2 cents.

If you are using Mosquitto, you can save retained messages to disk, which makes them available again after a Mosquitto restart. See https://mosquitto.org/man/mosquitto-conf-5.html (autosave_on_changes).

@Arnold-n, as an alternative, you could listen for Home Assistant's birth message on MQTT and send the HA sensor configuration again.
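The relevant broker settings are `persistence`, `persistence_location`, `autosave_interval` and `autosave_on_changes` from mosquitto.conf(5); per that man page, with `autosave_on_changes true` the `autosave_interval` value counts changes instead of seconds. The Python sketch below just checks a config text for these options. The parsing rules (whole-line `#` comments, last occurrence wins, includes ignored) and the `/mosquitto/data/` path are assumptions, and the function names are hypothetical.

```python
# Rough check of the persistence-related options in a mosquitto.conf text.
# Option names are from mosquitto.conf(5); the parsing rules here are simplified
# assumptions (whole-line '#' comments, last occurrence wins, includes ignored).
PERSISTENCE_OPTIONS = ("persistence", "persistence_location",
                       "autosave_interval", "autosave_on_changes")


def persistence_settings(conf_text):
    """Return the persistence-related options found in conf_text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = line.split(None, 1)
        if parts[0] in PERSISTENCE_OPTIONS:
            settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def retained_messages_persist(conf_text):
    """True if persistence is enabled (mosquitto's default is false)."""
    return persistence_settings(conf_text).get("persistence", "false").lower() == "true"


EXAMPLE_CONF = """
# keep retained messages across broker restarts
persistence true
persistence_location /mosquitto/data/
# save after every change instead of every autosave_interval seconds
autosave_on_changes true
autosave_interval 1
"""

print(retained_messages_persist(EXAMPLE_CONF))
```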
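The birth-message idea can be sketched as a message handler that resends all retained discovery configs when Home Assistant announces itself. The topic `homeassistant/status` with payload `online` is Home Assistant's default birth message; the discovery topic in the example, the class name and the `publish(topic, payload, retain)` callback are hypothetical, and wiring this to a real MQTT client is left out.

```python
# Hypothetical sketch: resend retained Home Assistant discovery configs when the
# HA birth message arrives. Not the P1P2 firmware and not a real MQTT client API.
BIRTH_TOPIC = "homeassistant/status"   # Home Assistant's default birth topic
BIRTH_PAYLOAD = b"online"              # default birth payload


class DiscoveryResender:
    def __init__(self, publish, configs):
        self.publish = publish          # callable(topic, payload, retain)
        self.configs = dict(configs)    # discovery topic -> JSON config payload

    def on_message(self, topic, payload):
        """Handle one incoming MQTT message; resend all configs on HA birth."""
        if topic == BIRTH_TOPIC and payload == BIRTH_PAYLOAD:
            for cfg_topic, cfg_payload in self.configs.items():
                self.publish(cfg_topic, cfg_payload, True)   # retained, so HA keeps it
            return True
        return False


out = []
resender = DiscoveryResender(lambda t, p, retain: out.append((t, p, retain)),
                             {"homeassistant/sensor/p1p2_example/config": b"{}"})
resender.on_message("homeassistant/status", b"online")
print(out)
```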

@Arnold-n
Owner

@fuecy, listening to the birth message makes sense anyway, thanks for the suggestion!

@gmcmicken
Contributor

I meant to report this as well a few months back! Good to see some solutions presented

@nicx
Author

nicx commented Jul 26, 2023

Just a short bit of feedback: both methods work as a workaround. You can use a script to reset the P1P2 module, or you can save retained messages to persistent storage with Mosquitto.

@Arnold-n
Owner

Arnold-n commented Aug 6, 2023

v0.9.41, just released, adds a few solutions:
-if /homeassistant/status online is received, parameter reporting/throttling is restarted from scratch (thanks @fuecy!)
-if command D3 is received, parameter reporting/throttling is restarted
-[deleted]
-if MQTT reconnects AND j-mask includes 0x0800 [edit: for v0.9.46 and later, was: 0x80000], data reporting/throttling is restarted from scratch
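The three triggers above amount to a small decision rule: two unconditional triggers, and the reconnect trigger gated by a j-mask bit (0x0800 is bit 11, i.e. 2048). A hypothetical Python sketch of that rule; the event names are made up and only the triggers and the mask value come from the release notes.

```python
# Hypothetical sketch of the restart triggers listed for v0.9.41; event names are made up.
JMASK_RESEND_ON_RECONNECT = 0x0800  # bit 11; v0.9.46 and later (earlier releases: 0x80000)


def should_restart_reporting(event, jmask):
    """Return True if reporting/throttling should restart from scratch for this event."""
    if event in ("ha_online", "command_D3"):
        return True                                     # unconditional triggers
    if event == "mqtt_reconnect":
        return bool(jmask & JMASK_RESEND_ON_RECONNECT)  # only if the j-mask bit is set
    return False


print(should_restart_reporting("mqtt_reconnect", 0x0FFF))
```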

Thanks again for your feedback which triggered these changes.


4 participants