counter request mode not reliable #73

Closed
nicx opened this issue Jul 20, 2023 · 24 comments

@nicx

nicx commented Jul 20, 2023

When trying to activate the counter request mode via telnet it sometimes works, sometimes not. When not working I get these messages:

P1P2/S/009 * [ESP] To ATmega: ->C1<-
P1P2/S/009 * [MON] Single counter request cycle initiated
P1P2/S/009 * [MON] T  0.024: -PE:4000B8000000000000000000000000000000000000007A readError=0x0004
P1P2/S/009 * [ESP] Uptime 1380
P1P2/S/009 * [MON] T  0.024: 4000-PE:B0-PE:10-PE:00-PE:00000000000000-PE:00-PE:00-PE:00000000-PE:4059 CRC error readError=0x0014
P1P2/S/009 * [ESP] Uptime 1390
P1P2/S/009 * [MON] T  0.024: -PE:4000B800000000000000000000000000-PE:80-PE:80F2 CRC error readError=0x0014
P1P2/S/009 * [MON] T  0.024: 4000-PE:A4-PE:01-PE:04-PE:040000-PE:08000000-PE:10-PE:04-PE:0400000000000000DA CRC error readError=0x0014
P1P2/S/009 * [ESP] Uptime 1400
P1P2/S/009 * [MON] T  0.024: -PE:40-PE:005C02-PE:00-PE:200000-PE:000000-PE:00-PE:00-PE:000000-PE:20-PE:40F2 CRC error readError=0x0014
P1P2/S/009 * [ESP] Uptime 1410
P1P2/S/009 * [MON] T  0.024: 40A0-PE:A1-PE:A8-PE:02-PE:02EAFB-PE:C3-PE:F9-PE:FF-PE:FF-PE:01-PE:01-PE:01-PE:FF-PE:FF-PE:FF-PE:0300000029 CRC error readError=0x0014
P1P2/S/009 * [ESP] Uptime 1420
P1P2/S/009 * [ESP] Uptime 1430

After a failed attempt, the auxiliary controller mode is also disabled automatically. I cannot re-enable it afterwards; I get the following error message:

P1P2/S/009 * [ESP] To ATmega: ->L1<-
P1P2/S/009 * [MON] Errorspermitted (error budget) too low; control functionality cannot enabled

As already said, sometimes the counter request mode works. I am wondering what I could do to get both modes working reliably. My Daikin model is an Altherma 3 R Ech2O EHSXB08P30E. Any hints?

@Arnold-n
Owner

Thanks for reporting this. Can you please share a short part of the P1P2/R/# log (or the telnet output in "J10" mode)? There is one other system where the main Daikin controller doesn't like the protocol violation that is necessary for the counter request. What happens is that the output with the counters from the Daikin EHSXB08P30E collides with the next request from the main Daikin wall controller (Daikin forgot to implement bus collision detection). And indeed sometimes it works and sometimes it does not, depending on exactly when the counter requests are inserted. An improved counter request insertion mechanism (which works on one other system) will be released soon in v0.9.41; I hope that it will work for you too.

Disabling the auxiliary control after read errors is a safety measure to prevent the risk of bus collisions or other errors. Once the "error budget" is restored (after 10 hours of error-free operation, or after an ATmega reset), you can restart auxiliary control again. The ATmega can be reset with the "A" command.

@nicx
Author

nicx commented Jul 26, 2023

@Arnold-n just some short feedback: with v0.9.41 P1P2Monitor it has now been working for several days. I still get errors sometimes, but aux control and counter request mode are no longer disabled. For me this is a working solution :)

@Arnold-n
Owner

@nicx, thanks, good to hear, though I am not happy with any remaining errors, especially if they are caused by the counter requests. Would you be able to log P1P2/R/# for a longer period so we can determine the cause of the errors? It's about 140MB/day of raw log data, or 6MB/day compressed.

@nicx
Author

nicx commented Jul 27, 2023

@Arnold-n sure I can give you logs... any hint how exactly I could get them? :)

@Arnold-n
Owner

@nicx On an Ubuntu or Raspberry Pi system I use an MQTT client that logs all MQTT traffic on P1P2/R/# to a file:
mosquitto_sub -h 192.168.4.12 -u P1P2 -P P1P2 -v -t 'P1P2/R/#' > mqtt.log
If that does not work, I can temporarily make an MQTT server available for you that I can log myself - of course I would use a better password known to you and me only.
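
Since the raw log is large (roughly 140MB/day raw versus 6MB/day compressed, as mentioned earlier in this thread), it can help to compress while capturing. The following is a minimal Python sketch and not part of the P1P2 software: it wraps the same mosquitto_sub invocation and writes its output to a gzip file. The function names `build_command` and `log_compressed` and the output file name are hypothetical; the host and credentials are the example values from the command above and need adjusting to your setup.

```python
import gzip
import subprocess


def build_command(host, user, password, topic="P1P2/R/#"):
    """Build the mosquitto_sub argument list used in this thread.

    Passing the arguments as a list avoids any shell quoting issues with '#'.
    """
    return ["mosquitto_sub", "-h", host, "-u", user, "-P", password, "-v", "-t", topic]


def log_compressed(command, path):
    """Run `command` and append everything it prints to a gzip file at `path`.

    Blocks until the command exits (for mosquitto_sub: until it is interrupted
    or the connection ends) and returns the command's exit code.
    """
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc, \
            gzip.open(path, "ab") as out:
        for line in proc.stdout:
            out.write(line)
    return proc.returncode


# Example usage (hypothetical file name; runs until stopped with Ctrl-C):
# log_compressed(build_command("192.168.4.12", "P1P2", "P1P2"), "mqtt.log.gz")
```

The resulting file can be read back with `zcat mqtt.log.gz` or Python's `gzip.open`.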

@nicx
Author

nicx commented Jul 29, 2023

@Arnold-n just sent you the logfile via mail. hope it helps :)

@nicx
Author

nicx commented Oct 17, 2023

@Arnold-n unfortunately, the "Counter request mode" has not worked at all for some time. As far as I know I have not changed anything. Not even a reset helps.
I installed the latest firmware 0.9.42 today (incl. monitor), but that didn't help either.
Any ideas?

V
P1P2/S/009 * [ESP] P1P2-bridge-esp8266 v0.9.42
P1P2/S/009 * [ESP] Compiled Sep 29 2023 08:53:51
P1P2/S/009 * [ESP] E-Series
P1P2/S/009 * [ESP] ESP_hw_identifier 1
P1P2/S/009 * [ESP] Connected to MQTT server
P1P2/S/009 * [ESP] MQTT Clientname = P1P2_009
P1P2/S/009 * [ESP] MQTT User = P1P2
P1P2/S/009 * [ESP] MQTT Server = 192.168.0.1
P1P2/S/009 * [ESP] MQTT Port = 1883
P1P2/S/009 * [ESP] ESP reboot reason = 0xFF
P1P2/S/009 * [ESP] outputMode = 0x13003
P1P2/S/009 * [ESP] outputFilter = 1
P1P2/S/009 * [ESP] mqttInputByte4 = 0
P1P2/S/009 * [ESP] EEPROM version = 0
P1P2/S/009 * [ESP] To ATmega: ->V<-
P1P2/S/009 * [MON] 1970-01-01_01:09:45 Verbose 3
P1P2/S/009 * [MON] 1970-01-01_01:09:45 P1P2Monitor-v0.9.42
P1P2/S/009 * [MON] 1970-01-01_01:09:45 Compiled Sep 27 2023 12:49:03 +control NEWP1P2LIB E-series
P1P2/S/009 * [MON] 1970-01-01_01:09:45 Reset cause: MCUSR=2 (ext-reset)
P1P2/S/009 * [MON] 1970-01-01_01:09:45 P1P2-ESP-interface ATmegaHwID v1.2
P1P2/S/009 * [MON] 1970-01-01_01:09:45 Control_ID=0xF0
P1P2/S/009 * [MON] 1970-01-01_01:09:45 CONTROL_ID_DEFAULT=0x0
P1P2/S/009 * [MON] 1970-01-01_01:09:45 Counterrepeatingrequest=1
P1P2/S/009 * [MON] 1970-01-01_01:09:45 F030DELAY=100
P1P2/S/009 * [MON] 1970-01-01_01:09:45 F03XDELAY=30
P1P2/S/009 * [MON] 1970-01-01_01:09:45 Verbosity=3
P1P2/S/009 * [MON] 1970-01-01_01:09:45 CPU_SPEED=8000000
P1P2/S/009 * [MON] 1970-01-01_01:09:45 SERIALSPEED=250000
P1P2/S/009 * [ESP] Uptime 580 at 19700101_01:09:50
C2
P1P2/S/009 * [ESP] To ATmega: ->C2<-
P1P2/S/009 * [MON] 1970-01-01_01:10:10 Repetitive requesting of counter values was already active
P1P2/S/009 * [ESP] Uptime 600 at 19700101_01:10:10

@Arnold-n
Owner

Hi @nicx , can you please share a P1P2/R/# log of a few minutes of your system in C2 mode, both in L0 mode and in L1 mode, preferably after an ATmega reset ('A' command) to ensure writing to the bus is allowed?

@nicx
Author

nicx commented Oct 18, 2023

@Arnold-n I just sent you the log file. With the command "L1" I get the error

P1P2/S/009 * [MON] 1970-01-01_08:59:48 No auxiliary controller answering to address 0xF1 detected, switching control functionality can be switched on (using L1)
P1P2/S/009 * [MON] 1970-01-01_08:59:48 No auxiliary controller answering to address 0xF0 detected, control functionality will restart

seems there is something wrong with the communication to the bus?

@Arnold-n
Owner

@nicx thanks for the log. The message above may be a bit confusing, but there is no error there: about 10 seconds after an ATmega reboot it reports that your system supports two auxiliary controllers, 0xF0 and 0xF1. Because the EEPROM remembers that your system was in "L1" mode, it will start using auxiliary control slot 0xF0. I think it is a coincidence that your system gave this message just after you gave the L1 command.

Your log also shows that C2 works nicely: all counter values are requested and provided by your system in the raw bus data. So if the values are not reported to HA, this could be because (1) they have not changed, or (2) they have changed, but for some reason are not communicated over MQTT. Is your Daikin system operating (so that counters increase)? Can you try to give the "D3" command, which causes your interface to re-communicate all counters even if not recently changed and to re-init all HA sensors?

@nicx
Author

nicx commented Oct 18, 2023

OK, that's weird... I can see the values again since doing the logging. I don't know if it was the ATmega reset command, the switching of the C/L modes, or the reset with "d1" at the end of my tests.

The problem was there for some days/weeks... and yes, the Daikin was working during that time. Anyway, it works again and I will monitor it.

thanks a lot again for your great support!

@Arnold-n
Owner

You're welcome - if it returns, can you log some P1P2/R/# before reset, to see what might be causing it? Would be even better to log all data on P1P2/# continuously to back-track the cause, but it is ~200MB/day of data.
One possible cause might be if you see more than 1 CRC error/hour; actually there should be no CRC errors at all except for when you reset the ATmega and/or the ESP.
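
To check a captured status log against that "1 CRC error/hour" rule of thumb, a rough count can be scripted. The Python sketch below is only an illustration and not part of the P1P2 software: it counts lines containing "CRC error" and estimates the observed time from the "[ESP] Uptime N" lines that appear in the output quoted in this thread. It assumes that the uptime value is in seconds (the quoted lines advance by 10 roughly every 10 seconds) and that a smaller value than the previous one means the ESP rebooted. The function name `crc_error_rate` is hypothetical.

```python
import re

# Matches status lines such as "P1P2/S/009 * [ESP] Uptime 1380"
UPTIME_RE = re.compile(r"\[ESP\] Uptime (\d+)")


def crc_error_rate(lines):
    """Return (crc_errors, observed_seconds, errors_per_hour) for status-log lines.

    errors_per_hour is None when no time span could be derived from the log.
    """
    crc_errors = 0
    observed = 0
    last_uptime = None
    for line in lines:
        if "CRC error" in line:
            crc_errors += 1
        match = UPTIME_RE.search(line)
        if match:
            uptime = int(match.group(1))
            # Only count forward progress; a smaller value is treated as a reboot.
            if last_uptime is not None and uptime > last_uptime:
                observed += uptime - last_uptime
            last_uptime = uptime
    per_hour = crc_errors * 3600 / observed if observed else None
    return crc_errors, observed, per_hour


# Example usage with a saved log (hypothetical file name):
# with open("status.log") as f:
#     print(crc_error_rate(f))
```

A short capture gives a noisy estimate, so this is most useful on logs covering several hours.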

@nicx
Author

nicx commented Oct 19, 2023

I spoke too soon: after the nightly reboot of HA and MQTT the values are gone again. "D3" didn't help. "D1" gives me some values back in HA. Very strange: in MQTT all values are there, in HA they are not:

[Screenshot, 2023-10-19 07:44:28]

I will do some more tests and give more feedback if there is any comprehensible behavior.

@nicx
Author

nicx commented Oct 19, 2023

Hm, it seems the device/entities in HA have changed somehow. I deleted the MQTT device in HA completely, rebooted HA, and the device came back with newly named entities. But now all C-entities are missing completely. BTW a lot of other entities are not shown in HA either (T2, A9, etc.). Is this by design? How does the MQTT integration in HA decide which MQTT values should be populated into HA entities?

[Screenshot, 2023-10-19 08:03:10]

@Arnold-n
Owner

Thanks for the info @nicx, I found a bug (introduced in v0.9.41) which explains the issue after a D3 and possibly also after D1. It should be fixed in v0.9.43. It only affects the counters.

In v0.9.43, both D1 and D3 should give all counters back, but it may take 2 minutes for all counters to show in HA. If some are still missing, can you show which counters do not come back in HA?

In the current software architecture it is hard-coded which entities are shown in HA (using the macro "HACONFIG"). It is planned to make this easier to configure.

@nicx
Author

nicx commented Oct 19, 2023

@Arnold-n after some more time I can see counters are updating in MQTT, but they are not appearing in HA.
I will test the behavior with the bug fixed version and report what's happening ;) thanks!

@nicx
Author

nicx commented Oct 19, 2023

@Arnold-n with the new version all counters came back after a "D3". I will monitor that over time. Thanks again!

@nicx
Author

nicx commented Oct 20, 2023

@Arnold-n one day later and the problem is back again... no counters are updated. But here is what I have learned: "d1" or "d3" did not help, and switching "C" off and on did not help. Only after an ATmega reset with "A" did all counters start working again immediately. Is there anything I could do to help find the root cause?

@Arnold-n
Owner

@nicx can you share logs again (preferably captured before the "A" / "Dx" commands)? Without them I am kind of blind. Did you see an error message after "C2" on P1P2/S/#? If yes, it might be the safety mechanism that blocks writing after too many bus read errors were observed (such errors should really not be present), and we would need to find the cause of the errors. In that case it is worth looking at all logs, at the reported V_bus_ATmega_ADC_* values (which should be ~12V for _avg and small for _diff), and at Error_Budget (which should increase from 10 to 20 and should not decrease except during reset actions). If not, I am not sure of the cause yet.

@nicx
Author

nicx commented Oct 21, 2023

@Arnold-n I just sent you the logfile. If you need more, let me know :)

@Arnold-n
Owner

@nicx thanks. I can see a problem now.

In the log several ESP resets are visible. These ESP resets cause a state change in P1P2Monitor's counter request code, making it stop requesting counters. I'll need to look a bit deeper into the precise cause and solution.

As for the ESP resets, did you trigger all of these? I wonder if the ESP resets are caused by the missing connection to the NTP server - does your bridge indeed have no internet access for an NTP server at your location?

@nicx
Author

nicx commented Oct 22, 2023

@Arnold-n yes, none of my IoT devices have access to the internet. I have an internal NTP server. I did not trigger any reset during logging.

Is it possible to configure the NTP server in any way? Otherwise I could grant the module internet access.

@Arnold-n
Owner

Arnold-n commented Oct 22, 2023

@nicx it seems the missing NTP server is causing the ESP resets. Even worse, it causes a low-memory situation which causes the ESP to skip some of the outgoing MQTT messages. You can see this in the MQTT output (even if it is sometimes incomplete, see below). Free memory should be around 15000, not near 6000. I can reproduce this problem by blocking internet access from my module. I do not currently know whether it is a memory leak in the TZ library or an MQTT buffer problem due to NTP-related delays (I think the latter), or how I should avoid this problem when the NTP server is missing. I will look into that later, but have other priorities first. But let me send you a firmware version with NTP disabled - that should solve the ESP resets and restore MQTT reliability on your side. Currently the NTP server is configured in the configuration header file.

Separately I will improve the P1P2Monitor counter request code such that it isn't disturbed by ESP resets.

ESP_Mem_free = 6024
MQTT_disconnected_skipped_packets = 110
MQTT_disconnected_time_total = 6
MQTT_disconnects = 2
MQTT_messages_skipped_low_mem = 56476
MQTT_messages_skipped_not_connected = 117
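
For anyone who wants to watch for this condition in their own logs, the statistics above can be checked with a small script. The Python sketch below is only an illustration: it parses "name = value" lines like the ones quoted above and flags low free memory and messages skipped due to low memory. The function names `parse_stats` and `memory_warnings` are hypothetical, and the 10000 threshold is an arbitrary cut-off chosen between the "around 15000" and "near 6000" figures mentioned above, not a value taken from the firmware.

```python
def parse_stats(text):
    """Parse lines of the form 'name = value' into a dict of integers.

    Lines without '=' or with a non-numeric value are ignored.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def memory_warnings(stats, min_free=10000):
    """Return human-readable warnings for low-memory symptoms.

    min_free is an assumed threshold, not a firmware constant.
    """
    warnings = []
    free = stats.get("ESP_Mem_free")
    if free is not None and free < min_free:
        warnings.append(f"ESP_Mem_free is {free}, below {min_free}")
    skipped = stats.get("MQTT_messages_skipped_low_mem", 0)
    if skipped > 0:
        warnings.append(f"{skipped} MQTT messages skipped due to low memory")
    return warnings


# Example usage with two of the values quoted in this thread:
sample = "ESP_Mem_free = 6024\nMQTT_messages_skipped_low_mem = 56476"
for warning in memory_warnings(parse_stats(sample)):
    print(warning)
```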

Repository owner deleted a comment from nicx Oct 22, 2023
@Arnold-n
Owner

v0.9.48 and later have an option to disable NTP requests (P33 0), and all counter request issues are hopefully solved.
