Handling of keep alive failed. after several iterations #63
It seems like after the MQTT keep alive times out, this log statement is causing the core to panic, which is what we want to debug. Can you use the CONFIG_ESP_SYSTEM_PANIC_GDBSTUB option described here to find out why the log is leading to a crash? Network failures once in a while are okay as long as the application recovers. You can try capturing the network data if you want to find out the cause of the network failures.
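For reference, this panic handler can be selected in menuconfig (typically under Component config → ESP System Settings) or by setting it directly in the project's sdkconfig. The fragment below is a sketch for an ESP-IDF v4.3 project; other panic-related options are assumed to keep their defaults:

```
# sdkconfig fragment: panic into the GDB stub instead of printing and
# rebooting, so the crash site can be inspected over the serial port.
CONFIG_ESP_SYSTEM_PANIC_GDBSTUB=y
```

After a crash, attach xtensa-esp32-elf-gdb to the serial port once "Entering gdb stub now." appears on the console.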
Sorry for the late reply, but I was away the last two days. Anyway, as you suggested, I've run the debug with CONFIG_ESP_SYSTEM_PANIC_GDBSTUB:
I (117478) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (117478) coreMQTT: State record updated. New state=MQTTPublishDone.
I (117478) sub_pub_unsub: Publish 14 succeeded for task "SubPub0".
I (117478) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (117498) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (117498) sub_pub_unsub: Task "SubPub0" received: SubPub0
I (117508) sub_pub_unsub: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 15
I (125508) sub_pub_unsub: Unsubscribe 15 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (125508) sub_pub_unsub: Task "SubPub0" completed a loop. Delaying before next loop.
I (130518) sub_pub_unsub: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 16
I (130918) sub_pub_unsub: Subscribe 16 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (130918) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 17.
I (130928) sub_pub_unsub: Task "SubPub0" waiting for publish 17 to complete.
I (131918) coreMQTT: Publishing message to /filter/SubPub0.
E (136928) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (136928) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (136928) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout
I (136938) sub_pub_unsub: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (136938) core_mqtt_agent_manager: Stack size uxMqttAgentTask: 2040
I (136948) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
I (136938) core_mqtt_agent_manager: TLS connection was disconnected.
I (136958) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
I (140748) coreMQTT: MQTT connection established with the broker.
I (140748) core_mqtt_agent_manager: Session present: 1
E (140748) coreMQTT: Collision when adding PacketID=21 at index=0.
I (140768) sub_pub_unsub: coreMQTT-Agent connected.
I (140768) ota_over_mqtt: coreMQTT-Agent connected. Resuming OTA agent.
I (140768) core_mqtt_agent_manager: coreMQTT-Agent connected.
I (141768) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (141768) coreMQTT: State record updated. New state=MQTTPubAckSend.
W (141768) sub_pub_unsub: Error or timed out waiting for ack for publish message 17. Re-attempting publish.
I (141778) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (141778) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 17.
I (141788) coreMQTT: State record updated. New state=MQTTPublishDone.
I (141808) sub_pub_unsub: Task "SubPub0" waiting for publish 17 to complete.
I (141808) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (141828) sub_pub_unsub: Publish 17 succeeded for task "SubPub0".
I (141828) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (141838) sub_pub_unsub: Task "SubPub0" received: SubPub0
I (141848) sub_pub_unsub: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 18
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x4000c250 PS : 0x00060130 A0 : 0x8018b850 A1 : 0x3ffd8120
A2 : 0x00000001 A3 : 0x00000000 A4 : 0x00007871 A5 : 0x3f42b46c
A6 : 0x00000008 A7 : 0xff000000 A8 : 0x00000000 A9 : 0x3ffd80f0
A10 : 0x00000000 A11 : 0x3f408c14 A12 : 0x3ffd8374 A13 : 0x3ffb6d1c
A14 : 0x00000000 A15 : 0x00000001 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000001 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffd
Backtrace:0x4000c24d:0x3ffd8120 0x4018b84d:0x3ffd8140 0x4018cee9:0x3ffd8450 0x4019c75d:0x3ffd8480 0x40096905:0x3ffd84b0 0x40106318:0x3ffd8500 0x40105e50:0x3ffd8530 0x40106098:0x3ffd8580 0x400d94f6:0x3ffd85b0 0x400946b5:0x3ffd85d0
0x4018b84d: _vfprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c:1521
0x4018cee9: vprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vprintf.c:34 (discriminator 5)
0x4019c75d: esp_log_writev at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:189
0x40096905: esp_log_write at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:199
0x40106318: MQTTAgentCommand_Publish at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent_command_functions.c:79 (discriminator 13)
0x40105e50: processCommand at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:584
0x40106098: MQTTAgent_CommandLoop at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:1059
0x400d94f6: prvMQTTAgentTask at C:\mSpaceLock\mSpaceGate-fmw\build/../main/networking/mqtt/core_mqtt_agent_manager.c:520
0x400946b5: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.3.4/components/freertos/port/xtensa/port.c:168
ELF file SHA256: 9c1256dcb6df26f6
Entering gdb stub now.
$T0b#e6
C:\Espressif\tools\xtensa-esp32-elf\esp-2021r2-patch3-8.4.0\xtensa-esp32-elf\bin\xtensa-esp32-elf-gdb.exe: warning: Couldn't determine a path for the index cache directory.
GNU gdb (crosstool-NG esp-2021r2-patch3) 9.2.90.20200913-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-host_w64-mingw32 --target=xtensa-esp32-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from c:\mSpaceLock\mSpaceGate-fmw\build\mSpaceGate.elf...
Remote debugging using \\.\COM3
Ignoring packet error, continuing...
warning: unrecognized item "timeout" in "qSupported" response
Ignoring packet error, continuing...
Remote replied unexpectedly to 'vMustReplyEmpty': timeout
You are absolutely right. It seems that the
Not sure that would be helpful. Let us first figure out the root cause.
I will do my best to support this root cause investigation; I just hope that my skills will be enough to handle it. I'm still a noob in embedded programming, so I need some more detailed guidelines on what I should do or what to check, if you don't mind. This caught error happened after over 2 h of running the program, which was ~2600 iterations of this subpub loop.
Hey @NightSkySK, this is a complicated issue to even find, and what you've done already is great, no need to say your skills won't be enough! Why don't you start by putting an if statement check in before the publish that checks that
I'm using
I think knowing the backtrace, and the values of the other variables, could be useful for figuring out what is happening here.
Also, are you using one of the examples, or are you writing your own application? If you are writing your own application, it would be good if you could share your code.
I think there is a race condition causing the
I think we should not use the same publish ID for a re-attempt. Can you try to change the code for the re-attempt as follows?

if ((xCommandAcknowledged != pdTRUE) ||
    (xCommandContext.xReturnStatus != MQTTSuccess) ||
    (ulNotifiedValue != ulPublishMessageId))
{
    ESP_LOGW(TAG,
             "Error or timed out waiting for ack for publish message %" PRIu32 ".",
             ulPublishMessageId);

    /* Generate a new publish ID so that we can differentiate late
     * acknowledgements. */
    xSemaphoreTake(xMessageIdSemaphore, portMAX_DELAY);
    {
        ++ulMessageId;
        ulPublishMessageId = ulMessageId;
    }
    xSemaphoreGive(xMessageIdSemaphore);

    ESP_LOGW(TAG,
             "Re-attempting publish with new ID %" PRIu32 ".",
             ulPublishMessageId);

    /* Update publish ID in the command context. */
    xCommandContext.ulNotificationValue = ulPublishMessageId;
}

You can apply the following patch to your repo's main branch: new_pub_id.patch.
@aggarg Thank you for the patch, it moves us in the right direction. However, a new error popped up:
I (7822661) sub_pub_unsub: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 2389
I (7823101) sub_pub_unsub: Subscribe 2389 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (7823101) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 2390.
I (7823121) sub_pub_unsub: Task "SubPub0" waiting for publish 2390 to complete.
I (7824101) coreMQTT: Publishing message to /filter/SubPub0.
I (7825331) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (7825331) coreMQTT: State record updated. New state=MQTTPubAckSend.
W (7825341) sub_pub_unsub: Re-attempting publish with new ID 2391.
I (7825351) coreMQTT: State record updated. New state=MQTTPublishDone.
I (7825361) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 2391.
I (7825381) sub_pub_unsub: Task "SubPub0" waiting for publish 2391 to complete.
W (7825381) sub_pub_unsub: Error or timed out waiting for ack for publish message 2391.
W (7825391) sub_pub_unsub: Re-attempting publish with new ID 2392.
I (7825401) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 2392.
I (7825411) sub_pub_unsub: Task "SubPub0" waiting for publish 2392 to complete.
I (7826361) coreMQTT: Publishing message to /filter/SubPub0.
I (7826591) coreMQTT: Publishing message to /filter/SubPub0.
I (7826791) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (7826791) coreMQTT: State record updated. New state=MQTTPublishDone.
I (7826791) sub_pub_unsub: Publish 2392 succeeded for task "SubPub0".
I (7826801) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (7826811) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (7826821) sub_pub_unsub: Task "SubPub0" received: SubPub0
I (7826821) sub_pub_unsub: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 2393
I (7826991) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (7827001) coreMQTT: State record updated. New state=MQTTPublishDone.
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Setting breakpoint at 0x40093d2b and returning...
0x40093d2b: xTaskGenericNotify at C:/Espressif/frameworks/esp-idf-v4.3.4/components/freertos/tasks.c:5323

So this time, after unsubscribing from the topic filter, a second PUBACK arrived for the repeated publish message, and this is causing the crash in the function xTaskGenericNotify.
Apologies, but I think my previous hypothesis was incorrect. The agent never times out a waiting ACK and therefore step 3 in my previous assessment cannot happen. Looking more at your code and logs, I think we are getting the PUBLISH message before the PUBACK, and the current code incorrectly detects that as a failed publish and attempts to re-publish. Would you please try the following patch: remove_id_check.patch. Note that it is only for testing and not the perfect solution.
Please have a look at the logs below.
I (6391293) sub_pub_unsub: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 1954
I (6391663) sub_pub_unsub: Subscribe 1954 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (6391673) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 1955.
I (6391683) sub_pub_unsub: Task "SubPub0" waiting for publish 1955 to complete.
I (6392673) coreMQTT: Publishing message to /filter/SubPub0.
I (6393863) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (6393863) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6393863) sub_pub_unsub: Publish 1955 succeeded for task "SubPub0".
I (6393863) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (6393883) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (6393883) sub_pub_unsub: Task "SubPub0" received: SubPub0
I (6393893) sub_pub_unsub: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 1956
I (6395963) sub_pub_unsub: Unsubscribe 1956 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (6395963) sub_pub_unsub: Task "SubPub0" completed a loop. Delaying before next loop.
I (6400973) sub_pub_unsub: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 1957
I (6401453) sub_pub_unsub: Subscribe 1957 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (6401463) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 1958.
I (6401473) sub_pub_unsub: Task "SubPub0" waiting for publish 1958 to complete.
I (6402453) coreMQTT: Publishing message to /filter/SubPub0.
I (6403583) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (6403583) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (6403583) sub_pub_unsub: Publish 1958 succeeded for task "SubPub0".
I (6403593) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (6403603) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6416273) ota_over_mqtt: OTA agent suspend by timer.
W (6417273) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (6417273) AWS_OTA: OTA Agent is suspended.
I (6417273) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
����������������������������������������������������������������: OTA agent resumed by timer.
I (6538273) AWS_OTA: otaPal_GetPlatformImageState
I (6538273) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (6538273) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (6538273) AWS_OTA: Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
I (6538583) ota_over_mqtt: Subscribed to topic $aws/things/GA965F0001/jobs/notify-next.
I (6538583) AWS_OTA: Subscribed to MQTT topic: $aws/things/GA965F0001/jobs/notify-next
I (6539583) coreMQTT: Publishing message to $aws/things/GA965F0001/jobs/$next/get.
I (6540703) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (6540703) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6540703) ota_over_mqtt: Sent PUBLISH packet to broker $aws/things/GA965F0001/jobs/$next/get to broker.
I (6540713) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (6540723) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6540723) AWS_OTA: Current State=[WaitingForJob], Event=[RequestJobDocument], New state=[WaitingForJob]
I (6540733) ota_over_mqtt: Received job message callback, size 54.
I (6540753) AWS_OTA: No active job available in received job document: OtaJobParseErr_t=OtaJobParseErrNoActiveJobs
I (6540763) ota_over_mqtt: Received OtaJobEventNoActiveJob callback from OTA Agent.
I (6540763) AWS_OTA: otaPal_GetPlatformImageState
I (6540773) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (6540783) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (6540783) ota_over_mqtt: OTA Event processing completed. Freeing the event buffer to pool.
I (6540793) AWS_OTA: Current State=[WaitingForJob], Event=[ReceivedJobDocument], New state=[CreatingFile]
I (6568283) ota_over_mqtt: OTA agent suspend by timer.
W (6569283) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (6569283) AWS_OTA: OTA Agent is suspended.
I (6569283) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
����������������������������������������������������������������: OTA agent resumed by timer.
I (6690283) AWS_OTA: otaPal_GetPlatformImageState
I (6690283) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (6690283) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (6690283) AWS_OTA: Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
I (6690743) ota_over_mqtt: Subscribed to topic $aws/things/GA965F0001/jobs/notify-next.
I (6690753) AWS_OTA: Subscribed to MQTT topic: $aws/things/GA965F0001/jobs/notify-next
I (6691753) coreMQTT: Publishing message to $aws/things/GA965F0001/jobs/$next/get.
I (6692873) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (6692873) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6692873) ota_over_mqtt: Sent PUBLISH packet to broker $aws/things/GA965F0001/jobs/$next/get to broker.
I (6692873) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (6692883) AWS_OTA: Current State=[WaitingForJob], Event=[RequestJobDocument], New state=[WaitingForJob]
I (6692903) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6692913) ota_over_mqtt: Received job message callback, size 54.
I (6692923) AWS_OTA: No active job available in received job document: OtaJobParseErr_t=OtaJobParseErrNoActiveJobs
I (6692933) ota_over_mqtt: Received OtaJobEventNoActiveJob callback from OTA Agent.
I (6692933) AWS_OTA: otaPal_GetPlatformImageState
I (6692943) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (6692953) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (6692953) ota_over_mqtt: OTA Event processing completed. Freeing the event buffer to pool.
I (6692963) AWS_OTA: Current State=[WaitingForJob], Event=[ReceivedJobDocument], New state=[CreatingFile]
I (6720293) ota_over_mqtt: OTA agent suspend by timer.
W (6721293) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (6721293) AWS_OTA: OTA Agent is suspended.
I (6721293) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
����������������������������������������������������������������: OTA agent resumed by timer.
I (6842293) AWS_OTA: otaPal_GetPlatformImageState
I (6842293) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (6842293) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (6842293) AWS_OTA: Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
I (6842593) ota_over_mqtt: Subscribed to topic $aws/things/GA965F0001/jobs/notify-next.
I (6842593) AWS_OTA: Subscribed to MQTT topic: $aws/things/GA965F0001/jobs/notify-next
I (6843593) coreMQTT: Publishing message to $aws/things/GA965F0001/jobs/$next/get.
I (6844743) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (6844743) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6844743) ota_over_mqtt: Sent PUBLISH packet to broker $aws/things/GA965F0001/jobs/$next/get to broker.
I (6844743) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (6844753) AWS_OTA: Current State=[WaitingForJob], Event=[RequestJobDocument], New state=[WaitingForJob]
I (6844773) coreMQTT: State record updated. New state=MQTTPublishDone.
I (6844783) ota_over_mqtt: Received job message callback, size 54.
I (6844793) AWS_OTA: No active job available in received job document: OtaJobParseErr_t=OtaJobParseErrNoActiveJobs
I (6844803) ota_over_mqtt: Received OtaJobEventNoActiveJob callback from OTA Agent.
I (6844803) AWS_OTA: otaPal_GetPlatformImageState
I (6844813) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (6844823) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (6844823) ota_over_mqtt: OTA Event processing completed. Freeing the event buffer to pool.
I (6844833) AWS_OTA: Current State=[WaitingForJob], Event=[ReceivedJobDocument], New state=[CreatingFile]
I (6872303) ota_over_mqtt: OTA agent suspend by timer.
W (6873303) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (6873303) AWS_OTA: OTA Agent is suspended.
Can you break the code in the debugger and see what this pub-sub task is doing? As it is taking longer to debug, would you like to have a debug session to debug this together? If yes, please share your email id.
@NightSkySK was your issue resolved? Can you respond to @aggarg's post above? Also, can you stop the OTA task (or better yet, not start it altogether)? That will reduce the noise in the logs. Just keep the MQTT sub_pub_unsub task running.
@AniruddhaKanhere I was investigating and debugging this issue with @aggarg. Thanks to Gaurav's support, we've managed to identify the root cause and to confirm the effectiveness of the solution he provided.
The current implementation uses a task notification to wait for acknowledgements and incoming publishes and therefore cannot differentiate between the two. This causes a crash when an incoming PUBLISH is received before the PUBACK for the previous outgoing publish. This commit uses an event group to wait for acknowledgements and incoming publishes, which allows us to differentiate between them by waiting on different bits. This was reported here - FreeRTOS#63.

Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
Describe the bug
I've been trying to understand the nature of the failure and how to fix it for 2 weeks, and I can't find the right solution.
After a number of iterations I get the status
MQTTKeepAliveTimeout
I got some improvement by raising the priority up to 6 for
CONFIG_GRI_MQTT_AGENT_TASK_PRIORITY
I've also played with some other settings without major change.
I've traced the task stack sizes to make sure I'm not exceeding them, and they seem to be fine.
I've gone through the open and closed issues in this repo, but I've not found any hint that someone has faced an issue similar to mine.
System information
Expected behavior
Keep the MQTT connection alive for a few hours, not only for a few minutes.
Steps to reproduce bug
Console trace
Additional context
As a side question, I wonder why in this example we start the demo tasks before starting WiFi and the coreMQTT-Agent network manager. I found another example where it's done the other way around: github.com/aws-samples/device-software-for-aws-connected-mobility-solution. What are the pros and cons of those two approaches?