Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock on MQTT keepalive timeout #77

Open
GwnDaan opened this issue Jul 12, 2023 · 11 comments
Open

Deadlock on MQTT keepalive timeout #77

GwnDaan opened this issue Jul 12, 2023 · 11 comments

Comments

@GwnDaan
Copy link

GwnDaan commented Jul 12, 2023

Hi there!

I experienced my MQTT publisher application hanging after it get's disconnected by the broker for "keepalive timeout". The application hangs on lwgsm_mqtt_client_api_publish.

The line where I think the deadlock is: https://github.com/MaJerle/lwgsm/blob/94dfb5067234d0da257bceee01267cee8d0fa5de/lwgsm/src/apps/mqtt/lwgsm_mqtt_client_api.c#L411

I have a feeling it may be caused by the doubly taken semaphore, where the second one is the lwgsm_sys_sem_wait in lwgsm_mqtt_client_api_close, but that's a complete guess.
https://github.com/MaJerle/lwgsm/blob/94dfb5067234d0da257bceee01267cee8d0fa5de/lwgsm/src/apps/mqtt/lwgsm_mqtt_client_api.c#L318

Although the application shouldn't deadlock, I also could't figure out why it times out the connection for "keepalive timeout"... It seems to consistently occur after the connection is open for 1 hour. The PINGREQ packets seem to receive the PINGRESP correctly, and also the PUBLISH packets seem to receive the PUBACK correctly. Does anything else need to happen?

System information:
System: FreeRTOS
MQTT broker: RabbitMQ

Code to reproduce:

#include "lwgsm/lwgsm_network_api.h"
#include "lwgsm/lwgsm.h"
#include "lwgsm/lwgsm_mem.h"
#include "network_utils.h"
#include "lwgsm/apps/lwgsm_mqtt_client_api.h"
#include "packet.h"

#include <driver/gpio.h>
#include <rom/crc.h>
#include <esp_log.h>

static lwgsmr_t lwgsm_callback_func(lwgsm_evt_t *evt);

#define NETWORK_APN "globaldata.iot"
#define NETWORK_APN_USER ""
#define NETWORK_APN_PASS ""

#define PWR_PIN GPIO_NUM_4

static const lwgsm_mqtt_client_info_t mqtt_client_info = {
    .keep_alive = 10,

    /* Server login data */
    .user = "test",
    .pass = "test",

    /* Device identifier address */
    .id = "pomp",
};

static const char mqtt_topic_str[] = "stationair-pomp";

int app_main()
{
    // Power on SIM7000C
    lwgsm_delay(2000);
    ESP_ERROR_CHECK(gpio_set_direction(PWR_PIN, GPIO_MODE_OUTPUT));
    ESP_ERROR_CHECK(gpio_set_level(PWR_PIN, 0));
    lwgsm_delay(10);
    ESP_ERROR_CHECK(gpio_set_level(PWR_PIN, 1));
    lwgsm_delay(1010);
    ESP_ERROR_CHECK(gpio_set_level(PWR_PIN, 0));
    lwgsm_delay(4500);

    ESP_LOGI("Main", "Starting GSM application!");

    /* Initialize GSM with default callback function */
    if (lwgsm_init(lwgsm_callback_func, 1) != lwgsmOK)
    {
        ESP_LOGE("Main", "Cannot initialize LwGSM, restarting in 10 seconds...");
        lwgsm_delay(10000);
        esp_restart();
        return 0;
    }

    /* Set APN credentials */
    lwgsm_network_set_credentials(NETWORK_APN, NETWORK_APN_USER, NETWORK_APN_PASS);

    /* Start main (MQTT) thread */
    lwgsm_mqtt_client_api_p client;
    lwgsm_mqtt_conn_status_t conn_status;
    lwgsmr_t res;

    pump_packet_t packet;
    memset(&packet, 0, sizeof(packet));

    /* Request network attach */
    while (lwgsm_network_request_attach() != lwgsmOK)
    {
        lwgsm_delay(1000);
    }

    /* Create new MQTT API */
    if ((client = lwgsm_mqtt_client_api_new(256, 128)) == NULL)
    {
        ESP_LOGE("Main", "Cannot allocate new MQTT client API, restarting in 10 seconds");
        lwgsm_delay(10000);
        esp_restart();
        return 0;
    }

    while (1)
    {
        /* Make a connection */
        ESP_LOGI("mqtt_client_api", "Joining MQTT server");

        /* Try to join */
        conn_status = lwgsm_mqtt_client_api_connect(client, "broker.somewhere.org", 1883, &mqtt_client_info);
        if (conn_status == LWGSM_MQTT_CONN_STATUS_ACCEPTED)
        {
            ESP_LOGI("mqtt_client_api", "Connected and accepted!");
            ESP_LOGI("mqtt_client_api", "Client is ready to subscribe and publish to new messages");
        }
        else
        {
            ESP_LOGW("mqtt_client_api", "Connect API response: %d", (int)conn_status);
            if (lwgsm_mqtt_client_api_close(client) != lwgsmOK)
            {
                ESP_LOGE("mqtt_client_api", "Cannot disconnect from MQTT server, restarting in 10 seconds");
                lwgsm_delay(10000);
                // esp_restart();
                return 0;
            }
            lwgsm_delay(5000);
            continue;
        }

        while (1)
        {
            /* Insert live data in packet */
            // ...
            // ...
            // ...
            packet.crc = crc16_le(0, (uint8_t const *)&packet, sizeof(packet) - sizeof(packet.crc));

            /* Publish data */
            res = lwgsm_mqtt_client_api_publish(client, mqtt_topic_str, &packet, PUMP_PACKET_SIZE,
                                                LWGSM_MQTT_QOS_AT_LEAST_ONCE, 1);
            if (res != lwgsmOK)
            {
                ESP_LOGW("mqtt_client_api", "Problem publishing data to topic!");
                break;
            }
            else
            {
                ESP_LOGI("mqtt_client_api", "Published data {%u, %u, %f, %f, %f, %u}", packet.running_voltage, packet.battery_voltage, packet.longitude, packet.latitude, packet.speed, packet.crc);
            }

            lwgsm_delay(60000); /* Delay 60 seconds */
        }
    }

    return 0;
}

static lwgsmr_t lwgsm_callback_func(lwgsm_evt_t *evt)
{
    switch (lwgsm_evt_get_type(evt))
    {
    case LWGSM_EVT_INIT_FINISH:
        ESP_LOGI("Main", "Library initialized!");
        break;
    case LWGSM_EVT_NETWORK_REG_CHANGED:
        network_utils_process_reg_change(evt);
        break;
    case LWGSM_EVT_NETWORK_OPERATOR_CURRENT:
        network_utils_process_curr_operator(evt);
        break;
    case LWGSM_EVT_SIGNAL_STRENGTH:
        network_utils_process_rssi(evt);
        break;

    default:
        break;
    }
    return lwgsmOK;
}

Let me know your thoughts!

@MaJerle
Copy link
Owner

MaJerle commented Jul 12, 2023

Thanks for sharing. Dead lockshould indeed not occur, so if it does, it is a bug.

1hour rule, is broker disconnecting you regardless of activity after 1h?

@GwnDaan
Copy link
Author

GwnDaan commented Jul 12, 2023

It seems to consistently occur after the connection is open for 1 hour

Okay. it does vary somewhat when it times out, according to RabbitMQ logs:

2023-07-12 12:22:02.190 [info] <0.25946.18> accepting MQTT connection <0.25946.18> (xx.xx.xx.xx:54254 -> xx.xx.xx.xx:1883, client id: pomp)
2023-07-12 13:02:22.432 [error] <0.25946.18> closing MQTT connection "xx.xx.xx.xx:54254 -> xx.xx.xx.xx:1883" (keepalive timeout)
.....
13:30:35.145 [info] <0.30336.18> accepting MQTT connection <0.30336.18> (xx.xx.xx.xx:35659 -> xx.xx.xx.xx:1883, client id: pomp)
2023-07-12 14:22:05.455 [error] <0.30336.18> closing MQTT connection "xx.xx.xx.xx:35659 -> xx.xx.xx.xx:1883" (keepalive timeout)

Most of the times it is around 1 hour. Never longer.

1hour rule, is broker disconnecting you regardless of activity after 1h?

I'm not sure if I understand your question correctly, but I keep publishing a message every 60 seconds, and the keepalive is set to 10 seconds (the same as the mqtt example of this library). So yes, there should still be activity on the connection when the broker disconnects the client.

I have some debug logs of the mqtt client when 'disaster' strikes:

[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,2:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,2:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PUBLISH to output buffer
[LWGSM MQTT] Pkt publish start. QoS: 1, pkt_id: 60
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,4:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0x40
[LWGSM MQTT] Remaining length received: 2 bytes
[LWGSM MQTT] Processing packet type PUBACK
[MQTT API] Publish event with response: 0
I (3564316) mqtt_client_api: Published data {1, 1, 0.000000, 0.000000, 0.000000, 49845}
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,2:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,2:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,2:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND OK", is_ok: 1
Cmd NULL, Rx: "+RECEIVE,5,2:", is_ok: 0
[LWGSM MQTT] Parser init state, received first byte of packet 0xD0
[LWGSM MQTT] Remaining length received: 0 bytes
[LWGSM MQTT] Processing packet type PINGRESP
[LWGSM MQTT] Ping response received
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
[LWGSM MQTT] Writing packet type PUBLISH to output buffer
[LWGSM MQTT] Pkt publish start. QoS: 1, pkt_id: 61
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND FAIL", is_ok: 0
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
Cmd 133, Rx: "5, SEND FAIL", is_ok: 0
[LWGSM MQTT] Writing packet type PINGREQ to output buffer
[LWGSM MQTT] Sending PINGREQ packet
[LWGSM MQTT] Failed to send 0 bytes. Manually closing down..
Cmd NULL, Rx: "5, SEND FAIL", is_ok: 0
Cmd NULL, Rx: "DST: 1", is_ok: 0
Cmd NULL, Rx: "*PSUTTZ: 23/07/12,12:24:20","+08",1", is_ok: 0

@GwnDaan
Copy link
Author

GwnDaan commented Jul 12, 2023

Hmm, after letting it sit in the deadlock state for another 10 minutes or so, it got out of it somehow:

Cmd NULL, Rx: +PDP: DEACT
, is_ok: 0
Cmd 142, Rx: OK
, is_ok: 0
Cmd 142, Rx: STATE: PDP DEACT
, is_ok: 0
Cmd 142, Rx: C: 0,,"","","","INITIAL"
, is_ok: 0
Cmd 142, Rx: C: 1,,"","","","INITIAL"
, is_ok: 0
Cmd 142, Rx: C: 2,,"","","","INITIAL"
, is_ok: 0
Cmd 142, Rx: C: 3,,"","","","INITIAL"
, is_ok: 0
Cmd 142, Rx: C: 4,,"","","","INITIAL"
, is_ok: 0
[MQTT API] Disconnect event
[MQTT API] Publish event with response: 2
Cmd 142, Rx: C: 5,0,"TCP","148.251.189.186","1883","CLOSED"
, is_ok: 1
TX: 142
Cmd 0, Rx: C: 6,,"","","","INITIAL"
, is_ok: 0
Cmd NULL, Rx: C: 7,,"","","","INITIAL"
, is_ok: 0
567W (4246546) mqtt_client_api: Problem publishing data to topic!
I (4246546) mqtt_client_api: Joining MQTT server
[MQTT API] Cannot connect to 148.251.189.186
W (4246556) mqtt_client_api: Connect API response: 256
[MQTT API] Cannot close API connection
E (4246566) mqtt_client_api: Cannot disconnect from MQTT server, restarting in 10 second

Though ~10 minutes is quite long for recognizing it cannot interact with the MQTT server...

@MaJerle
Copy link
Owner

MaJerle commented Jul 12, 2023

It would also help to see raw AT commands data. with Echo enabled. Can you do it?

@GwnDaan
Copy link
Author

GwnDaan commented Jul 12, 2023

Sure! Give me another hour as I have to wait before it times out again haha

@MaJerle
Copy link
Owner

MaJerle commented Jul 12, 2023

If you use termite terminal for UART, please enable timestamp for both, AT port and your library debug port, so that we can match timings from both logs. Attach both files

@GwnDaan
Copy link
Author

GwnDaan commented Jul 12, 2023

Here are the logs with timestamp since boot for the ESP32: (I unfortunately don't have a way of directly monitoring UART between the ESP32 and SIM7000G today, the best I could come up with is let the ESP32 report the AT communications)

01:00:46.766: [LWGSM MQTT] Writing packet type PUBLISH to output buffer
01:00:46.767: [LWGSM MQTT] Pkt publish start. QoS: 1, pkt_id: 60
01:00:46.982: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:00:46.985: Cmd 133, is_ok: 0, Rx: +RECEIVE,5,2:
01:00:46.986: [LWGSM MQTT] Parser init state, received first byte of packet 0xD0
01:00:46.988: [LWGSM MQTT] Remaining length received: 0 bytes
01:00:46.993: [LWGSM MQTT] Processing packet type PINGRESP
01:00:46.999: [LWGSM MQTT] Ping response received
01:00:47.003: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,41
01:00:47.282: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:00:47.284: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,4:
01:00:47.285: [LWGSM MQTT] Parser init state, received first byte of packet 0x40
01:00:47.288: [LWGSM MQTT] Remaining length received: 2 bytes
01:00:47.294: [LWGSM MQTT] Processing packet type PUBACK
01:00:47.299: [MQTT API] Publish event with response: 0
I (3564661) mqtt_client_api: Published data {1, 1, 0.000000, 0.000000, 0.000000, 49845}
01:00:56.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:00:56.997: [LWGSM MQTT] Sending PINGREQ packet
01:00:57.001: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:00:57.565: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:00:57.567: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,2:
01:00:57.567: [LWGSM MQTT] Parser init state, received first byte of packet 0xD0
01:00:57.571: [LWGSM MQTT] Remaining length received: 0 bytes
01:00:57.577: [LWGSM MQTT] Processing packet type PINGRESP
01:00:57.582: [LWGSM MQTT] Ping response received
01:01:07.496: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:01:07.497: [LWGSM MQTT] Sending PINGREQ packet
01:01:07.501: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:01:07.924: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:01:07.925: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,2:
01:01:07.926: [LWGSM MQTT] Parser init state, received first byte of packet 0xD0
01:01:07.929: [LWGSM MQTT] Remaining length received: 0 bytes
01:01:07.935: [LWGSM MQTT] Processing packet type PINGRESP
01:01:07.940: [LWGSM MQTT] Ping response received
01:01:17.496: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:01:17.497: [LWGSM MQTT] Sending PINGREQ packet
01:01:17.502: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:01:17.906: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:01:17.907: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,2:
01:01:17.908: [LWGSM MQTT] Parser init state, received first byte of packet 0xD0
01:01:17.912: [LWGSM MQTT] Remaining length received: 0 bytes
01:01:17.917: [LWGSM MQTT] Processing packet type PINGRESP
01:01:17.922: [LWGSM MQTT] Ping response received
01:01:27.496: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:01:27.497: [LWGSM MQTT] Sending PINGREQ packet
01:01:27.501: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:01:28.005: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:01:28.007: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,2:
01:01:28.007: [LWGSM MQTT] Parser init state, received first byte of packet 0xD0
01:01:28.011: [LWGSM MQTT] Remaining length received: 0 bytes
01:01:28.016: [LWGSM MQTT] Processing packet type PINGRESP
01:01:28.022: [LWGSM MQTT] Ping response received
01:01:37.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:01:37.997: [LWGSM MQTT] Sending PINGREQ packet
01:01:38.002: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:01:47.306: [LWGSM MQTT] Writing packet type PUBLISH to output buffer
01:01:47.307: [LWGSM MQTT] Pkt publish start. QoS: 1, pkt_id: 61
01:01:47.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:01:47.997: [LWGSM MQTT] Sending PINGREQ packet
01:01:57.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:01:57.996: [LWGSM MQTT] Sending PINGREQ packet
01:01:59.024: Cmd 133, is_ok: 0, Rx: 5, SEND FAIL
01:01:59.026: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:02:07.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:02:07.997: [LWGSM MQTT] Sending PINGREQ packet
01:02:17.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:02:17.996: [LWGSM MQTT] Sending PINGREQ packet
01:02:20.049: Cmd 133, is_ok: 0, Rx: 5, SEND FAIL
01:02:20.051: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:02:27.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:02:27.997: [LWGSM MQTT] Sending PINGREQ packet
01:02:37.996: [LWGSM MQTT] Failed to send 0 bytes. Manually closing down..
01:02:38.000: Cmd 136, is_ok: 0, Rx: AT+CIPCLOSE=5
01:02:41.070: Cmd NULL, is_ok: 0, Rx: 5, SEND FAIL
01:03:55.836: Cmd NULL, is_ok: 0, Rx: DST: 1
01:03:55.839: Cmd NULL, is_ok: 0, Rx: *PSUTTZ: 23/07/12,14:30:43","+08",1
01:11:57.458: Cmd NULL, is_ok: 0, Rx: +PDP: DEACT
01:11:57.462: Cmd 142, is_ok: 0, Rx: AT+CIPSTATUS
01:11:57.462: Cmd 142, is_ok: 0, Rx: OK
01:11:57.464: Cmd 142, is_ok: 0, Rx: STATE: PDP DEACT
01:11:57.466: Cmd 142, is_ok: 0, Rx: C: 0,,"","","","INITIAL"
01:11:57.470: Cmd 142, is_ok: 0, Rx: C: 1,,"","","","INITIAL"
01:11:57.476: Cmd 142, is_ok: 0, Rx: C: 2,,"","","","INITIAL"
01:11:57.481: Cmd 142, is_ok: 0, Rx: C: 3,,"","","","INITIAL"
01:11:57.487: Cmd 142, is_ok: 0, Rx: C: 4,,"","","","INITIAL"
01:11:57.493: [MQTT API] Disconnect event
01:11:57.496: [MQTT API] Publish event with response: 2
01:11:57.501: Cmd 142, is_ok: 1, Rx: C: 5,0,"TCP","148.251.189.186","1883","CLOSED"
01:11:57.509: Cmd 0, is_ok: 0, Rx: C: 6,,"","","","INITIAL"
01:11:57.514: Cmd 0, is_ok: 0, Rx: C: 7,,"","","","INITIAL"
W (4234881) mqtt_client_api: Problem publishing data to topic!
I (4234881) mqtt_client_api: Joining MQTT server
01:11:57.531: [MQTT API] Cannot connect to 148.251.189.186
W (4234901) mqtt_client_api: Connect API response: 256
01:11:57.542: [MQTT API] Cannot close API connection
E (4234911) mqtt_client_api: Cannot disconnect from MQTT server, restarting in 10 seconds

From this I think we can make the following conclusions:

  • when the PINGREQ packet receives SEND FAIL it just ignores it.
  • when the PUBLISH packet doesn't come through it recognizes it in 50 seconds: "Failed to send 0 bytes. Manually closing down..", but then it tries to send the TCP close packet. But that one never goes through?
  • 01:03:55.839: Cmd NULL, is_ok: 0, Rx: *PSUTTZ: 23/07/12,14:30:43","+08",1
    01:11:57.458: Cmd NULL, is_ok: 0, Rx: +PDP: DEACT
    
    After roughly 8 minutes of no responses, the SIM7000G socket decides to send the +PDP: DEACT message and then communication is somehow restored. But only the connection can then not be re-established... And a restart is required.

@GwnDaan
Copy link
Author

GwnDaan commented Jul 12, 2023

This is probably the reason why the application can't reconnect to the MQTT server after it received +PDP: DEACT, but it doesn't explain why it loses connection in the first place.
image
image

SIM7000 Series_AT Command Manual_V1.06.pdf

@MaJerle
Copy link
Owner

MaJerle commented Jul 12, 2023

This is funny part:

01:02:20.051: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:02:27.996: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:02:27.997: [LWGSM MQTT] Sending PINGREQ packet
01:02:37.996: [LWGSM MQTT] Failed to send 0 bytes. Manually closing down..
01:02:38.000: Cmd 136, is_ok: 0, Rx: AT+CIPCLOSE=5
01:02:41.070: Cmd NULL, is_ok: 0, Rx: 5, SEND FAIL

Before Send fail, there was CIPCLOSE command. Can you increase a timeout for lwgsm_conn_send API function, to higher value than written in the AT commands datasheet ?

@GwnDaan
Copy link
Author

GwnDaan commented Jul 12, 2023

The AT+CIPSEND=5,2 and it's corresponding SEND FAIL is for the keepalive message, while the AT+CIPCLOSE=5 is send for the PUBLISH message. So I don't think that will make much of a difference as the timeout will only delay one or the other and there is still a chance they will overlap.

I tried increasing the keepalive to 60 seconds, and that results into the library sometimes being able to recover from the disconnect:

01:02:07.766: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:02:07.766: [LWGSM MQTT] Sending PINGREQ packet
01:02:07.771: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:02:09.216: [LWGSM MQTT] Writing packet type PUBLISH to output buffer
01:02:09.217: [LWGSM MQTT] Pkt publish start. QoS: 1, pkt_id: 61
01:02:28.794: Cmd 133, is_ok: 0, Rx: 5, SEND FAIL
01:02:28.796: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:02:49.817: Cmd 133, is_ok: 0, Rx: 5, SEND FAIL
01:02:49.819: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,2
01:03:06.766: [LWGSM MQTT] Writing packet type PINGREQ to output buffer
01:03:06.766: [LWGSM MQTT] Sending PINGREQ packet
01:03:07.766: [LWGSM MQTT] Failed to send 0 bytes. Manually closing down..
01:03:07.770: Cmd 136, is_ok: 0, Rx: AT+CIPCLOSE=5
01:03:10.837: Cmd NULL, is_ok: 0, Rx: 5, SEND FAIL
01:03:54.945: Cmd NULL, is_ok: 0, Rx: DST: 1
01:03:54.948: Cmd NULL, is_ok: 0, Rx: *PSUTTZ: 23/07/12,19:06:37","+08",1
01:04:07.537: [MQTT API] Disconnect event
01:04:07.537: [MQTT API] Publish event with response: 2
01:04:07.538: Cmd NULL, is_ok: 0, Rx: 5, CLOSED
W (3764901) mqtt_client_api: Problem publishing data to topic!
I (3764911) mqtt_client_api: Joining MQTT server
01:04:07.555: Cmd 142, is_ok: 0, Rx: AT+CIPSTATUS
01:04:07.556: Cmd 142, is_ok: 0, Rx: OK
01:04:07.560: Cmd 142, is_ok: 0, Rx: STATE: IP STATUS
01:04:07.564: Cmd 142, is_ok: 0, Rx: C: 0,,"","","","INITIAL"
01:04:07.570: Cmd 142, is_ok: 0, Rx: C: 1,,"","","","INITIAL"
01:04:07.575: Cmd 142, is_ok: 0, Rx: C: 2,,"","","","INITIAL"
01:04:07.581: Cmd 142, is_ok: 0, Rx: C: 3,,"","","","INITIAL"
01:04:07.587: Cmd 142, is_ok: 0, Rx: C: 4,,"","","","INITIAL"
01:04:07.592: Cmd 142, is_ok: 1, Rx: C: 5,0,"TCP","148.251.189.186","1883","CLOSED"
01:04:07.601: Cmd 161, is_ok: 0, Rx: C: 6,,"","","","INITIAL"
01:04:07.605: Cmd 161, is_ok: 0, Rx: C: 7,,"","","","INITIAL"
01:04:07.611: Cmd 161, is_ok: 0, Rx: AT+CIPSSL=0
01:04:07.615: Cmd 161, is_ok: 0, Rx: ERROR
01:04:07.629: Cmd 132, is_ok: 0, Rx: AT+CIPSTART=5,"TCP","148.251.189.186",1883
01:04:07.629: Cmd 132, is_ok: 0, Rx: OK
01:04:07.993: Cmd 132, is_ok: 1, Rx: 5, CONNECT OK
01:04:07.997: Cmd 142, is_ok: 0, Rx: AT+CIPSTATUS
01:04:07.998: Cmd 142, is_ok: 0, Rx: OK
01:04:08.000: Cmd 142, is_ok: 0, Rx: STATE: IP PROCESSING
01:04:08.002: Cmd 142, is_ok: 0, Rx: C: 0,,"","","","INITIAL"
01:04:08.006: Cmd 142, is_ok: 0, Rx: C: 1,,"","","","INITIAL"
01:04:08.012: Cmd 142, is_ok: 0, Rx: C: 2,,"","","","INITIAL"
01:04:08.017: Cmd 142, is_ok: 0, Rx: C: 3,,"","","","INITIAL"
01:04:08.023: Cmd 142, is_ok: 0, Rx: C: 4,,"","","","INITIAL"
01:04:08.029: Cmd 142, is_ok: 1, Rx: C: 5,0,"TCP","148.251.189.186","1883","CONNECTED"
01:04:08.036: [LWGSM MQTT] Writing packet type CONNECT to output buffer
01:04:08.043: Cmd 0, is_ok: 0, Rx: C: 6,,"","","","INITIAL"
01:04:08.048: Cmd 0, is_ok: 0, Rx: C: 7,,"","","","INITIAL"
01:04:08.058: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,40
01:04:08.412: Cmd 133, is_ok: 0, Rx: test
01:04:08.413: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:04:08.474: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,4:
01:04:08.474: [LWGSM MQTT] Parser init state, received first byte of packet 0x20
01:04:08.475: [LWGSM MQTT] Remaining length received: 2 bytes
01:04:08.480: [LWGSM MQTT] Processing packet type CONNACK
01:04:08.486: [LWGSM MQTT] CONNACK received with result: 0
01:04:08.491: [MQTT API] Connect event with status: 0
I (3765861) mqtt_client_api: Connected and accepted!
I (3765861) mqtt_client_api: Client is ready to subscribe and publish to new messages
01:04:08.510: [LWGSM MQTT] Writing packet type PUBLISH to output buffer
01:04:08.516: [LWGSM MQTT] Pkt publish start. QoS: 1, pkt_id: 62
01:04:08.527: Cmd 133, is_ok: 0, Rx: AT+CIPSEND=5,41
01:04:08.815: Cmd 133, is_ok: 1, Rx: 5, SEND OK
01:04:08.816: Cmd NULL, is_ok: 0, Rx: +RECEIVE,5,4:
01:04:08.817: [LWGSM MQTT] Parser init state, received first byte of packet 0x40
01:04:08.821: [LWGSM MQTT] Remaining length received: 2 bytes
01:04:08.826: [LWGSM MQTT] Processing packet type PUBACK
01:04:08.831: [MQTT API] Publish event with response: 0
I (3766201) mqtt_client_api: Published data {1, 1, 0.000000, 0.000000, 0.000000, 49845}

@MaJerle
Copy link
Owner

MaJerle commented Jul 22, 2023

Sorry for the delay. Quite some challenges these days.

Have you been able to find out the root cause or still not?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants