Question: QoS 2 behavior for an unstable network #1950

Open
kushwiz opened this issue Feb 25, 2022 · 9 comments

Comments

@kushwiz

kushwiz commented Feb 25, 2022

Hi,

I am looking for some insight into how VerneMQ handles QoS 2 messages when the handshake is left incomplete by an unstable network. We see messages duplicated when the connection drops and is re-established. Is this the expected behavior?

Below are logs from one of our MQTT clients, showing the stages of a QoS 2 transaction:

20220225 053943.241 3772 <client_id> <- PUBLISH msgid: 24 qos: 2 retained: 0 payload len(932):
{...first QoS 2 message ...}

2022-02-25T05:39:43.241Z DEBUG (0454-161C) <5660> [mqtt] OnMessage(): Incoming message: {...first QoS 2 message ...}

20220225 053943.241 3772 <client_id> -> PUBREC msgid: 24 (0)

20220225 053946.042 3772 <client_id> -> DISCONNECT (-1)

2022-02-25T05:39:46.041Z DEBUG (0454-161C) <5660> [mqtt_process] State: MQTT_STATE_NOT_CONNECTED
2022-02-25T05:39:46.041Z INFO (0454-161C) <5660> [mqtt_process] OnConnectionLost(): Connection lost: (null)

2022-02-25T05:39:47.483Z INFO (0454-161C) <5660> [mqtt_process] OnReconnected(): Reconnected to MQTT broker
2022-02-25T05:39:47.483Z DEBUG (0454-161C) <5660> [mqtt_process] SetState(): State: MQTT_STATE_CONNECTED

20220225 053947.557 6284 <client_id> <- PUBLISH msgid: 12 qos: 2 retained: 0 payload len(932): {..duplicate QoS 2 message..}

20220225 053947.558 6284 <client_id> -> PUBREC msgid: 12 (0)

2022-02-25T05:39:47.558Z DEBUG (0454-161C) <5660> [mqtt_process] OnMessage(): Incoming message: {..duplicate QoS 2 message..}

20220225 053947.861 6284 <client_id> <- PUBREL msgid 24
20220225 053947.861 6284 <client_id> -> PUBCOMP msgid 24 (0)

@ioolkos
Contributor

ioolkos commented Feb 25, 2022

@kushwiz, an MQTT publisher (in your example, the broker) only resends a message when it has not received a PUBREC. Once the publisher has received a PUBREC, it should not resend the PUBLISH.

Is your question about msgid 24 or msgid 12 in your log? The log appears to show a complete, correct handshake for msgid 24.
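
To make the resend rule above concrete, here is a minimal sketch of the sender side of a QoS 2 flow as described in the MQTT specification. It is not VerneMQ's implementation; the names `InFlight`, `SenderState`, `on_pubrec` and `packets_to_resend` are hypothetical and exist only for illustration. It shows which packet a sender would retransmit for each in-flight message after a reconnect with a persistent session.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SenderState(Enum):
    AWAIT_PUBREC = auto()   # PUBLISH sent, PUBREC not yet received
    AWAIT_PUBCOMP = auto()  # PUBREC received, PUBREL sent, PUBCOMP pending


@dataclass
class InFlight:
    # Hypothetical record of one unacknowledged QoS 2 message.
    packet_id: int
    payload: bytes
    state: SenderState = SenderState.AWAIT_PUBREC


def on_pubrec(msg: InFlight) -> None:
    """After PUBREC, the sender must not send the PUBLISH again."""
    msg.state = SenderState.AWAIT_PUBCOMP


def packets_to_resend(inflight: list[InFlight]) -> list[tuple[str, int, bool]]:
    """Return (packet type, packet id, dup flag) for each retransmission."""
    out = []
    for msg in inflight:
        if msg.state is SenderState.AWAIT_PUBREC:
            # No PUBREC seen: resend the PUBLISH with the SAME id and dup set.
            out.append(("PUBLISH", msg.packet_id, True))
        else:
            # PUBREC seen: only the PUBREL is resent, never the PUBLISH.
            out.append(("PUBREL", msg.packet_id, False))
    return out
```

Under this rule, a message whose PUBREC reached the sender would only ever produce a PUBREL on reconnect, and any resent PUBLISH would reuse the original packet id.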


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@nirmitdesai

@ioolkos From the logs, the MQTT publisher (the broker) published a message with msgid 24:
20220225 053943.241 3772 <client_id> <- PUBLISH msgid: 24 qos: 2 retained: 0 payload len(932):
{...first QoS 2 message ...}

The subscriber sent a PUBREC to the broker, and I presume the broker received it:

20220225 053943.241 3772 <client_id> -> PUBREC msgid: 24 (0)

Then the MQTT connection was dropped and re-established. Afterwards, the broker published the same message again, but with a different msgid: 12

20220225 053947.557 6284 <client_id> <- PUBLISH msgid: 12 qos: 2 retained: 0 payload len(932): {..duplicate QoS 2 message..}

So is this duplicate message, with a different msgid, expected?
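
A minimal sketch of the receiver side may help show why the different msgid matters. It assumes a receiver that delivers a message on the first PUBLISH and remembers the packet id until the matching PUBREL arrives, which is consistent with the client log above (OnMessage is logged before PUBREL). The class `Qos2Receiver` and its methods are hypothetical, not the API of any real client library.

```python
class Qos2Receiver:
    """Hypothetical QoS 2 receiver that deduplicates by packet id."""

    def __init__(self):
        # Packet ids for which PUBREC was sent but PUBREL has not arrived.
        self.pending_rel = set()
        # Payloads handed to the application, in order.
        self.delivered = []

    def on_publish(self, packet_id, payload):
        if packet_id not in self.pending_rel:
            # First time this id is seen: deliver and remember the id.
            self.pending_rel.add(packet_id)
            self.delivered.append(payload)
        # A resent PUBLISH with a known id is acknowledged but not redelivered.
        return ("PUBREC", packet_id)

    def on_pubrel(self, packet_id):
        # The sender has released the id, so it may be reused from now on.
        self.pending_rel.discard(packet_id)
        return ("PUBCOMP", packet_id)


rx = Qos2Receiver()
rx.on_publish(24, b"msg")  # delivered
rx.on_publish(24, b"msg")  # same id resent: suppressed
rx.on_publish(12, b"msg")  # same payload, new id: delivered again
```

In this sketch a resend under msgid 24 is suppressed, while the same payload under msgid 12 looks like a new message to the protocol layer and reaches the application a second time.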

@kushwiz
Author

kushwiz commented Feb 25, 2022

@ioolkos we published the message only once to VerneMQ (1.12.3).

Could VerneMQ have duplicated the message because of retries in cluster inter-node communication? That would explain why the duplicates arrive with different message ids.

@ioolkos
Contributor

ioolkos commented Feb 25, 2022

@nirmitdesai No, that's not expected.

  • Can you reproduce this in a fully isolated test?
  • Are the initial publisher and the consumer connected to the same node or to different nodes?
  • Is the dup flag set on that msgid 12?
  • How exactly do you determine that it is the same message?
  • What version of VerneMQ do you use?


@ioolkos
Contributor

ioolkos commented Feb 25, 2022

@kushwiz you answered my questions faster than I could ask them :)
Hm, it must be inter-node communication then.

The issue also seems to be described in #944 (not in the opening post, but in the later comments).
We need to investigate and solve this once and for all.

EDIT: The different msgid is especially odd. Inter-node delivery is not an MQTT session.



@nirmitdesai

@ioolkos, yes, the second message had the "dup" flag set to 1. I determined that the two messages are the same purely from their contents.
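
Since the duplicate arrives under a new msgid, comparing contents is the only signal left to the subscriber. As a possible stopgap, here is a minimal sketch of content-based deduplication in the application. It is an assumption-laden workaround, not a VerneMQ feature: `ContentDeduplicator`, the 60-second window, and hashing the topic together with the payload are all hypothetical choices. It would also drop legitimately repeated identical messages inside the window, so it only fits payloads that carry something unique, such as a timestamp or sequence number.

```python
import hashlib
import time
from collections import OrderedDict


class ContentDeduplicator:
    """Hypothetical helper: drop messages already seen within a time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        # digest -> time first seen; insertion order keeps the oldest first.
        self.seen = OrderedDict()

    def is_duplicate(self, topic: str, payload: bytes) -> bool:
        now = self.clock()
        # Evict entries that have aged out of the window.
        while self.seen:
            _, first_seen = next(iter(self.seen.items()))
            if now - first_seen <= self.window:
                break
            self.seen.popitem(last=False)
        key = hashlib.sha256(topic.encode() + b"\0" + payload).hexdigest()
        if key in self.seen:
            return True
        self.seen[key] = now
        return False
```

A message handler would call `is_duplicate(topic, payload)` first and skip processing when it returns `True`.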

@nirmitdesai

Hi @ioolkos, I see that issue #944 was opened in 2018 and some fixes were made, but other users also ran into the duplicate-message problem with cross-node publishing around 2021. Is there any estimate of when this could be fixed? Without a fix, using QoS 2 with VerneMQ is a problem for us, since we receive duplicates, which is exactly what we hoped QoS 2 would prevent.

@ioolkos
Contributor

ioolkos commented Mar 2, 2022

@nirmitdesai I'll work on an explanation of how this currently works and how it relates to the question of sync vs. async delivery, the various timeouts, buffers, and settings, PR #1769, etc.
Remote enqueuing is a somewhat complex topic, so it will take me some time to break it down and lay out your current options.



@nirmitdesai

Okay, looking forward to it, @ioolkos. Thank you.


3 participants