MQTT client not receiving messages #5282

Closed
fly2F opened this issue Apr 8, 2024 · 13 comments
Labels
defect Suspected defect such as a bug or regression

Comments

fly2F commented on Apr 8, 2024

Observed behavior

I am having trouble receiving messages with my MQTT client.

Checking the durable consumer shows the Consumer sequence in Last Delivered Message increasing, but the acknowledgment floor is always 0.

Workarounds:

  • Set CleanSession to true and then false. This rebuilds the session and the client starts receiving messages again (see the sketch after this list).
  • Delete the corresponding session under $MQTT_sess on the server and then restart the client. This also rebuilds the session and the client starts receiving messages again.
  • Restart the server.
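
For reference, a minimal sketch of the first workaround using the Eclipse Paho v3 client (org.eclipse.paho.client.mqttv3). The broker URL, client ID, and topic below are placeholders for illustration, not values from this report:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class SessionRebuild {
    public static void main(String[] args) throws MqttException {
        // Placeholder broker URL and client ID -- adjust to your environment.
        MqttClient client = new MqttClient("tcp://nats-host:1883", "my-fixed-client-id",
                new MemoryPersistence());

        // 1) Connect once with CleanSession=true so the server drops the stuck session state.
        MqttConnectOptions clean = new MqttConnectOptions();
        clean.setCleanSession(true);
        client.connect(clean);
        client.disconnect();

        // 2) Reconnect with CleanSession=false to create a fresh persistent session,
        //    then re-subscribe at QoS 1.
        MqttConnectOptions durable = new MqttConnectOptions();
        durable.setCleanSession(false);
        client.connect(durable);
        client.subscribe("your/topic", 1);
    }
}
```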

Expected behavior

The client should receive the messages.

Server and client version

Server: NATS 2.10.7, 1 node
Client: Java org.eclipse.paho.client.mqttv3

Host environment

Host: Amazon Linux 2, 5.10.184-175.749.amzn2.x86_64
Docker: 20.10.25

Steps to reproduce

  1. Connect the MQTT client to the server (a minimal Paho connect-and-subscribe sketch follows this list).
  2. Check the connection status with connz. The connection status is normal.
  3. Check the connz sub list. It contains the topic and a delivery subject starting with $MQTT.sub.
  4. Check the connz delivery subject. It is consistent with the records in $MQTT.sess and the configuration of the corresponding durable consumer.
  5. Check out_bytes and out_msgs with connz. They are increasing.
  6. Set the log level to Trace. The MQTT client can be seen coming online and starting its subscription, but no PUBLISH records to the client appear.
  7. Check the durable consumer. The Consumer sequence in Last Delivered Message keeps increasing, but the acknowledgment floor stays at 0.
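
A minimal connect-and-subscribe sketch for step 1 with the Paho v3 client, using the settings described in this report (QoS 1, CleanSession=false). The broker URL, client ID, and topic are placeholders:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class ConnectAndSubscribe {
    public static void main(String[] args) throws MqttException {
        // Placeholder broker URL and fixed client ID.
        MqttClient client = new MqttClient("tcp://nats-host:1883", "my-fixed-client-id",
                new MemoryPersistence());

        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false);   // persistent session, as in this report

        client.connect(opts);
        client.subscribe("your/topic", 1);   // QoS 1 subscription
    }
}
```
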
fly2F added the defect (Suspected defect such as a bug or regression) label on Apr 8, 2024
derekcollison (Member) commented on Apr 8, 2024

If you could, please make sure you are running the latest server version, which currently is 2.10.12.

fly2F (Author) commented on Apr 8, 2024

> If you could, please make sure you are running the latest server version, which currently is 2.10.12.

Thank you very much for your reply!

I had previously upgraded the NATS server to 2.10.12, and I have not reproduced the issue on 2.10.12 or after restarting on 2.10.7.

I have not found a way to trigger the issue.

fly2F (Author) commented on Apr 8, 2024

Here are the details of the consumer I am having problems with. The Consumer sequence on this consumer is strange:

❯ nats consumer info '$MQTT_msgs' 'HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3'
Information for Consumer $MQTT_msgs > HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3 created 2024-03-29T12:47:12+08:00

Configuration:

            Durable Name: HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3
        Delivery Subject: $MQTT.sub.UKKHlQPcxPpTI6R1YNqdst
          Filter Subject: $MQTT.msgs.localserver.shopId.78880634
          Deliver Policy: New
              Ack Policy: Explicit
                Ack Wait: 30.00s
           Replay Policy: Instant
         Max Ack Pending: 1,024
            Flow Control: false

State:

  Last Delivered Message: Consumer sequence: 541,868 Stream sequence: 563 Last delivery: 9.04s ago
    Acknowledgment Floor: Consumer sequence: 0 Stream sequence: 0
        Outstanding Acks: 42 out of maximum 1,024
    Redelivered Messages: 42
    Unprocessed Messages: 0
         Active Interest: Active

fly2F (Author) commented on Apr 8, 2024

Here is the relevant connz information. out_bytes and out_msg are increasing.

(screenshot of connz output showing increasing out_bytes and out_msgs)

fly2F (Author) commented on Apr 8, 2024

I noticed issue #4736. Although I am using MQTT, my consumer's behavior is very similar to the description in this comment: #4736 (comment). It is possible that the two issues are related.

levb (Contributor) commented on Apr 8, 2024

@fly2F I am going to pursue this from the MQTT angle, let @derekcollison comment on whether there may be an underlying JetStream issue. To make sure I understand this correctly, sounds like your client was first receiving (all) MQTT messages, until it stopped? (and then one of your workarounds would make the messages flow again?)

Are the messages you mention QOS1 or QOS2 MQTT messages?

fly2F (Author) commented on Apr 8, 2024

> @fly2F I am going to pursue this from the MQTT angle, let @derekcollison comment on whether there may be an underlying JetStream issue. To make sure I understand this correctly, sounds like your client was first receiving (all) MQTT messages, until it stopped? (and then one of your workarounds would make the messages flow again?)
>
> Are the messages you mention QOS1 or QOS2 MQTT messages?

Thank you for your reply.

I am using MQTT QOS 1 and CleanSession is false.

Some of our MQTT clients reported the issue almost on the same day.

They can receive messages in the morning, but not in the afternoon.

We turned on the Trace level log and observed that the server did not send PUBLISH to the MQTT client, but the out_msgs in connz increased.
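
As a client-side cross-check (a hypothetical debugging aid, not something from this report), a Paho callback can count the messages actually delivered to the client so the number can be compared with out_msgs in connz and with the Trace log:

```java
import java.util.concurrent.atomic.AtomicLong;

import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttMessage;

// Counts PUBLISH messages that actually reach this client.
public class ReceiveCounter implements MqttCallback {
    private final AtomicLong received = new AtomicLong();

    @Override
    public void connectionLost(Throwable cause) {
        System.out.println("connection lost: " + cause);
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) {
        System.out.printf("received #%d on %s (%d bytes)%n",
                received.incrementAndGet(), topic, message.getPayload().length);
    }

    @Override
    public void deliveryComplete(IMqttDeliveryToken token) {
        // not used for inbound messages
    }
}
```

Installed with client.setCallback(new ReceiveCounter()) before connecting, this makes it easy to confirm whether any PUBLISH frames reach the client at all.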

fly2F (Author) commented on Apr 8, 2024

> @fly2F I am going to pursue this from the MQTT angle, let @derekcollison comment on whether there may be an underlying JetStream issue. To make sure I understand this correctly, sounds like your client was first receiving (all) MQTT messages, until it stopped? (and then one of your workarounds would make the messages flow again?)
>
> Are the messages you mention QOS1 or QOS2 MQTT messages?

I have also observed that the Consumer sequence in the Last Delivered Message of the consumer information is increasing by 16 every 30 seconds (Ack Wait).

These observations are made while the client is online.

levb (Contributor) commented on Apr 9, 2024

@fly2F Thank you, I am trying to reproduce the issue now. Any more MQTT-related context/estimates you can provide? How many clients are there, do they re-connect often/at all prior to the issue, traffic # of messages and size? Anything "interesting" about the subscriptions? Wildcards used?

fly2F (Author) commented on Apr 9, 2024

@levb Thank you.

I have two single-node NATS servers that experienced problems with MQTT clients not receiving messages within 2 days. These two servers are used for testing. One has about 50 clients and the other has about 30. They do not use wildcards and have very little traffic. The number of messages sent is roughly between 200 and 500. I also have three 3-node NATS clusters with more clients than these two, and they have never had this problem.

I restarted one server and upgraded the other to 2.10.12. They ran for 3-5 days and the problem did not recur. I did not find a way to reproduce it.

Since we use fixed Client IDs, clients are often repeatedly kicked offline on these two test servers. However, I specifically tested two clients with identical Client IDs preempting each other for two consecutive days, and could not reproduce this problem.

The message size is small, about 100 bytes.

levb (Contributor) commented on Apr 9, 2024

> Since we use fixed Client IDs, clients are often repeatedly kicked offline on these two test servers. However, I specifically tested two clients with identical Client IDs preempting each other for two consecutive days, and could not reproduce this problem.

Thanks! We did fix a number of "mechanical" (reliability) issues in 2.10.12 relative to the early 2.10 releases, so please monitor on your end and I will add a long-running test to reflect your use case.

fly2F (Author) commented on Apr 9, 2024

@levb Thanks!

The strange thing is that when the MQTT client connection cannot receive messages, out_msgs and out_bytes in connz keep increasing, but the Trace-level log contains no PUBLISH entries for that client.

Could we record some MQTT metrics at PUBLISH time (such as in mqttEnqueuePublishMsgTo) so they can be compared with the out_msgs metric?

fly2F (Author) commented on May 5, 2024

I have upgraded the version and I cannot reproduce the problem.

@levb @derekcollison Thanks!

fly2F closed this as completed on May 5, 2024