IoT Edge produces too much traffic with default configuration, it is not conducive for billable networks such as cellular. #7252

Open
uriel-kluk opened this issue Mar 27, 2024 · 11 comments
uriel-kluk commented Mar 27, 2024

Hello IoT Edge team,

I hope you can help us and the community fine-tune our IoT Edge design for billable networks.
To give you perspective, our IoT device sends compressed and batched telemetry periodically and on event changes.
Running stand-alone with MQTT and the device SDK, this application consumes less than 1 MB a day.

We had good reasons to add containers and more features, so we ported the application to a constrained Linux device running IoT Edge with a cellular modem. Now the bill from the telco is 12x.

After some exploration, we identified some possible optimizations, but we would love to get best practices and recommendations from you:

  1. AMQP appears to be the recommended protocol in the documentation because of its $upstream multiplexing capability, but the ModuleClient in our custom modules runs thinner using MQTT. Is there a comparison that illustrates when to mix and match transport protocols? And if we use marketplace modules, is there a way to define the channel, or at least read it?

  2. Could you help figure out what variable controls the following traces and how these affect traffic:
    2.1. Done refreshing device scope identities cache. Waiting for 60 minutes.
    2.2. Entering periodic task to reauthenticate connected clients. //this shows every 5 minutes
    2.3. Started task to cleanup processed and stale messages for endpoint test_

  3. I am surprised to see messages like the following, but this might be our code, and unrelated to traffic
    Cleaned up 516 messages from queue for endpoint test_<route> and 516 messages from message store.

  4. Could you explain how ConfigRefreshFrequencySecs works? Why does it default to every hour if the edge agent subscribes to property changes?

  5. What is the advantage of setting CacheTokens to true?

  6. Do you have any other advice to reduce downlink traffic if the application is mainly a telemetry pump?

Here are more configuration variables:

```json
"edgeAgent": {
  "env": {
    "SendRuntimeQualityTelemetry": { "value": false },
    "ConfigRefreshFrequencySecs": { "value": 86400 },
    "DisableDeviceAnalyticsMetadata": { "value": true },
    "MetricsEnabled": { "value": false },
    "UpstreamProtocol": { "value": "Mqtt" }
  }
},
"edgeHub": {
  "env": {
    "OptimizeForPerformance": { "value": false },
    "HttpSettings__Enabled": { "value": false },
    "CloudConnectionIdleTimeoutSecs": { "value": 7200 },
    "ConfigRefreshFrequencySecs": { "value": 86400 },
    "RuntimeLogLevel": { "value": "info" },
    "CacheTokens": { "value": true },
    "UpstreamProtocol": { "value": "Mqtt" },
    "AmqpSettings__Enabled": { "value": false }
  }
}
```

@nyanzebra nyanzebra self-assigned this Mar 27, 2024
@nyanzebra
Contributor

@varunpuranik / @veyalla do we have any customer-facing documentation on the performance breakdown of MQTT vs. AMQP?

  1. This might be marketplace module specific, do you have a specific set of modules in mind? For example, the SimulatedTemperatureSensor can have a custom route defined.
  2. Believe those messages correspond to device token refreshes for connections, are you saying you want control over that frequency?
  3. The queue cleanup should run after messages are sent, as a batch cleanup of messages from the underlying store. I can double-check this, but I think it happens periodically to clean up state in the underlying database.

For 4,5,6 @vipeller or @varunpuranik any guidance on these?

@uriel-kluk
Author

Thanks @nyanzebra,

> Believe those messages correspond to device token refreshes for connections, are you saying you want control over that frequency?

My only motivation for asking about token refreshes is to reduce traffic. I see a periodic trace every 5 minutes reauthenticating connected clients. I hope this is not creating cellular traffic.

But the main question remains the same, what is causing downloads, if the application is just pumping telemetry?
image

@nyanzebra
Contributor

@uriel-kluk is the graph for network traffic to/from device? There should be occasional twin data sent to device to see if any changes in modules need to happen as well as some connectivity checks, but would be surprised if it is significant. I think the next steps are to dig deeper.

If you have metrics it will be interesting to see what the following say:

  • edgehub_gettwin_total
  • edgeAgent_total_network_in_bytes

Additionally, if you can provide a support bundle that will be useful.
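
For anyone collecting these two metrics by hand, here is a minimal sketch that sums them from a Prometheus-style text dump. The metric names are the ones listed above; the sample text, its label names, and its values are invented for illustration, and the helper `sum_metrics` is hypothetical rather than part of any IoT Edge tooling.

```python
import re
from collections import defaultdict

# Metric names from the list above.
WANTED = ("edgehub_gettwin_total", "edgeAgent_total_network_in_bytes")

# name, optional {labels}, then the sample value
LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)')

def sum_metrics(exposition_text, wanted=WANTED):
    """Sum the values of the wanted metrics across all label sets."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match and match.group(1) in wanted:
            totals[match.group(1)] += float(match.group(3))
    return dict(totals)

# Invented sample data: labels and numbers are assumptions, not real output.
SAMPLE = """\
# HELP edgehub_gettwin_total illustrative help text
edgehub_gettwin_total{source="upstream",id="dev/$edgeHub"} 24
edgehub_gettwin_total{source="upstream",id="dev/mymodule"} 3
edgeAgent_total_network_in_bytes{module_name="edgeHub"} 1500000
edgeAgent_total_network_in_bytes{module_name="mymodule"} 250000
"""

print(sum_metrics(SAMPLE))
```

Comparing the totals from two dumps taken a known time apart gives a rough per-hour rate for twin fetches and inbound bytes.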

@uriel-kluk
Author

uriel-kluk commented Mar 28, 2024 via email

@uriel-kluk
Author

Hi @nyanzebra ,

Attached support bundle.

Please look at this chart:

image

It is clear that the edgeHub is renegotiating credentials every hour. Is there a way to control the frequency?
Identity attestation uses X.509 in this illustration.

Also, the custom module authenticates every hour. This is less impactful, but it still creates traffic.
edge-device-support-bundle.tar.gz

This is the code I am using to connect the custom module with the edgeHub:

```csharp
// Requires: using System; using Microsoft.Azure.Devices.Client;
//           using Microsoft.Azure.Devices.Client.Transport.Mqtt;
// transportType, ModelId and _moduleClient are defined elsewhere in the module.

// Map the configured transport type to the settings array the SDK expects.
ITransportSettings[] GetTransportSettings()
{
    switch (transportType)
    {
        case TransportType.Mqtt:
        case TransportType.Mqtt_Tcp_Only:
            return new ITransportSettings[] { new MqttTransportSettings(TransportType.Mqtt_Tcp_Only) };
        case TransportType.Mqtt_WebSocket_Only:
            return new ITransportSettings[] { new MqttTransportSettings(TransportType.Mqtt_WebSocket_Only) };
        case TransportType.Amqp_WebSocket_Only:
            return new ITransportSettings[] { new AmqpTransportSettings(TransportType.Amqp_WebSocket_Only) };
        default:
            // Any other value falls back to AMQP over TCP.
            return new ITransportSettings[] { new AmqpTransportSettings(TransportType.Amqp_Tcp_Only) };
    }
}

var settings = GetTransportSettings();
var options = new ClientOptions
{
    ModelId = ModelId,
    // Suggested lifetime for generated SAS tokens: one day.
    SasTokenTimeToLive = TimeSpan.FromDays(1),
};

// Create the module client from the IoT Edge environment.
_moduleClient = await ModuleClient.CreateFromEnvironmentAsync(settings, options);
```

@nyanzebra
Contributor

@uriel-kluk would you mind trying to set DeviceScopeCacheRefreshRateSecs to a much larger number (the default is 3600 s, or 1 hr)? I am hoping this reduces the download quite a bit, and it shouldn't really affect edgeHub.
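
As a rough illustration of the suggestion above, the sketch below adds DeviceScopeCacheRefreshRateSecs to an edgeHub `env` block and shows how the refresh count per day changes. The variable name and the 3600 s default come from this comment; the 86400 s value, the starting `env` entries, and the helper names are assumptions chosen for the example.

```python
import json

DEFAULT_REFRESH_SECS = 3600  # default quoted in the comment above

def refreshes_per_day(refresh_rate_secs):
    """How many device scope cache refreshes fit into 24 hours."""
    return 86400 / refresh_rate_secs

def with_env_override(module_env, name, value):
    """Return a copy of a module's "env" block with one variable set."""
    updated = dict(module_env)
    updated[name] = {"value": value}
    return updated

# Illustrative starting point, taken from the edgeHub settings posted earlier.
edge_hub_env = {
    "UpstreamProtocol": {"value": "Mqtt"},
    "CacheTokens": {"value": True},
}

# 86400 s (one day) is an example value only, not one recommended in this thread.
new_rate = 86400
edge_hub_env = with_env_override(
    edge_hub_env, "DeviceScopeCacheRefreshRateSecs", new_rate
)

print(refreshes_per_day(DEFAULT_REFRESH_SECS), "refreshes/day at the default")
print(refreshes_per_day(new_rate), "refreshes/day with the override")
print(json.dumps({"edgeHub": {"env": edge_hub_env}}, indent=2))
```

The printed JSON has the same shape as the `env` fragments in the deployment manifest, so the new entry can be copied across directly.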

@uriel-kluk
Author

Thank you @nyanzebra. I tried almost every parameter, but I am getting old and lazy, and because I saw the leaf description, I dismissed this one. Fingers crossed, this solves the issue. I should be able to provide an update in about an hour :)

Last few days with the default one hour:
image

@david-emakenemi

@uriel-kluk did the solution that Robert provided solve the issue?

@claya75

claya75 commented Apr 8, 2024

@david-emakenemi @nyanzebra Unfortunately, no. Thanks for all the great input on keeping our IoT Edge traffic lean on cellular. I've been digging into this with @uriel-kluk and with Wireshark I found some spots where we're using more bandwidth than we'd like or expect. I've attached some captures and charts to help illustrate.

We would love to get your take on understanding why we'd be getting so much Rx traffic on edgeHub as shown below, and/or how to reduce it.
[image: 2024-04-08 12 14 44]

This screenshot is of a Wireshark capture with all Tx traffic filtered out, so this is only Rx traffic coming into our edge device, viewed both in bytes downloaded over time and the delta time between packets over time.
[image: Rx only--bytes and delta time]

This is a typical 24 hour period reported with our SuperSIM from Twilio.
image

Capture files (.pcapng), both covering 2.5 hours; one is filtered to TCP and TLS only.
captures.zip

@nyanzebra
Contributor

@claya75 & @uriel-kluk

Looking at the attached pcap, packets related to certs are few and, while large, don't account for the majority of downloads.

Looking further, there is consistent traffic from 20.49.109.144, which I assume to be your IoT hub, with a length of about 135 (TLS record length 64). Given the size, these are likely not MQTT PUBACKs.

I will try to reproduce this by setting up a local device and monitoring traffic without TLS to get some answers. I will also ask the IoT Hub team if they have any ideas.
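
To put a number on the recurring small frames, here is a minimal sketch that groups received frames by source and length and extrapolates each group to 24 hours. It assumes the capture has been exported to CSV rows of relative time, source IP, and frame length, for example with `tshark -r capture.pcapng -T fields -e frame.time_relative -e ip.src -e frame.len -E separator=,`. The sample rows, the capture duration, and the helper `summarize_rx` are invented for illustration; only the 135-byte length and the 20.49.109.144 address come from the capture discussed above.

```python
import csv
import io
from collections import defaultdict

def summarize_rx(csv_text, capture_seconds):
    """Group frames by (source IP, frame length) and extrapolate bytes to 24 h."""
    groups = defaultdict(lambda: {"count": 0, "bytes": 0})
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # tolerate blank lines in the export
        _time_s, src, length = row
        group = groups[(src, int(length))]
        group["count"] += 1
        group["bytes"] += int(length)
    scale = 86400 / capture_seconds
    return {
        key: {**group, "bytes_per_day": group["bytes"] * scale}
        for key, group in groups.items()
    }

# Invented sample: a 60 s window with four 135-byte frames from the hub address.
SAMPLE = """\
0.0,20.49.109.144,135
15.0,20.49.109.144,135
30.0,20.49.109.144,135
45.0,20.49.109.144,135
50.0,10.0.0.1,1400
"""

for key, stats in sorted(summarize_rx(SAMPLE, capture_seconds=60).items()):
    print(key, stats)
```

Sorting the real output by `bytes_per_day` would show whether the small periodic frames or the larger, rarer ones dominate the daily Rx total.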

@uriel-kluk
Author

uriel-kluk commented Apr 9, 2024 via email


4 participants