You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
EdgeAgent should ALWAYS keep all modules running regardless of edgeAgent operation.
Current Behavior
If EdgeAgent is busy downloading new version of modules it will not ensure existing modules are running.
Steps to Reproduce
Provide a detailed set of steps to reproduce the bug.
Slow internet setup
Deploy new version deployment manifest
Ensure EdgeAgent struggles downloading new versions (100 second timeout) -> repeats for a long time
Crash one of the existing modules with fatal exception or similar
Observe that EdgeAgent does not restart the crashing module
Context (Environment)
Deployments operating on slow satellite internet connection with high ping.
Deployment manifest has:
"restartPolicy": "always",
"status": "running",
ModuleUpdateMode: WaitForAllPulls
Output of iotedge check
Click here
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
× config.yaml has correct hostname - Error
config.yaml has hostname PC2FD48A but device reports hostname pc2fd48a.
Hostname in config.yaml must either be identical to the device hostname or be a fully-qualified domain name that has the device hostname as the first component.
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
‼ DNS server - Warning
Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: certificates - Warning
The Edge device is using self-signed automatically-generated development certificates.
They will expire in 204 days (at 2024-10-26 11:29:55 UTC) causing module-to-module and downstream device communication to fail on an active deployment.
After the certs have expired, restarting the IoT Edge daemon will trigger it to generate new development certs.
Please consider using production certificates instead. See https://aka.ms/iotedge-prod-checklist-certs for best practices.
√ production readiness: logs policy - OK
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
Data might be lost if the module is deleted or updated.
Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
Data might be lost if the module is deleted or updated.
Please see https://aka.ms/iotedge-storage-host for best practices.
Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK
17 check(s) succeeded.
4 check(s) raised warnings. Re-run with --verbose for more details.
1 check(s) raised errors. Re-run with --verbose for more details.
Device Information
Host OS [e.g. Ubuntu 22.04, Windows Server IoT 2019]:
Architecture [e.g. amd64, arm32, arm64]:
Container OS [e.g. Linux containers, Windows containers]:
Hey @mathiash98 thanks for reporting this. It looks like an existing problem where the pull operation happens synchronously and prevents EdgeAgent reacting to other module lifecycle events. I'll bring it up with the team to see how we can fix that. Meanwhile I can suggest a workaround (if applicable/possible) is to ssh to the device and manually pre-pull (docker pull) the images before applying the deployment. Also you can set imagePullPolicy: Never for that.
Just a side note about ModuleUpdateMode: WaitForAllPulls. It means that all images will be downloaded first, and then containers (re)started, but EA still makes pull synchronously.
Hey @mathiash98 thanks for reporting this. It looks like an existing problem where the pull operation happens synchronously and prevents EdgeAgent reacting to other module lifecycle events. I'll bring it up with the team to see how we can fix that. Meanwhile I can suggest a workaround (if applicable/possible) is to ssh to the device and manually pre-pull (docker pull) the images before applying the deployment. Also you can set imagePullPolicy: Never for that.
Just a side note about ModuleUpdateMode: WaitForAllPulls. It means that all images will be downloaded first, and then containers (re)started, but EA still makes pull synchronously.
Thanks for the information, we will try to keep an eye out for the installation with poor internet connection. Manually polling on 100 devices is too cumbersome, but we will create some dashboard that will show the devices that IotEdge is reporting as failing to mitigate this
Expected Behavior
EdgeAgent should ALWAYS keep all modules running regardless of edgeAgent operation.
Current Behavior
If EdgeAgent is busy downloading new version of modules it will not ensure existing modules are running.
Steps to Reproduce
Provide a detailed set of steps to reproduce the bug.
Context (Environment)
Deployments operating on slow satellite internet connection with high ping.
Deployment manifest has:
"restartPolicy": "always",
"status": "running",
ModuleUpdateMode: WaitForAllPulls
Output of
iotedge check
Click here
Device Information
Runtime Versions
iotedge version
]: 1.1.15docker version
]: 20.10.20+azure-1Logs
edge-agent logs
The text was updated successfully, but these errors were encountered: