New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to communicate with Edge modules (in EFLOW) from the host OS #7220
Comments
@bhjertaas would you mind providing a support bundle? The command is From what I understand, there is no issue retrieving the metrics via curl from EFLOW vm side, but when trying from host machine side it does not work. Is this correct? Additionally, is this a recent issue? Or something that has been happening? |
Sorry for the delay, here is a support bundle. support_bundle_2024_02_25_14_47_17_UTC.zip You asked: From what I understand, there is no issue retrieving the metrics via curl from EFLOW vm side, but when trying from host machine side it does not work. Is this correct? ==> Yes correct. This has been happening for a while. We have, as mentioned, 7 test-PCs that are set to reboot and we've been seeing this problem quite regularly. If you setup a PC with EFLOW, that reboots every 2 hours say, you should be able to reproduce this behaviour within two days I estimate. (of course you need something on Windows that makes HTTP calls to a module endpoint to detect when it happens). According to our health-reporting system, non-communication problem to this particular test-PC-6, started Feb 25th 12:15 CET. The IP at the time was 172.18.164.255 as provided by get-eflowVmAddr.
I've had a look at the bundle and can't find much. It is unfortunate that logs for the modules themselves are "too large". See my other issue on that. But I doubt this networking problem is caused by module wrongdoing anyway. |
@jagadishmurugan while edge team is investigating the support bundle is there anything to recommend to @bhjertaas to validate EFLOW to host networking. Just want to make sure that all looks correct. |
@jagadishmurugan @nyanzebra Can you follow up and share your finding? |
@jagadishmurugan @nyanzebra Any updates on this? |
@bhjertaas sorry for delay, this got lost amongst other issues I have been looking at. The support bundle provided seems to have failed to get logs |
I have tried to make a support-bundle with smaller time frame, but the "log message is too large" problem remains. Until the underlying Moby bug is fixed, the recommended Local logging driver is not a viable option. Which log driver do you recommend instead on Edge devices? I'm just thinking that perhaps we will have to sort that out first, before we can get clarity on this issue. |
Expected Behavior
We should be able to communicate with Edge modules from the host OS
Current Behavior
Sometimes, after reboot of the PC, communication from host OS to Edge modules does not work. We can
Connect-EflowVm
andGet-EflowVmAddr
and things, but making HTTP calls to the Edge modules within Eflow doesn't work. The requests simply time out. We don't have much exceptions and logs to provide unfortunately, but here is the output from a Powershell on WindowsThe following image shows the problem over time. If the PC is unable to make simple HTTP requests the graph drops to zero. The test-PCs are setup to restart every 3 hours.
Steps to Reproduce
Provide a detailed set of steps to reproduce the bug.
Get-EflowVmAddr
and copy the IP address that comes outcurl http://172.18.92.180:9602/metrics
which in our case gets metric data from edgeHub because we have mapped the internal 9600 port to 9602. But it doesn't matter what you request, the request won't work.
Context (Environment)
Output of
iotedge check
Click here
Device Information
Runtime Versions
iotedge version
]: 1.4.20docker version
]: 20.10.27 (iotedge_moby_engine version -> 24.0.6)Note: when using Windows containers on Windows, run
docker -H npipe:////./pipe/iotedge_moby_engine version
insteadAdditional Information
When we do encounter this issue, it remains a problem until the PC is restarted.
sudo iptables -P input accept
we're able to ping the EFLOW VM from the host OS.hcsdiag console <guid>
we can confirm that from that shell, we can both ping the IP of the VM and do curl requests to edge module endpoints that did not work from the host OSWe have been in contact with the Iotedge-eflow team, and one member of that team have had a session with us on one of the test PCs that did not work. He believes the problem has to do with the IoT-Edge part, which is why I post this issue here.
The text was updated successfully, but these errors were encountered: