New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Direct Method of IoTEdge module takes 3 minutes to start responding #7251
Comments
@nyanzebra - Any ideas about this subject? |
@wvangeem would you mind seeing if you can add a log message in your module to see if the message ever arrives in the initial timeout window? For example from edgeHub logs: <4> 2024-03-27 12:38:41.430 +00:00 [WRN] - Did not receive response for method invoke call from device/module 787052036B1B/SensorController for 787052036B1B/SensorController with correlation ID 6b7aa24a-202d-4ae4-8394-a7df05d04360 There is a correlation ID, I need to double check if it is possible to print this in the module when handling direct method calls, but hoping it will be printed. Basically I want to make sure that the problem is either the message is not forwarded to module when it should be or the module received the message and was stuck or finally, edgeHub didn't receive the response for some reason. |
Hi @nyanzebra I also fixed edgeAgent and edgeHub versions to 1.4.33 But now I cannot simulate it anymore when I reboot the device most of the times. I restarted the module on 14:29:43 This is the log of the APIServer, which calls both StateAnalyzer and SensorController.
This is the log of the SensorController.
I also sent the Connection State Events to an Eventhub, and added some logging.
Back to the log of APIServer, you also see here that the SensorController starts accepting requests at 14:32:48, 3 minutes later
I also added edgeHub logs (complete log in link) Stop of SensorController at 14:29:42
Start of SensorController at 14:29:46
Failing requests from this moment on:
Re-initialization of module by edgeHub 3 minutes later, which causes it to work again.
Best regards. |
@wvangeem just to update, I am still investigating the code and logs and will post when have an update. |
Hi @wvangeem sorry for the delay, it is also strange there is another setup in which the situation of the modules is reversed... Would you mind setting the |
Hi @nyanzebra But the StateAnalyzer service needs to send a message to IoTHub with following code:
The code has a variable "debugtime", which is the date the message was sent, and a message "VEHICLESTATE SENT" when the SentEventAsync function has finished. I started the device at 06:01 - so all the attached logs are having this dates from at least that time At 06:01:42 - APIServer is sending a message to IoTHub This same action also calls the GRPC Service on StateAnalyzer, which executes the code above: But this seems to take a long time. But in the edgeHub log, you see that edgeHub received the message: And this goes on for 2 minutes, and suddenly, after 2 minutes, this shows up in the StateAnalyzer log
you see that all the From that time on, all GRPC calls are executed immediately, so all SentEventAsync calls are executed immediately. When you check the edgeHub logs at that time, you see that the StateAnalyzer module is "reconnected".
So I guess it's not directly related to Direct Methods, but the edgeHub has some strange behaviour. I attached the full logs here: Best regards, Wim |
@wvangeem would you mind doing a small test? https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setminthreads?view=net-8.0 would see if setting the minimum thread pool to say 100 helps? Want to rule out starvation as a cause. |
Hi @nyanzebra, I added following piece of code to every container:
Every container reports this in the startup log I still have exactly the same behaviour afterwards. All threads hang in the StateAnalyzer module, and 2 minutes after startup (after reboot), all the threads are finished. Best regards, |
Current Behaviour
I have 3 IoTEdge modules (C# SDK)
Every second, the APIServer module calls a DirectMethod in both StateAnalyzer and SensorController
When I startup the device, the StateAnalyzer module almost immediately responds to the DirectMethod call.
The SensorController always timeouts exactly 3 minutes.
After these 3 minutes, it starts to respond to the DirectMethod call.
When I manually restart the SensorController module (iotedge restart SensorController), I have exactly the same behaviour.
It takes exactly 3 minutes.
When I manually restart the StateAnalyzer module, it takes only 6 seconds for the DirectMethod call to respond.
I have the full edgeHub and edgeAgent logs below, but here is an extract with my actions.
You will always see that the DirectMethods fail, that the SensorController connection is restarted twice by edgeHub, and that initially after 3 minutes it is started again after which it starts to work.
Upon startup
Manual Restart StateAnalyzer
Manual Restart SensorController
Manual Restart StateAnalyzer in between - SensorController is still not working
After a while: Automatic restart of SensorController by eventHub
Another test with manual restart SensorController - again 3 minutes
Context (Environment)
Both modules are developped in C#, and have the same way of connecting (AMQP & MQTT)
Output of
iotedge check
Click here
Device Information
Runtime Versions
iotedge version
]: 1.4.10docker version
]: 20.10.25+azure-1Logs
Links are added for the logs
edgeagent-2024032702.log
edgehub-2024032702.log
The text was updated successfully, but these errors were encountered: