Checking for Dead Agents #365

Open
ryanstwrt opened this issue Oct 14, 2021 · 2 comments

Comments

@ryanstwrt

I have a centralized agent that continually checks whether other agents are still running. I am currently looping through a dictionary list I created when each agent was initialized and grabbing each agent using self._proxy_server.proxy(agent), where self._proxy_server is proxy.NSProxy(). Once I have an agent, I use ka.get_attr('_running') to determine if it is running. This worked in the past when I had fewer than 100 agents; however, I am now getting the following error:

Pyro4.errors.CommunicationError: cannot connect to ('localhost', 43296): [Errno 111] Connection refused

This error is triggered on self._proxy_server.proxy(agent). Is there a better way to determine if agents have failed somehow? On a side note, I apologize that I don't have a simple reproducible example; I've had no luck reproducing it on a smaller scale. Thank you!
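One possible way to make the polling loop described above tolerant of this failure is to treat a refused connection as "agent is dead" instead of letting the exception propagate. The following is only a minimal sketch, not osBrain code: `check_agents`, `FakeProxy`, and `fake_get_proxy` are hypothetical names, and the fake proxy stands in for a real one so the example runs with the standard library alone. With osBrain, `get_proxy` would presumably wrap `self._proxy_server.proxy`, and `connection_errors` would need to include `Pyro4.errors.CommunicationError`.

```python
from typing import Callable, Dict, Iterable, Tuple, Type


def check_agents(
    names: Iterable[str],
    get_proxy: Callable[[str], object],
    connection_errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Dict[str, bool]:
    """Return {agent name: is_alive}; an unreachable agent counts as dead."""
    status = {}
    for name in names:
        try:
            # Both obtaining the proxy and the remote call can fail
            # if the agent process is gone, so both sit inside the try.
            proxy = get_proxy(name)
            status[name] = bool(proxy.get_attr('_running'))
        except connection_errors:
            status[name] = False
    return status


# --- Stand-ins for real agents (hypothetical, for illustration only) ---
class FakeProxy:
    def __init__(self, running: bool):
        self._running = running

    def get_attr(self, name: str):
        return getattr(self, name)


def fake_get_proxy(name: str) -> FakeProxy:
    if name == 'agent_c':
        # Simulates the "[Errno 111] Connection refused" case.
        raise ConnectionRefusedError(111, 'Connection refused')
    return FakeProxy(running=(name != 'agent_b'))


status = check_agents(['agent_a', 'agent_b', 'agent_c'], fake_get_proxy)
print(status)
```

Here `agent_b` is reachable but reports that it is not running, while `agent_c` cannot be reached at all; both end up marked as not alive, and the loop continues past the failure.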

@Peque
Member

Peque commented Oct 15, 2021

It would be great to have a reproducible use case. Even if it was with 100 agents, having a piece of code to reproduce the issue (and add a test) would be very helpful. 😊

Maybe you want to have a look at ØMQ - The Guide. There are many communication patterns explained. Maybe you want to look for heartbeating (example 1, example 2).
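As a rough illustration of the heartbeating idea, each agent periodically announces that it is alive, and the monitor flags any agent whose last announcement is older than a timeout. This reverses the direction of the check, so the monitor never has to open a connection to a possibly dead agent. The sketch below is transport-agnostic and uses only the standard library; `HeartbeatMonitor`, `beat`, and `dead_agents` are hypothetical names, not part of osBrain or ØMQ, and the injected clock exists only to make the example deterministic. In a real system, `beat` would be called from whatever handler receives the agents' heartbeat messages.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per agent and reports the overdue ones."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def beat(self, name: str) -> None:
        # Called whenever a heartbeat message from `name` arrives.
        self.last_seen[name] = self.clock()

    def dead_agents(self) -> List[str]:
        # An agent is considered dead once it has been silent
        # for longer than the timeout.
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout)


# --- Deterministic demo with a fake clock (hypothetical values) ---
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_now[0])

monitor.beat('agent_a')
monitor.beat('agent_b')

fake_now[0] = 2.0
monitor.beat('agent_a')      # agent_a keeps beating, agent_b goes silent

fake_now[0] = 4.0
dead = monitor.dead_agents()
print(dead)
```

At the final check, `agent_a` was last seen 2 seconds ago and `agent_b` 4 seconds ago, so only `agent_b` exceeds the 3-second timeout. The timeout is typically chosen as a small multiple of the heartbeat interval so that a single delayed message does not trigger a false alarm.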

It would be great to integrate more and updated communication patterns into osBrain, and The Guide is a great place to look for them. 😜

@ryanstwrt
Author

Thanks @Peque! I'll take a look at The Guide and see what I can find. I'm still trying to create a test case for this to isolate the problem. If I figure it out, I'll post it here. Thanks again!
