Network.Transport.Management #21

Open
mboes opened this issue Jun 22, 2015 · 4 comments

@mboes
Contributor

mboes commented Jun 22, 2015

From @hyperthunk on December 18, 2012 13:48

With any external resource, chances are things are going to go wrong sometimes. Following on from issue #98, there are various situations that might require manual intervention. Shutting down a bundle of connections/channels between two specific endpoints, forcibly terminating an endpoint, or even shutting down a whole transport might be required if connectivity drops in such a way that things get stuck.

Having an API for this would enable us to write tools (command line, web based, etc) to assist with manual administration of a distributed system. The API needs to support the primary use case where an administrator connects to a running node from an external location and can take actions from there.
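
As a rough illustration of the operations such an API would need to cover (closing the connections between two endpoints, terminating an endpoint, shutting down a transport), the request vocabulary might look something like the sketch below. Everything here is an assumption for illustration: `Address`, `ManagementRequest`, and `describe` are hypothetical names and do not exist in Network.Transport or distributed-process.

```haskell
-- Hypothetical management request vocabulary. None of these names
-- exist in Network.Transport or distributed-process.
module Main where

-- Stand-in for an endpoint address (assumption: an opaque string).
newtype Address = Address String
  deriving (Eq, Show)

data ManagementRequest
  = CloseConnectionsBetween Address Address  -- drop all connections between two endpoints
  | TerminateEndPoint Address                -- forcibly close a single endpoint
  | ShutdownTransport                        -- close the whole transport
  deriving (Eq, Show)

-- Human-readable description, e.g. for an admin tool's audit log.
describe :: ManagementRequest -> String
describe (CloseConnectionsBetween (Address a) (Address b)) =
  "close all connections between " ++ a ++ " and " ++ b
describe (TerminateEndPoint (Address a)) = "terminate endpoint " ++ a
describe ShutdownTransport = "shut down the transport"

main :: IO ()
main = mapM_ (putStrLn . describe)
  [ CloseConnectionsBetween (Address "node-a:0") (Address "node-b:0")
  , TerminateEndPoint (Address "node-a:0")
  , ShutdownTransport
  ]
```

A closed sum type like this would let a command line tool and a web front end share one set of requests, whichever entry point ends up serving them.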

There needs to be some entry point to which the external client connects, and of course there is no such concept in Network.Transport. I can see two ways to go about this, though there may be other possibilities too.

  1. couple the functionality with the node controller
  2. provide a service registry as part of Network.Transport itself

The point here is that in any running executable, we need some means by which we can connect in order to query for this information. As the node controller already provides this, it seems a sensible choice at first glance. The node controller is initialised with a Transport so it can use the Network.Transport APIs to handle requested interactions.

So does it make sense to force all the interactions to go through the node controller? Another alternative would be to have a registered service process that gets booted with each node controller, and use nsend to talk to this process instead. Either way, the API data needs to reside in core CH so that the nodes can communicate effectively without sharing the same image.
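
To make the registered service process option a little more concrete, here is a minimal sketch using `register`, `nsend`, `send`, `expect`, and `say` from `Control.Distributed.Process`. The request type, the service name `"mgmt.service"`, and the function names are all hypothetical, and the handler only logs what it would do rather than touching any transport.

```haskell
{-# LANGUAGE DeriveGeneric #-}
-- Sketch only: MgmtRequest, mgmtServiceName, mgmtService and pingLocal
-- are hypothetical names, not part of distributed-process.
module MgmtService where

import Control.Distributed.Process
import Control.Monad (forever)
import Data.Binary (Binary)
import GHC.Generics (Generic)

data MgmtRequest
  = Ping ProcessId            -- reply to this pid to show the service is alive
  | TerminateEndPoint String  -- address given as a plain string (assumption)
  deriving (Show, Generic)

-- Binary instance via Generic, so the type can be sent between processes.
instance Binary MgmtRequest

mgmtServiceName :: String
mgmtServiceName = "mgmt.service"

-- Register the calling process under a well-known name, then serve requests.
mgmtService :: Process ()
mgmtService = do
  self <- getSelfPid
  register mgmtServiceName self
  forever $ do
    req <- expect
    case req of
      Ping from              -> send from "pong"
      TerminateEndPoint addr -> say ("would terminate endpoint " ++ addr)

-- Client side, from any process on the same node as the service.
pingLocal :: Process String
pingLocal = do
  self <- getSelfPid
  nsend mgmtServiceName (Ping self)
  expect
```

A node would start `mgmtService` (for example with `spawnLocal`) before any client calls `pingLocal`, since a message sent by name before registration has happened is dropped. From another node, the same request could presumably be sent with `nsendRemote`, which is why the request type would have to live somewhere both sides can import.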

Providing some kind of service registry for Network.Transport itself is probably wrong. We'd need to provide an access point to the outside world, and it seems crazy not to use the node controller(s) for this. I suppose one way of doing this would be to have the backends open up an additional management port and use a separate control channel for management messages. I'm not sure what I think about forcing that on all backends though, and as @edsko mentioned elsewhere, we're trying to keep actual functionality out of the Network.Transport layer and push it to the implementations. Forcing each implementation to write code to handle management requests seems wrong.

One problem with using the node controller as the entry point for a management (and/or stats gathering) API is that you need to know which backend is in use. As an administrator I guess you should know that anyway, so maybe it's not a problem.

I also think that we should put a secure HTTP-based API around this, so that you can open up the management capabilities without having to expose the node's own connectivity. For example, you might not want to expose the node outside your LAN, but still allow administration to take place over the internet, provided TLS is in play. That probably belongs either in a separate top-level project or in -platform, possibly bundled with other functionality into a single management web interface.

Copied from original issue: haskell-distributed/distributed-process#99

@mboes
Contributor Author

mboes commented Jun 22, 2015

From @robstewart57 on December 18, 2012 14:31

I'm not sure if this sits in this issue...

I'd like to further this by suggesting that some autonomous management should be done within the transport layer, to deal with massive scalability. As I discovered [http://www.haskell.org/pipermail/haskell-cafe/2012-February/099212.html], there are operating system limits on the number of open file descriptors per process, and therefore on open TCP connections (commonly 1024 by default). If we are thinking at scales of thousands of nodes, then connecting every endpoint to every other endpoint is problematic.

There is a case to say that this concern is not one for the programmer to deal with. A solution would be to have the transport on each node manage the connections on each of its endpoints. Heuristics might include killing heavyweight connections to remote endpoints that haven't been used within a time limit, and killing old connections when new connection requests are made and a connection limit (e.g. ulimit) is close to being breached.
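
The two heuristics could be sketched as pure functions over a table of last-use times. This is only an illustration under stated assumptions: `ConnId`, `Time`, `LastUsed`, `idleConnections`, and `evictForNew` are hypothetical names, time is a plain integer number of seconds, and nothing here is wired to a real transport.

```haskell
-- Sketch of idle-timeout and least-recently-used eviction.
-- All names are hypothetical; nothing here touches a real transport.
module Main where

import Data.List (sortOn)
import qualified Data.Map.Strict as Map

type ConnId = String
type Time   = Int  -- seconds on some monotonic clock (assumption)

-- Each open connection mapped to the time it was last used.
type LastUsed = Map.Map ConnId Time

-- Connections that have been idle for longer than the timeout.
idleConnections :: Time -> Time -> LastUsed -> [ConnId]
idleConnections now timeout =
  Map.keys . Map.filter (\lastUse -> now - lastUse > timeout)

-- Connections to close, oldest first, so that one new connection
-- can be opened without exceeding the limit.
evictForNew :: Int -> LastUsed -> [ConnId]
evictForNew limit conns
  | excess <= 0 = []
  | otherwise   = take excess (map fst (sortOn snd (Map.toList conns)))
  where
    excess = Map.size conns + 1 - limit

main :: IO ()
main = do
  let conns = Map.fromList [("a", 10), ("b", 95), ("c", 50)]
  print (idleConnections 100 30 conns)  -- ["a","c"]
  print (evictForNew 3 conns)           -- ["a"]
```

Keeping the policy pure like this would let it be tested separately from whichever layer ends up closing the connections.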

I'm not sure whether this should be a parameter set when creating transports or endpoints, but such connection management should probably not be a concern of the programmer (?).

@mboes
Contributor Author

mboes commented Jun 22, 2015

From @hyperthunk on December 18, 2012 16:5

Hi @robstewart57 - first of all, thanks for taking an interest in Cloud Haskell. It's very important that we build a strong community around CH, and getting feedback as well as community participation is vital to us.

> I'm not sure if this sits in this issue...

Indeed, I think this deserves a separate thread all of its own. I'm going to address your comments and get the discussion going over in issue #101. Please do come and join in over there!

@mboes
Contributor Author

mboes commented Jun 22, 2015

From @edsko on January 7, 2013 18:41

Without having thought about this too hard (so I could be wrong), I would recommend doing as much as you can at the level of CH and as little as possible at the level of NT. Transports are tricky to implement and should be kept as focused as possible.

@mboes
Contributor Author

mboes commented Jun 22, 2015

From @hyperthunk on January 7, 2013 19:1

Yes, I agree with that in principle. I suspect we can do all of this in the node controller or with a service process next door to it.


2 participants