Add new peers without restarting a node #32

eugene-babichenko · 2018-11-13T13:37:30Z

Proposes an extension of validator client messaging to add peers to a running node without restarting it.

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

dcmiddle · 2018-11-30T14:58:12Z

text/0000-add-peers-in-runtime.md

+This RFC proposes to implement the possibility to add new peers in runtime
+(through the component validator endpoint) when a node is working in the
+`static` peering mode. Along with that it also adds the corresponding extensions
+to the off-chain permissioning model.


I don't see the corresponding permissioning text below.
One of the main security considerations is how to prevent a bad actor from taking the validator offline. Attacks may include connecting the validator to offline or otherwise byzantine nodes. Other attacks may include the general type of exposure from adding any new remote call (input formatting, etc.).
The proposal adds remote access to what has so far required local system administrator access to the host. We must consider adversaries who are on the same local network. The system in most cases may be configured such that the component port is not advertised beyond the loopback, but there are valid reasons for the administrator to advertise the component port on the local network. This proposal should consider how the validator remains secure independent of the system's network configuration.

Okay, after this explanation I have a better understanding of the permissioning requirement. I will describe a model that was discussed in the chat.

dcmiddle · 2018-11-30T15:13:24Z

text/0000-add-peers-in-runtime.md

+runtime and also decreases the uptime.
+
+To resolve this problem our team proposes to add a method to add new peers to a
+running node.


Have you also considered removing nodes? The security implications are a little more risky but about the same. The benefit of including a remove function would let us address the limits of maximum-peer-connectivity.

I agree that it would be good to include removal functionality for discussion.

I think that this is a good idea.

dcmiddle · 2018-11-30T15:16:07Z

I think this would be a useful addition to the administrative APIs. The main thing to work out is the security considerations.

aludvik · 2018-11-30T17:34:35Z

text/0000-add-peers-in-runtime.md

+receives a new request for adding peers it:
+
+- Validates the format of peer URIs which has to be
+  `tcp://IP_ADDRESS:PORT_NUMBER`


In addition to IP addresses, this should also accept DNS names, which ZMQ will resolve: http://api.zeromq.org/2-1:zmq-tcp
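A minimal sketch of the URI check with this suggestion applied, accepting either an IPv4 address or a DNS hostname in `tcp://HOST:PORT`. The helper `is_valid_peer_uri` is hypothetical and not part of Sawtooth. It only checks syntax and leaves resolution of DNS names to ZMQ at connect time, as suggested; IPv6 literals are rejected here for simplicity.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# One DNS label: 1-63 letters, digits or hyphens, not starting/ending with '-'.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_peer_uri(uri: str) -> bool:
    """Syntax check for tcp://HOST:PORT with HOST an IPv4 address or DNS name."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return False
    if parts.scheme != "tcp" or not port or not parts.hostname:
        return False
    # Reject anything beyond scheme://host:port (credentials, paths, queries).
    if parts.username or parts.path or parts.query or parts.fragment:
        return False
    host = parts.hostname
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        pass  # not an IPv4 literal; try it as a DNS name
    labels = host.rstrip(".").split(".")
    if labels[-1].isdigit():
        return False  # e.g. "999.1.1.1": a malformed IPv4 address, not a name
    return all(_LABEL.match(label) for label in labels)
```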

aludvik · 2018-11-30T17:44:30Z

text/0000-add-peers-in-runtime.md

+  `tcp://IP_ADDRESS:PORT_NUMBER`
+- If the validation was successful then the validator updates its peer list and
+  immediately returns the `OK` response. The new peers are connected _after_
+  that.


If the response is sent back before connecting, there is no way to notify clients of authorization errors. Is not sending an error due to authorization an intentional decision? Or should the error enum be extended and a response sent only after peering completes?

aludvik · 2018-11-30T17:47:19Z

text/0000-add-peers-in-runtime.md

+  internal interface of the application. Even if we do then we can restrict the
+  access to that feature by using a proxy as suggested in the documentation.
+- Should the system notify its user about the statuses of new connections or
+  leave the status check to the end user?


My opinion would be to either have the response only be sent after the new connection is created, or expose another request type that allows the user to poll on the status of the new connection so they can detect connection failures such as authorization errors.

Is not sending an error due to authorization an intentional decision

I left it out intentionally, to start a discussion on how we should handle such errors.

In my opinion, it would be better to extend the enum in ClientAddPeersResponse, but I would like to discuss the implementation first. The simplest approach I see is to make the Gossip class send notifications about peer status changes. This would tell us whether the connection attempt succeeded or failed, but would not give any specific information. To be more specific, we would have to add handlers for the corresponding peer messages that notify us about, for example, authorization violations, rejected connections and so on. I think the simplest approach should be acceptable for now (at least for the system our company builds).
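The "simplest approach" described above could look roughly like the sketch below: each add-peer request gets a pending result that is resolved when a (hypothetical) Gossip notification reports the connection as up or down, so the response is sent only after the attempt finishes. `PeerRegistry`, `AddPeerStatus` and its values, and `on_peer_status` are illustrative names, not Sawtooth APIs, and the URI check is only a placeholder.

```python
import enum
from concurrent.futures import Future
from typing import Dict, Optional, Set


class AddPeerStatus(enum.Enum):
    # Hypothetical values; the RFC under review defines its own enum.
    OK = 0
    INVALID_PEER_URI = 1
    MAXIMUM_PEERS_REACHED = 2
    CONNECTION_FAILED = 3


class PeerRegistry:
    """Toy stand-in for the validator's static peer list (hypothetical)."""

    def __init__(self, maximum_peer_connectivity: Optional[int] = None) -> None:
        self.maximum = maximum_peer_connectivity
        self.peers: Set[str] = set()
        self._pending: Dict[str, Future] = {}

    def request_add(self, uri: str) -> Future:
        """Return a Future that resolves to an AddPeerStatus once the
        request is rejected or the connection attempt has finished."""
        if uri in self._pending:
            return self._pending[uri]  # reuse the in-flight attempt
        result: Future = Future()
        if not uri.startswith("tcp://"):  # placeholder for full URI validation
            result.set_result(AddPeerStatus.INVALID_PEER_URI)
        elif (self.maximum is not None
              and len(self.peers) + len(self._pending) >= self.maximum):
            # Pending attempts count toward the limit so that concurrent
            # requests cannot overshoot it.
            result.set_result(AddPeerStatus.MAXIMUM_PEERS_REACHED)
        else:
            # The connection itself would be started asynchronously here.
            self._pending[uri] = result
        return result

    def on_peer_status(self, uri: str, connected: bool) -> None:
        """Hypothetical Gossip hook: called when a peer connects or fails."""
        result = self._pending.pop(uri, None)
        if result is None:
            return  # not a connection we were waiting on
        if connected:
            self.peers.add(uri)
            result.set_result(AddPeerStatus.OK)
        else:
            result.set_result(AddPeerStatus.CONNECTION_FAILED)
```

A request handler would call `add_done_callback` on the returned future to send `ClientAddPeersResponse` once the status is known. Finer-grained codes, such as one for authorization violations, would still require the extra peer-message handlers mentioned above.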

eugene-babichenko · 2018-12-17T11:30:25Z

Just uploaded an update. Here is what has changed:

- Extended the range of error codes for adding peers.
- Added the description for peer removal.
- Requests for adding peers are now submitted one by one instead of in batches. The rationale is simple: this RFC now supports error codes that are returned after the connection has succeeded or failed, so we need control over each separate connection, since connections may fail with different errors.
- Added a description of the permissioning model. It is very basic, and I am not sure whether it should be generalized (I added this to the "Unresolved questions" section).

agunde406 · 2019-01-04T14:48:15Z

text/0000-add-peers-in-runtime.md

+        CLIENT_PEERS_ADD_REQUEST = 131;
+        CLIENT_PEERS_ADD_RESPONSE = 132;
+        CLIENT_PEERS_REMOVE_REQUEST = 131;
+        CLIENT_PEERS_REMOVE_RESPONSE = 132;


This should be 134
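The numbering problem can be caught mechanically. The snippet below is a hypothetical check, not part of the Sawtooth build. It lists duplicated values among the proposed `MessageType` entries, and the corrected mapping assumes 133 and 134 are otherwise unused. Note that `protoc` itself rejects duplicate enum values unless `option allow_alias = true;` is set.

```python
from collections import Counter
from typing import Dict, List

# Values exactly as proposed in the diff above.
PROPOSED: Dict[str, int] = {
    "CLIENT_PEERS_ADD_REQUEST": 131,
    "CLIENT_PEERS_ADD_RESPONSE": 132,
    "CLIENT_PEERS_REMOVE_REQUEST": 131,
    "CLIENT_PEERS_REMOVE_RESPONSE": 132,
}

# Renumbered removal messages, assuming 133 and 134 are free.
CORRECTED: Dict[str, int] = dict(
    PROPOSED,
    CLIENT_PEERS_REMOVE_REQUEST=133,
    CLIENT_PEERS_REMOVE_RESPONSE=134,
)


def duplicate_values(values: Dict[str, int]) -> List[int]:
    """Return the enum numbers used by more than one name, sorted."""
    counts = Counter(values.values())
    return sorted(number for number, uses in counts.items() if uses > 1)


print(duplicate_values(PROPOSED))   # [131, 132]
print(duplicate_values(CORRECTED))  # []
```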

agunde406 · 2019-01-04T14:49:19Z

text/0000-add-peers-in-runtime.md

+[request-processing]: #request-processing
+
+The requests are received on the `component` endpoint. When the validator
+receives a new request for adding peers it:


Please also add what the validator would do to remove peers

Added that, and there is one question: what should we do if removing a peer takes us below the minimum peer connectivity limit? I have two options in mind:

Just allow it, although it may bring the node's functionality down;

Another option is to return an error if the minimum limit has been reached.

Wondering which of those two will be a better solution.

Isn't this proposal only for static peering? Minimum peer connectivity is used for dynamic peering.

My bad, sorry

vaporos · 2019-01-22T05:23:07Z

I like this proposal overall.

I wonder if this truly belongs in the Client* message namespace, or whether we should have an administrative set of messages. This is similar to the question of whether it should be exposed in the REST API, but more fundamentally whether it should be able to be bound to a different port.

eugene-babichenko · 2019-01-23T12:59:58Z

I am not sure if we should define a separate namespace here — this falls beyond the scope of this proposal. I followed the simple logic here: if something does not belong to TPs or consensus engine, this should be in the Client* namespace.

By "to be bound to a different port", do you mean that these messages should be served on a port other than the component endpoint?

jsmitchell · 2019-04-01T18:36:02Z

text/0000-add-peers-in-runtime.md

+
+# Summary
+[summary]: #summary This RFC proposes to implement the possibility to add new
+peers and remove the existing connections in runtime (through the component


Recommend: "This RFC proposes functionality to add and remove static peer connections while a validator node is running."

jsmitchell · 2019-04-01T18:36:58Z

text/0000-add-peers-in-runtime.md

+[summary]: #summary This RFC proposes to implement the possibility to add new
+peers and remove the existing connections in runtime (through the component
+validator endpoint) when a node is working in the `static` peering mode. Along
+with that it also adds the corresponding extensions to the off-chain


Recommend: "This RFC also adds the corresponding extensions to the off-chain permissioning model."

jsmitchell · 2019-04-01T18:37:38Z

text/0000-add-peers-in-runtime.md

+# Motivation
+[motivation]: #motivation
+
+When an administrator adds a new node to an existing Sawtooth network he/she has


add comma after 'network' and just pick either he or she for the gender

jsmitchell · 2019-04-01T18:39:30Z

text/0000-add-peers-in-runtime.md

+
+When an administrator adds a new node to an existing Sawtooth network he/she has
+to restart a node with new peering settings. This makes any automation
+significantly harder to write than if we had the possibility to add peers in the


Recommend: "Restarting the process adds substantial complexity to infrastructure automation, and incurs system downtime."

jsmitchell · 2019-04-01T18:39:48Z

text/0000-add-peers-in-runtime.md

+significantly harder to write than if we had the possibility to add peers in the
+runtime and also decreases the uptime.
+
+To resolve this problem our team proposes to add a method to add new peers to a


Add a comma after problem

jsmitchell · 2019-04-01T18:46:58Z

text/0000-add-peers-in-runtime.md

+  list and returns the `INVALID_PEER_URI` status along with the list of faulty
+  peer URIs.
+- If the `--maximum-peer-connectivity` parameter was provided to the validator
+  then the validator checks if it has reached the maximum peer connectivity and


Add comma after connectivity

jsmitchell · 2019-04-01T18:47:21Z

text/0000-add-peers-in-runtime.md

+following:
+
+- Validates the format of peer URI which has to be `tcp://ADDRESS:PORT_NUMBER`;
+- If a peer is connected the validator removes it. Otherwise the


Add comma after connected

jsmitchell · 2019-04-01T18:47:50Z

text/0000-add-peers-in-runtime.md

+restrict access to requests that can be malicious. The workflow for the
+validation is the following:
+
+- If the `admin` role is not specified then the permissioning module will use


Add comma after specified

jsmitchell · 2019-04-01T18:48:02Z

text/0000-add-peers-in-runtime.md

+
+- If the `admin` role is not specified then the permissioning module will use
+  the `default` policy.
+- If the `default` policy is not specified then the validation of the


Add comma after specified

jsmitchell · 2019-04-01T18:48:19Z

text/0000-add-peers-in-runtime.md

+  verifier checks:
+  - If the `admin_public_key` is allowed.
+  - If the `signature` is correct.
+  - If one of the above conditions is not satisfied then the


Add comma after satisfied
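Taken together, the permissioning workflow quoted in the comments above can be sketched as follows. This is a simplified model under stated assumptions: roles map to policy names, and each policy is reduced to a set of permitted public keys (real Sawtooth off-chain policies are lists of `PERMIT_KEY`/`DENY_KEY` rules). The RFC text for the case where no `default` policy exists is truncated in the quote, so the sketch returns `None` there instead of guessing. Both function names are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve_admin_policy(roles: Dict[str, str],
                         policies: Dict[str, Set[str]]) -> Optional[Set[str]]:
    """Permitted keys for admin requests: the policy bound to the `admin`
    role if that role is specified, otherwise the `default` policy."""
    policy_name = roles.get("admin", "default")
    return policies.get(policy_name)


def check_admin_request(admin_public_key: str, signature_valid: bool,
                        roles: Dict[str, str],
                        policies: Dict[str, Set[str]]) -> Optional[bool]:
    """True/False when a policy applies; None when neither `admin` nor
    `default` resolves to a policy (that case is left to the RFC)."""
    permitted = resolve_admin_policy(roles, policies)
    if permitted is None:
        return None
    # Both conditions must hold: the key is allowed and the signature
    # over the request verifies.
    return admin_public_key in permitted and signature_valid
```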

vaporos · 2019-04-02T22:19:33Z

text/0000-add-peers-in-runtime.md

+- If the `default` or the `admin` policy is specified, then the permission
+  verifier checks:
+  - If the `admin_public_key` is allowed.
+  - If the `signature` is correct.


Can you expand on this? In particular, it is not clear what is being signed.

This is the signature of the peer_uri string (https://github.com/hyperledger/sawtooth-rfcs/pull/32/files#diff-60b9c6f8741b42854f2726db45c3f899R94).

The operation should also be covered by the signature. If only the URI is signed then it could be replayed in a ClientRemovePeerRequest. You could look at the batch format for an example of using a signature that covers the entirety of the message. https://github.com/hyperledger/sawtooth-core/blob/master/protos/batch.proto
There may be a simpler approach for this kind of message.
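One way to address the replay concern is to sign a canonical encoding of both the operation and the peer URI, in the spirit of `batch.proto`, where the signature covers the serialized header. In the sketch below, the JSON encoding, the `ADD`/`REMOVE` operation names, and HMAC-SHA256 are all stand-ins chosen so the example runs with the standard library only; a real implementation would sign the serialized request with the administrator's secp256k1 key, as Sawtooth does elsewhere.

```python
import hashlib
import hmac
import json


def signing_payload(operation: str, peer_uri: str) -> bytes:
    """Canonical bytes covering both the operation and the peer URI."""
    body = {"operation": operation, "peer_uri": peer_uri}
    # sort_keys and compact separators make the encoding deterministic.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(key: bytes, payload: bytes) -> str:
    # Stand-in for an asymmetric secp256k1 signature; HMAC keeps the
    # example runnable with the standard library only.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, signature: str) -> bool:
    # Constant-time comparison of the expected and supplied signatures.
    return hmac.compare_digest(sign(key, payload), signature)


key = b"example-admin-key"
uri = "tcp://10.0.0.2:8800"
add_sig = sign(key, signing_payload("ADD", uri))
print(verify(key, signing_payload("ADD", uri), add_sig))     # True
print(verify(key, signing_payload("REMOVE", uri), add_sig))  # False
```

Because the operation is part of the signed bytes, a signature taken from an add request fails verification when replayed as a remove request for the same URI.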

I am still waiting for a reply to this message. I am not sure whether we should use plain message signatures or develop a more elaborate system for future use (I believe such a system should be the subject of a separate, more general RFC).

vaporos · 2019-04-02T22:25:55Z

Should the authorization for this feature occur during the connection conversation (see authorization.proto)? I think it may be appropriate to add ADMIN to "RoleType" and create an Admin* message namespace in addition to Client*, etc.

eugene-babichenko · 2019-04-17T10:03:55Z

Should the authorization for this feature occur during the connection conversation (see authorization.proto)?

It may be difficult to implement the authorization in the component endpoint. Should we extend the component endpoint, use the network endpoint (it already has the authorization implemented) or create a separate admin endpoint?

I think it may be appropriate to add ADMIN to "RoleType" and create an Admin* message namespace in addition to Client*, etc.

Agreed.

dcmiddle · 2019-04-17T16:37:54Z

text/0000-add-peers-in-runtime.md

+- If the `default` or the `admin` policy is specified, then the permission
+  verifier checks:
+  - If the `admin_public_key` is allowed.
+  - If the `signature` is correct.


The operation should also be covered by the signature. If only the URI is signed then it could be replayed in a ClientRemovePeerRequest. You could look at the batch format for an example of using a signature that covers the entirety of the message. https://github.com/hyperledger/sawtooth-core/blob/master/protos/batch.proto
There may be a simpler approach for this kind of message.

dcmiddle · 2019-04-18T18:45:19Z

Should the authorization for this feature occur during the connection conversation (see authorization.proto)?

It may be difficult to implement the authorization in the component endpoint. Should we extend the component endpoint, use the network endpoint (it already has the authorization implemented) or create a separate admin endpoint?

I think an admin endpoint would be the most secure. This would facilitate firewall rules to restrict traffic. The downside is it adds one more element to a node setup. We used to get a lot of support questions stemming from configuring the existing endpoints. That said, I don't think this will add much complexity. A loopback default wouldn't need to be modified. A simple setup would then have the administrator logging into the validator host to issue these commands and the respective private key could reside on that validator host adding no other key management burden.
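If a separate admin endpoint is added, the validator could refuse or warn about bind addresses that are not loopback-only, which matches the "loopback default" suggested above. This is a minimal sketch: `is_loopback_endpoint` is a hypothetical helper, and it treats the ZMQ wildcard `*`, interface names, and DNS names other than `localhost` as exposed.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_endpoint(endpoint: str) -> bool:
    """True if a tcp://HOST:PORT bind address listens on loopback only."""
    host = urlsplit(endpoint).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # '*' (all interfaces), interface names and other DNS names are
        # treated as potentially exposed.
        return False


for endpoint in ("tcp://127.0.0.1:4004", "tcp://[::1]:4004",
                 "tcp://0.0.0.0:4004", "tcp://*:4004"):
    print(endpoint, is_loopback_endpoint(endpoint))
```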

eugene-babichenko · 2019-06-05T07:46:59Z

Is it OK if I reopen this pull request from another repository?

agunde406 · 2019-06-05T15:30:20Z

Please don't; we don't want to lose this conversation. @eugene-babichenko

eugene-babichenko · 2019-06-06T17:13:49Z

@agunde406 The problem is I no longer have access to this fork but still want to finish this RFC.

agunde406 · 2019-06-06T17:36:12Z

@eugene-babichenko Ah okay. In that case, yes it is okay to resubmit the RFC from a new fork. Please do not change commit history and link this PR in the new one.

eugene-babichenko · 2019-06-07T10:52:33Z

@agunde406 Here is the new pull request #44

RFC for adding peers in runtime

9dacf81

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

eugene-babichenko force-pushed the add-peer-in-runtime branch from 17747df to 9dacf81 November 13, 2018 13:41

agunde406 assigned vaporos Nov 13, 2018

agunde406 requested review from agunde406, aludvik, dcmiddle, jsmitchell, peterschwarz and vaporos November 13, 2018 13:50

Add new message types specification.

a71be6a

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

eugene-babichenko force-pushed the add-peer-in-runtime branch from 95f303d to a71be6a November 13, 2018 15:12

dcmiddle suggested changes Nov 30, 2018

aludvik reviewed Nov 30, 2018

eugene-babichenko added 4 commits December 17, 2018 12:45

Add the peer removal request and response.

e43b33f

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

Extend error statuses and remove the peers one by one instead of doing that in batches.

9f32632

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

Add basic permissioning description.

5a87a74

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

Update example code.

6ca942d

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

agunde406 requested review from aludvik and dcmiddle December 17, 2018 17:19

agunde406 suggested changes Jan 4, 2019

eugene-babichenko added 3 commits January 8, 2019 13:59

Fix variant numbering in Message.MessageType enum.

99cf57e

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

Fix mistakes in the Motivation section.

baaf737

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

Add description of message processing for peers removal.

4f17b19

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

agunde406 self-requested a review January 8, 2019 19:47

agunde406 approved these changes Jan 14, 2019

peterschwarz approved these changes Jan 16, 2019

jsmitchell suggested changes Apr 1, 2019

vaporos reviewed Apr 2, 2019

Address comments from @jsmitchell

211d197

Signed-off-by: Yevhenii Babichenko <eugene@remme.io>

eugene-babichenko requested review from ineffectualproperty and TomBarnes as code owners April 17, 2019 09:50

agunde406 requested review from jsmitchell and vaporos April 17, 2019 13:08

jsmitchell approved these changes Apr 17, 2019

dcmiddle suggested changes Apr 17, 2019

eugene-babichenko mentioned this pull request Jun 6, 2019

[reopened] Add new peers without restarting a node #44

Open
