
Implementation of a gRPC transport #538

Draft: mkeeler wants to merge 2 commits into main
Conversation

@mkeeler (Member) commented Jan 17, 2023

Overview

There are currently two commits in this PR.

  1. Exports the DeferError future and its Init/Respond methods
  2. Implementation of the gRPC transport

Commit 1 was necessary to allow for better code organization with transports, so that not all of the code has to live at the top level. If we like this approach, I think we should also move the other transports into a similar directory structure.
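
For context, a minimal sketch of how a transport living in its own package might use the newly exported DeferError future. It assumes the exported Init/Respond methods mirror the previously unexported init/respond; the helper name is hypothetical:

```go
package grpctransport

import "github.com/hashicorp/raft"

// newErrorFuture is a hypothetical helper showing why the export matters:
// a transport outside the root raft package can now construct and resolve
// a DeferError future itself.
func newErrorFuture(err error) raft.Future {
	var f raft.DeferError
	f.Init()       // prepare the future for use
	f.Respond(err) // resolve it immediately with the given error
	return &f
}
```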

When implementing the transport, I chose the approach of having something outside of Raft maintain the gRPC server, and providing a method on the transport to register the gRPC service with that external server. No scenario comes to mind where you would want gRPC for Raft but not gRPC for other purposes, so being able to share the same grpc.Server seems ideal.
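
As a rough illustration of that wiring (the registration hook is passed in as a plain function here because the actual method name on the transport isn't settled; everything else uses the existing raft and grpc-go APIs):

```go
package main

import (
	"net"

	"github.com/hashicorp/raft"
	"google.golang.org/grpc"
)

// setupRaft is a hypothetical sketch: the application owns the grpc.Server,
// and the transport exposes a hook to register its Raft service on it, so the
// same server can also host the application's other gRPC services.
func setupRaft(conf *raft.Config, fsm raft.FSM, logs raft.LogStore,
	stable raft.StableStore, snaps raft.SnapshotStore, trans raft.Transport,
	lis net.Listener, registerRaftService func(*grpc.Server)) (*raft.Raft, error) {

	srv := grpc.NewServer()

	// Hypothetical hook provided by the gRPC transport.
	registerRaftService(srv)

	// Other application services can be registered on srv here as well.

	go func() { _ = srv.Serve(lis) }()

	return raft.NewRaft(conf, fsm, logs, stable, snaps, trans)
}
```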

Open Questions

Should the gRPC transport and protobuf files live in their own module, so that they do not force arbitrary library versions on Raft consumers who use the other transports?

Should we have some native concept of authorization? Consul requires mTLS with specific server name verification in its use of Raft. Should the Raft library itself be able to extract the gRPC transport credentials and expose a callback for authorizing RPCs? It could potentially even have access to the gRPC metadata, to support forms of authorization other than mTLS.
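
To make that question concrete, here is a minimal sketch (not part of this PR) of how an authorization callback could inspect the mTLS peer identity using the standard grpc-go peer/credentials packages. The AuthorizeFunc name and its placement are assumptions, and only unary RPCs are covered; streaming RPCs would need a matching stream interceptor:

```go
package grpctransport

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/peer"
	"google.golang.org/grpc/status"
)

// AuthorizeFunc is a hypothetical callback the transport could expose so the
// application decides whether a verified peer may issue Raft RPCs.
type AuthorizeFunc func(ctx context.Context, serverName string) error

// authInterceptor extracts the mTLS identity from the connection and hands it
// to the callback before any Raft RPC handler runs.
func authInterceptor(authorize AuthorizeFunc) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {

		p, ok := peer.FromContext(ctx)
		if !ok {
			return nil, status.Error(codes.Unauthenticated, "no peer info")
		}
		tlsInfo, ok := p.AuthInfo.(credentials.TLSInfo)
		if !ok || len(tlsInfo.State.PeerCertificates) == 0 {
			return nil, status.Error(codes.Unauthenticated, "mTLS client certificate required")
		}
		// e.g. Consul-style server name verification on the leaf certificate.
		if err := authorize(ctx, tlsInfo.State.PeerCertificates[0].Subject.CommonName); err != nil {
			return nil, status.Error(codes.PermissionDenied, err.Error())
		}
		return handler(ctx, req)
	}
}
```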

TODOS

  • Metrics
  • Connection Pooling
  • Interceptors for outgoing RPCs
  • Tests

Commit: This is needed to allow for other transports to live in other packages
@dhiaayachi (Contributor) left a comment:

Great work @mkeeler!!
I have some cosmetic comments, but nothing substantial.

Thinking about the questions you asked in the description:

  • I think the gRPC transport should be the de facto transport for our library, and so it should live in the library, to encourage users to go with that transport if they don't have any limitations.
  • About authorization, I think we should export the Auth method as you suggested and, for the time being, go with mTLS as in Consul; we can explore adding other methods in the future.
  • Thinking about backward compatibility, I'm still convinced that having a dual transport that falls back to RPC is the best option, as it will not require any out-of-band communication or a change to the protocol (a rough sketch of the idea follows below). We can also make the dual transport more intelligent and not perform a fallback on every request. About the address provider, I think it's not an issue: on the Consul side we can construct two address providers, pass one to each transport (RPC and gRPC), and then wrap both transports in a dual transport that handles fallbacks.

I will try to start a PR on top of this one to add some tests around the new transport. Let me know what you think about the dual transport; I can maybe add it too.
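
A minimal sketch of that dual-transport idea, just to make the fallback shape concrete (the DualTransport type and its behavior are hypothetical; only AppendEntries is shown, and a real wrapper would implement the full raft.Transport interface and be smarter about when to fall back):

```go
package dualtransport

import "github.com/hashicorp/raft"

// DualTransport is a hypothetical wrapper that prefers the gRPC transport and
// falls back to the classic RPC transport when the peer does not speak gRPC.
// A real implementation would cover the whole raft.Transport interface and
// remember which peers required a fallback, rather than retrying gRPC on
// every request.
type DualTransport struct {
	grpc raft.Transport // primary
	rpc  raft.Transport // fallback (e.g. the existing NetworkTransport)
}

func (d *DualTransport) AppendEntries(id raft.ServerID, target raft.ServerAddress,
	args *raft.AppendEntriesRequest, resp *raft.AppendEntriesResponse) error {

	if err := d.grpc.AppendEntries(id, target, args, resp); err == nil {
		return nil
	}
	// Fall back to the legacy RPC transport for peers that have not been
	// upgraded yet. (A smarter version would only do this for transport-level
	// errors, not for legitimate Raft errors.)
	return d.rpc.AppendEntries(id, target, args, resp)
}
```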

// mog annotation:
//
// target=github.com/hashicorp/raft.RPCHeader
// output=common.gen.go
Contributor:

Maybe types.gen.go to keep the names in sync

// mog annotation:
//
// target=github.com/hashicorp/raft.Log
// output=common.gen.go
Contributor:

Same here about the filename

select {
case g.rpcCh <- rpc:
case <-ctx.Done():
	return nil, status.FromContextError(ctx.Err()).Err()
}
Contributor:

I don't think calling this is necessary here, as the handler error will be transformed into a status code when returned anyway.
I think we can just return ctx.Err() and it will be equivalent.
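
In other words, the suggested simplification would look roughly like this (a sketch, relying on grpc-go mapping context errors returned by handlers to the Canceled/DeadlineExceeded status codes, as the comment above argues):

```go
select {
case g.rpcCh <- rpc:
case <-ctx.Done():
	// grpc-go converts a returned context.Canceled / context.DeadlineExceeded
	// into the corresponding status code.
	return nil, ctx.Err()
}
```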

const (
	// rpcMaxPipeline controls the maximum number of outstanding
	// AppendEntries RPC calls.
	rpcMaxPipeline = 128
)
Member:

See #541; we probably need to at least mirror that PR in this transport, if we use it, to avoid the same pathology.

Comment on lines +73 to +81
g.logger.Info("handling an append entries pipeline")
// This is not as efficient as it could be. To improve this we could
// queue up the append entries requests in one goroutine, wait on
// their completion, and send results back in a second goroutine.
// If sending responses is experiencing extra latency then we would
// be better off pipelining the requests. However, the existing
// net_transport processes commands for a single connection serially,
// waiting on one RPC to finish before processing more commands.
// Therefore this should be at least as performant as that transport.
Member:

The proposed optimization here is an alternative fix for the #541 issue.

As implemented in the PR so far, this transport will have the same behavior as the network transport, which makes pipelining almost always worse than not pipelining at all.

If we did what this comment says, though, and applied batching on the receive side here before writing to disk in one go, that would fix the issue and we'd be able to avoid the problems of pipelining, at least in theory. We'd want to do actual performance comparisons to determine whether there are other downsides.

I think ultimately it will end up being mostly equivalent though:

  • At low throughput, pipelining doesn't impact performance at all, since requests are far enough apart that each completes before the next one arrives.
  • At high throughput with synchronous receive processing, long pipelines prevent leader batching from grouping work, which forces the pipeline to fill, and the queue built up in the pipeline acts as a multiplier on latency.
  • At high throughput with receive-side batching as proposed in this comment, you get back to the same latency as with pipelining disabled, since the follower will automatically commit to disk more efficiently once a queue builds up, keeping the queue as small as possible (until we hit the max of 64 logs per batch, or whatever limit we choose to prevent unbounded growth). It's not clear, though, whether this is actually better than just not pipelining in the first place, which achieves the same thing by batching on the leader. In theory it might be, due to overlapping encoding/decoding and network RTT versus purely synchronous replication, but I suspect that if we tested it we'd only see measurable gains where the RTT is much larger than the disk commit times, which is not a typical situation for HashiCorp Raft usage at least.

This is more of a brain dump for when this is picked up than a recommendation at this point, but we should do at least one of:

Member (Author):

I think, after all your investigation, we have generally proved that pipelining in the transport on the sending side is almost always worse in the cases where this library is intended to be used.

More or less, we want the followers to commit to disk fewer times, with larger batches, when under heavier load. It doesn't seem worth it to batch things before they hit the transport on the leader and then again on the follower.
So, given that we already have batching code outside of the transport on the leader, it seems like we ought to keep this transport simpler and not do receive-side batching.
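
For reference, a minimal sketch of what the receive-side batching discussed above could look like. Everything here is hypothetical (the function, the storeLogs callback, and the 64-entry cap are illustrative), and a real handler would also send a response back on the stream for each request, from a second goroutine:

```go
package grpctransport

import (
	"context"

	"github.com/hashicorp/raft"
)

// batchAppendEntries drains queued AppendEntries requests and writes their
// log entries to disk in a single call, so a backed-up pipeline produces
// fewer, larger disk commits instead of acting as a latency multiplier.
func batchAppendEntries(ctx context.Context, in <-chan *raft.AppendEntriesRequest,
	storeLogs func([]*raft.Log) error) error {

	const maxBatchLogs = 64 // cap to prevent unbounded batch growth

	for {
		var batch []*raft.Log

		// Block until at least one request is queued.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case req := <-in:
			batch = append(batch, req.Entries...)
		}

		// Greedily drain whatever else is already waiting, up to the cap.
	drain:
		for len(batch) < maxBatchLogs {
			select {
			case req := <-in:
				batch = append(batch, req.Entries...)
			default:
				break drain
			}
		}

		// One disk commit for the whole batch.
		if err := storeLogs(batch); err != nil {
			return err
		}
	}
}
```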

@Tochemey: Any update on this?
