Proposal for ListV2 journals RPC #353

Open
psFried opened this issue Oct 27, 2023 · 7 comments

Comments

@psFried
Contributor

psFried commented Oct 27, 2023

In gazette clusters that manage a large number of journals, the List RPC can fail due to exceeding the maximum allowable size of a gRPC response. Rather than trying to add pagination support to the existing List RPC, the proposal is to instead add a ListV2 RPC that returns a streaming response.

The primary motivation for this is simply that pagination is tricky to implement in the server and awkward for clients to use, particularly when it's important for clients to get a consistent view of the journals.
A secondary motivation is that a streaming response can support long-running watches of the journals, which may be an important feature for clients that aren't able to connect to etcd directly.

This proposal includes support for returning large sets of journals in a streaming response, but does not include support for persistent watches. The idea would be to revisit the design for watches once we've determined a need.

gRPC protocol

The ListV2 protobuf would look something like this:

message ListRequestV2 {
  Header header = 1;
  // Selector of journals to list.
  LabelSelector selector = 2;
}

message ListResponseV2 {
  Status status = 1;
  Header header = 2;

  // Each streamed response carries a single listed journal.
  JournalSpec spec = 3;
  int64 mod_revision = 4;
  Route route = 5;
}

service Journal {
  rpc ListV2(ListRequestV2) returns (stream ListResponseV2);
  // Existing RPCs omitted.
}

The basic idea is that the client sends a single request, and then receives a series of responses that each contain a single journal. The server closes the response stream after sending the final journal, which is the only indication to the client that it now has a complete view of the requested journals.
In other words, there's no message field that indicates "this is the last message".
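
For illustration, here's a minimal client sketch against the Go stubs this proto would generate. It's a sketch only: the pb import path, the generated stub names, and the field shapes are all assumptions until the RPC actually lands.

package client

import (
	"context"
	"io"

	// Hypothetical home of the generated ListV2 stubs.
	pb "go.gazette.dev/core/broker/protocol"
)

// listAll drains a ListV2 stream into a slice. Closure of the stream
// (io.EOF) is the only signal that the listing is complete.
func listAll(ctx context.Context, jc pb.JournalClient, sel pb.LabelSelector) ([]*pb.JournalSpec, error) {
	stream, err := jc.ListV2(ctx, &pb.ListRequestV2{Selector: sel})
	if err != nil {
		return nil, err
	}
	var specs []*pb.JournalSpec
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			return specs, nil // Server closed the stream: the view is complete.
		} else if err != nil {
			return nil, err
		}
		specs = append(specs, resp.Spec)
	}
}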

@jgraettinger
Contributor

  • Deletion

What happens when a JournalSpec is removed? How does the client know about it?

  • What is "revision" relative to?

The journal's mod revision? That of route assignments? It's ambiguous. There also isn't a way to represent the full granularity of what revision can mean (is it because an assignment updated its status?).

  • Use a request Header for consistency with other request types

Instead of resume_revision. This variable 👈 can't do what its name says, since delta changes between the client's observed revision and the server's are no longer available. The only thing the server can do is give the client a full refresh.

If an RPC is broken, I'd think the client should start a new one with watch, and the broker would stream the complete current set of selected journals as-of its header and then stream ongoing updates.

All gazette request types include a header, which is used by brokers to identify a revision they should read-through before attempting to respond.

  • Remove observed_revision

This is already conveyed in the response Header (DRY).

  • Having an array of journals isn't required

gRPC is fine at sending lots of small messages, and making it an array adds complexity to the protocol and the potential for a message to be too large.

🤔 I'm not sure there's a way to reasonably do a truly streaming "watch", unless each watch update is essentially a full refresh of the complete journal listing + assignments, sent every time there's any change to the topology because of an etcd revision change. That would probably be completely fine, though, because keyspace already amortizes many small etcd key changes into fewer observed transitions of the keyspace...

@psFried
Contributor Author

psFried commented Nov 1, 2023

I'm not sure there's a way to reasonably do a truly streaming "watch", unless each watch update is essentially a full refresh of the complete journal listing + assignments, sent every time there's any change to the topology because of an etcd revision change.

Yeah, it seems like keyspace doesn't really have a way to communicate deletions, only that something may have changed. So I guess we just re-send everything whenever there's been any change.

We can remove observed_revision, but I want to confirm whether header can be used equivalently. We need some way to communicate to the client whenever the final response has been sent for a given revision. Basically, "you've now observed all journals as of revision X". The observed_revision was only added as part of that final response, and I'm wondering if it seems appropriate to do the same with header, or if we need to instead add something like a boolean final_response to the response message.

@psFried
Contributor Author

psFried commented Nov 1, 2023

🤔 If the server essentially can't support sending incremental watch events, then I wonder if it's even worth building support for watches, either in the short term or at all.

My understanding is that the client can set header.etcd.revision to last_observed_revision + 1, and the server would wait to respond until at least that revision was observed. If that's correct, then I don't see much benefit to having the streaming response be kept open. Clients could instead just poll the ListV2 RPC in a loop. Maybe we'd want to add support for watch later, but it might be nice to wait until we have a better sense of how we intend to use it first.
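
Sketching that polling loop for concreteness, reusing the hypothetical pb stubs from the proposal above; the Header/Etcd field shapes are assumptions modeled on gazette's existing protocol types.

package client

import (
	"context"
	"io"

	pb "go.gazette.dev/core/broker/protocol" // Hypothetical, as above.
)

// pollJournals re-lists in a loop rather than holding a stream open.
// Setting header.etcd.revision to one past the last-observed revision
// asks the broker to read through at least that revision before it
// responds, so each iteration blocks until etcd may have changed.
func pollJournals(ctx context.Context, jc pb.JournalClient, sel pb.LabelSelector) error {
	var lastObserved int64
	for {
		var req = &pb.ListRequestV2{
			Header:   &pb.Header{Etcd: pb.Header_Etcd{Revision: lastObserved + 1}},
			Selector: sel,
		}
		stream, err := jc.ListV2(ctx, req)
		if err != nil {
			return err
		}
		for {
			resp, err := stream.Recv()
			if err == io.EOF {
				break // Complete listing at this revision.
			} else if err != nil {
				return err
			}
			if resp.Header != nil {
				// Remember the revision the broker actually read through.
				lastObserved = resp.Header.Etcd.Revision
			}
			// ... apply resp.Spec to local state ...
		}
	}
}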

@jgraettinger
Contributor

The observed_revision was only added as part of that final response, and I'm wondering if it seems appropriate to do the same with header, or if we need to instead add something like a boolean final_response to the response message.

Oh, okay, that hadn't clicked for me. That changes my opinion on observed_revision. So as I understand:

  • The very first Response (only) would have header, as is typical for a gazette response.
  • It would stream an initial sync of list responses, followed by a response having only observed_revision, which communicates the end of a listing that's current through that revision.
  • If something changes and a new listing is sent, it's followed by observed_revision again.

That makes sense to me.
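
For concreteness, a sketch of that client-side flow. The observed_revision field isn't in the proto above, so its name and shape here are assumptions, as are the generated stream types.

package client

import (
	pb "go.gazette.dev/core/broker/protocol" // Hypothetical, as above.
)

// watchJournals consumes a long-lived ListV2 stream under the flow
// described above: responses accumulate until one arrives carrying only
// observed_revision, which marks a consistent listing through that
// revision; the accumulated set is then swapped in as the snapshot.
func watchJournals(stream pb.Journal_ListV2Client, apply func(specs []*pb.JournalSpec, revision int64)) error {
	var pending []*pb.JournalSpec
	for {
		resp, err := stream.Recv()
		if err != nil {
			return err // Includes io.EOF if the broker closes the watch.
		}
		if resp.Spec != nil {
			pending = append(pending, resp.Spec)
			continue
		}
		// Terminator response: no spec, only observed_revision.
		apply(pending, resp.ObservedRevision)
		pending = nil
	}
}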

🤔 If the server essentially can't support sending incremental watch events, then I wonder if it's even worth building support for watches, either in the short term or at all.

It's a fair question. The remaining role of watch as I'm seeing it would be an optimization: The broker would still need to evaluate whether a watch has changed with each observed revision (as the shuffle implementation does today), but if it hasn't (perhaps evaluated using a hash?) then it can skip sending a full refresh to the client.

Without this optimization it could be a pretty big thundering herd, especially across data centers, but I'm legit unsure how motivating it is in the short term. It would be a regression as compared to where we are today, though.

A next question is what "change" means in this context: just the JournalSpec, or assignments as well? Probably we only care about journals, since reactors already discover changes to assignments through use of those listed journals.
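
A sketch of that hash evaluation, scoped to specs only per the above. It's purely illustrative, though Marshal is the gogo/protobuf method that gazette's generated types do provide.

package broker

import (
	"hash/fnv"

	pb "go.gazette.dev/core/broker/protocol" // Hypothetical, as above.
)

// specsDigest hashes the selected JournalSpecs (specs only, not
// assignments), so the broker can compare digests across observed
// revisions and skip resending a full refresh when nothing the watch
// selects has changed.
func specsDigest(selected []*pb.JournalSpec) (uint64, error) {
	var h = fnv.New64a()
	for _, spec := range selected {
		var b, err = spec.Marshal()
		if err != nil {
			return 0, err
		}
		_, _ = h.Write(b)
	}
	return h.Sum64(), nil
}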

@psFried
Contributor Author

psFried commented Nov 1, 2023

The remaining role of watch as I'm seeing it would be an optimization

Yeah, and another reason why I'm increasingly hesitant to take this on right now is that I'm concerned that the optimization may not go far enough. In other words, once we're at a point where there's enough cross-data-plane reads for it to matter, will it then seem better to handle watches as a more incremental change stream as opposed to the full-refresh approach?

@psFried
Contributor Author

psFried commented Nov 2, 2023

@jgraettinger I'm thinking that it makes sense to move forward with ListV2 without any watch support for now. The basic approach you described seems like it would be easy to add in a backward-compatible way once the need arises. Sound like a plan?

@jgraettinger
Contributor

Yep, SGTM
