Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[new] ADR-40 Request Many #228

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

[new] ADR-40 Request Many #228

wants to merge 6 commits into from

Conversation

scottf
Copy link
Collaborator

@scottf scottf commented Jun 27, 2023

Request Many ADR based on @aricart's implementation and discussion in #215

adr/ADR-40.md Outdated

### Combinations
* Sentinel can be used with any other option.
* Both time options can be used together.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last statement is vague.


Making the request and handling multiple replies is straightforward. Much like a simplified fetch, the developer
will provide some basic strategy information that tell how many messages to wait for and how long to wait for those messages.

Copy link
Member

@aricart aricart Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some additional survey of my code, and realized, that I don't issue a timeout ever, I simply return no messages (the iterator will stop and yield nothing) if the request is not answered in time. Note that in JavaScript this returns an iterator always, the iterator may not yield any messages, but if a message is received, it yields it immediately

With the above said, possibilities are:

  • stop on error or any non-100 status (this is the only source of errors that are client initiated)
    AND:
  • wait for timer (maxWait)
  • wait for n messages or timer (maxWait) which ever occurs first completes the operation
  • wait for unknown messages, done when reset timer expires (with possible alt wait) - this option has two timers, maxWait, and "jitter". On receiving the first message, the jitter timer is started, subsequent messages reset the jitter timer. If the jitter timer triggers the request is done.
  • wait for unknown messages, done when an empty payload is received or maxWait expires

The client doesn't assume success or failure - only that it might receive messages - The various options are there to short circuit the length of the wait.

The jitter value allows for waiting for the service with the slowest latency. (scatter gather)
The message count allows for waiting for some count of messages or a timer (scatter gather)
The sentinel strategy allows for waiting for chunked data, order is not required since the client will see all messages and can use its internal protocol to determine the order of the messages or if additional request for missing/corrupt data should be done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has similar semantics to fetch from JS API.

Waiting for all messages or timeout, but just closing iterator when timeout is hit is a valid use case, however I think it should be client-specific, following language patterns, so may vary.

In general - all additional options - how many messages, how long, should just add additional triggers to "close", "fuse" the iterator, or do a proper thing in other patterns.

@aricart
Copy link
Member

aricart commented Jun 27, 2023

For reference:

https://github.com/nats-io/nats.deno/blob/4e53de6b869c932412e4856a60898dc9f86814b5/nats-base-client/nats.ts#L164

While we are visiting this, it may be useful to have an option/utility to split a large message into multiple messages that fit into the max payload of the client, and implements the sentinel message protocol. This seems to be an often use-case. Here's an example of a service splitting a large payload into manageable chunks - this was in response to someone trying to send 50mb payloads

https://github.com/nats-io/nats.deno/blob/4e53de6b869c932412e4856a60898dc9f86814b5/examples/services/03_bigdata.ts

And the client side:
https://github.com/nats-io/nats.deno/blob/4e53de6b869c932412e4856a60898dc9f86814b5/examples/services/03_bigdata-client.ts

adr/ADR-40.md Outdated
Making the request and handling multiple replies is straightforward. Much like a simplified fetch, the developer
will provide some basic strategy information that tell how many messages to wait for and how long to wait for those messages.

## Options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have some common defaults across clients around max time or gap time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can but reallly that value should be based on the worse rtt for the furthest service

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout shouldn't necessarily be dictated by RTT (although tbh it is in most cases). There are cases where there's value in determining that SLAs cannot be met and a hard timeout would indicate that - particularly around time dependent data where business value decreases over time or may even be harmful/misleading if stale.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the worst rtt only accounts for network time, not how long it takes one or all of the targets to process and respond to the request.

@Jarema
Copy link
Member

Jarema commented Sep 13, 2023

@aricart @piotrpio @scottf I think it is good as it is, however there are some unresolved conversations.

Please state your opinions so we can move this forward.

adr/ADR-40.md Outdated

* Responses are accepted until the max time is reached.

#### First Max / Subsequent Max
Copy link
Member

@aricart aricart Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO - There's no first max as this is the max wait. In conditions where the number of responses is unknown, the subsequent max or gap timer becomes a shortcut to reduce the wait the client would have to honor to collect the response. The idea is that when the request is sent, all responders will start sending replies (after whatever the processing time is) so given some time here and there, all responses should be queued up fairly quickly so waiting for the full max_wait is simply increasing latency to complete the operation. By simply waiting for a minimal amount of time for additional time, the latency to collecting the response is reduced.

Copy link
Member

@aricart aricart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

adr/ADR-40.md Outdated
* Responses are accepted until the count is reached.
* Must be used in combination with a time as the count may never be reached.

#### Sentinel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on calls should we be clear that this is not something to add to the clients but rather a pattern to document for now?

like, this isnt hard and its more flexible:

MultiSub(subj, func(m *nats.Msg)) {
  // sentinal that can be based on nil body, headers or anything
  if len(m.Data)==0 || m.Header.Get("EOF") != "" {
    m.Sub.Unsubscribe()
    return
  }

  // handle msg
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, right - if you are yielding an iterator, you could break, and that would close it. So for example if you are using the mux subscription to handle the responses, you cannot unsubscribe there, and perhaps you want to cleanup there. Not sure that not implementing it is not helpful.

Imho, if it is a pattern, the question is whether the sentinel has any meaning, if it doesn't, the client shouldn't even see it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern is the sentinel can be anything - server supports nil in one specific case but users might other things.

So to have it built in the se final detector should be an injectable dependency so we provide a nil detecting one and users can do their own detector?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As usual, maybe lets remove it then. We can always extend it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to keep it in mine, because in the mux I wouldn't have a way of cleaning. Other clients can expose the other patterns.

@Jarema
Copy link
Member

Jarema commented Nov 2, 2023

Considering new, more holistic approach to specs, should we make this a general spec about requst/reply pattern?

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants