Weaker frame ordering in Connection-Oriented communication #300

Open
OlegDokuka opened this issue Mar 7, 2020 · 24 comments · May be fixed by #312
@OlegDokuka (Member) commented Mar 7, 2020

Background

In this issue, I would like to discuss a strong requirement of RSocket, which reads as follows:

  1. Connection-Oriented and preservation of frame ordering. Frame A sent before Frame B MUST arrive in source order. i.e. if Frame A is sent by the same source as Frame B, then Frame A will always arrive before Frame B. No assumptions about ordering across sources are assumed.

From my point of view, requiring all frames to be delivered in the same order they were issued is an overly strong requirement.

However, this strong requirement does make sense on a per-logical-stream basis.

Notably, it makes even more sense when it comes to HTTP/3 and QUIC as new protocols that can be used as a transport for an RSocket connection.

A QUIC stream provides reliable in-order delivery of bytes but makes no guarantees about order of delivery with regard to bytes on other streams
[source]

This means that if a source issues frame A on stream 1 and then frame B on stream 2, QUIC gives no guarantee that frame B will arrive after frame A. However, if frame A is issued on stream 1 and frame B is issued on the same stream afterward, then there is a strict guarantee (at least in the HTTP/3 spec) that frame B arrives after frame A.

Proposal

The following is the proposed change to the RSocket protocol wording.

  1. Connection-Oriented and preservation of frame ordering within a stream. Frame A sent before Frame B on a per-stream basis MUST arrive in source order. i.e. if Frame A is sent by the same source as Frame B, then Frame A will always arrive before Frame B within a stream. No assumptions about ordering across streams are assumed. No assumptions about ordering across sources are assumed.
@OlegDokuka (Member Author)

Related: rsocket/rsocket-java#749

@benjchristensen (Contributor)

If I remember correctly, the primary reason for this requirement was to allow efficient implementation of connection resumption. If the frame ordering is not deterministic and ordered at the connection level, then client and server need to track and ack on each individual stream, as opposed to acks just at the connection level. This is why the KEEPALIVE frame has the Last Received Position (https://github.com/rsocket/rsocket/blob/master/Protocol.md#frame-keepalive), which would no longer be useful if the connection weren't ordered.

Targeting QUIC/HTTP3 as a transport may be a strong enough reason to change this, but resumption would then need to be redefined and redesigned. Resumption has never really been defined and supported well enough, to be honest, so it probably needs that anyway.

No matter what though, if this gets changed, it's not as simple as just changing the wording as stated above, as evidenced by the KEEPALIVE frame assuming ordering in sending back the last received position, which would be meaningless if this change happens.
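
For context, here is a minimal sketch (illustrative, not rsocket-java's actual code) of why a single Last Received Position only works with connection-level ordering: the receiver's resumption state is just one counter, and that one number fully describes which frames arrived.

```java
import java.util.concurrent.atomic.AtomicLong;

class ReceiverPosition {
    // With strict connection-level ordering, the receiver's resumption state
    // collapses to a single counter: one number identifies exactly which
    // frames have arrived, regardless of stream.
    private final AtomicLong lastReceivedPosition = new AtomicLong();

    void onFrameReceived() {
        lastReceivedPosition.incrementAndGet();
    }

    // Echoed back in KEEPALIVE's Last Received Position field. Without
    // connection-level ordering, this single number would no longer
    // identify a unique set of delivered frames.
    long positionForKeepAlive() {
        return lastReceivedPosition.get();
    }
}
```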

@OlegDokuka (Member Author)

Perfect! Thank you so much @benjchristensen. Your comment is super useful because I totally missed that part.

Alright, I will work out my proposal further to cover the resumption case as well.

@benjchristensen (Contributor)

I think you should consider a rethink of resumption for this as you pursue it. Now that HTTP/3 is evolving, I think layering RSocket on top of it naturally is a very strong reason to do so. And resumption should not prevent that.

We used resumption for about 1 year for mobile connections, but ended up pursuing an alternative approach that made more sense at the application layer, so aren't even using that feature right now.

Is anyone else using resumption at this time that you're aware of?

@OlegDokuka (Member Author)

Not sure anybody is. @linux-china @szihai, are you using resumption at Alibaba right now?

@OlegDokuka (Member Author) commented Mar 9, 2020

@benjchristensen Can you please share some ideas about what you have done at Facebook to replace the built-in resumption?

@linux-china (Contributor)

@OlegDokuka we don't have a requirement for frame ordering on the connection, just frame ordering for request/stream. I know a guy who uses request/stream to get configuration notifications from a config server, and frame ordering is important for this case.

@benjchristensen (Contributor)

The use case we had was one where different streams had dramatically different costs of resumption, and for us, server-side memory cost (DRAM) was a big deal. So, instead of keeping all stream state on all connections, we were able to get more intelligent by doing application-level behavior. It's more complex, and hardcoded to just that use case. Basically, at connection establishment, the client would try to start all streams again, but would send some metadata that allowed the server to identify the stream and determine if it should recalculate everything and resend, or if it was one of the "expensive ones" that was cached, etc.
This was something that only worked because of nuances of how the use case behaved, and because it was worth it for us in this case to save DRAM and spend more engineering time, instead of just using the easy resumption model.
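
As a rough sketch of that shape (hypothetical names throughout: StreamDescriptor, ResumeInfo, CachedStream, serverCache, and recompute are illustrative, not the actual code described above), the client restarts each stream on reconnect with identifying metadata, and the server chooses between cached replay and recomputation:

```java
import io.rsocket.Payload;
import io.rsocket.RSocket;
import io.rsocket.util.DefaultPayload;
import reactor.core.publisher.Flux;

class AppLevelResume {
    // Client side: on a fresh connection, restart each stream and attach
    // metadata telling the server which stream this is and what was already
    // received. StreamDescriptor is a hypothetical helper.
    Flux<Payload> resumeStream(RSocket rsocket, StreamDescriptor desc) {
        String metadata = desc.streamKey() + ";lastSeen=" + desc.lastSeenIndex();
        return rsocket.requestStream(DefaultPayload.create(desc.request(), metadata));
    }

    // Server side: decide per stream whether cached state (the "expensive
    // ones") can be replayed, or everything must be recalculated and resent.
    // ResumeInfo, serverCache, and recompute are likewise hypothetical.
    Flux<Payload> handleRequestStream(Payload payload) {
        ResumeInfo info = ResumeInfo.parse(payload.getMetadataUtf8());
        CachedStream cached = serverCache.get(info.streamKey());
        return cached != null
                ? cached.replayFrom(info.lastSeenIndex())
                : recompute(info.streamKey()).skip(info.lastSeenIndex());
    }
}
```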

@OlegDokuka (Member Author) commented Mar 9, 2020

@linux-china The question is more about resumption, since right now keepalive sends the response back and says that 6 frames were consumed. With the strong ordering requirement that we have right now in the RSocket spec, it works perfectly, because the 6 frames on the responder side are the same 6 frames, in the same order, as on the sender side.

But without that strong requirement, 6 frames sent are not necessarily equal to the same 6 frames received. This means that resumption, as it is right now, will not work anymore.

Anyway, the question was whether you use the resumption feature at Alibaba right now.

@OlegDokuka (Member Author) commented Mar 9, 2020

@benjchristensen sounds like a more sophisticated use case for leasing. I would really appreciate your look at #273 (comment). Maybe in combination with more advanced leasing control we can get resumption working for you.

@linux-china (Contributor)

@OlegDokuka got your point. We don't have this requirement to validate frame consistency now. For heavy requests, it's hard for us to track frames on the server side.

@OlegDokuka (Member Author) commented Mar 9, 2020

@linux-china but are you using/planning to use resumability? That was my question. (That's just for general statistics.)

@linux-china (Contributor)

@OlegDokuka we don't use the resumability feature now.

@linux-china (Contributor)

The use case we had was one where different streams had dramatically different costs of resumption, and for us, server-side memory cost (DRAM) was a big deal. So, instead of keeping all stream state on all connections, we were able to get more intelligent by doing application-level behavior.

Same style in Alibaba.

@benjchristensen (Contributor)

@benjchristensen sounds like a more sophisticated use case for leasing.

Not really. Leasing is about choosing what server to go to, and that's not what this was about. Resumption and aggregate server-side state to allow resumption have tradeoffs, and leasing, to choose a server, doesn't change that. Especially since we were using stateful servers, so the connection had to re-establish on the same sticky server in order to benefit from resumption for the most efficient behavior (restoring on a different server is slower and more costly).

To make this more clear, I'd support a complete rethink of how RSocket enables, supports, or makes resumption possible, since in practice we ended up not using it "out of the box" anyway. And since layering on top of QUIC/H3 needs to break resumption anyway, it seems that's what should happen. This would require a bump in the major version of the spec, however. I think supporting QUIC/H3 as a transport layer is a good reason for a major version.

@tmontgomery (Contributor)

I'd support the idea of rethinking resumption over H3.

@OlegDokuka (Member Author) commented Mar 9, 2020

@benjchristensen sounds like a more sophisticated use case for leasing.

Not really. Leasing is about choosing what server to go to, and that's not what this was about. Resumption and aggregate server-side state to allow resumption have tradeoffs, and leasing, to choose a server, doesn't change that. Especially since we were using stateful servers, so the connection had to re-establish on the same sticky server in order to benefit from resumption for the most efficient behavior (restoring on a different server is slower and more costly).

To make this more clear, I'd support a complete rethink of how RSocket enables, supports, or makes resumption possible, since in practice we ended up not using it "out of the box" anyway. And since layering on top of QUIC/H3 needs to break resumption anyway, it seems that's what should happen. This would require a bump in the major version of the spec, however. I think supporting QUIC/H3 as a transport layer is a good reason for a major version.

The idea proposed in the mentioned PR is related to how many frames a server can handle from the requester (it's better to read the thread).

In a few words, you can control the number of frames you receive. So, having different streams with different payload sizes from different servers, you can potentially map that onto your memory usage and prevent the case where one stream/server sends much more data than the receiver can store in the resumption store, without affecting stability (or at least that is what I understood from your case, and the idea mentioned basically maps onto the proposed leasing changes).

Anyway, I will rework the resumption taking H3 into account and will send a PR.

@OlegDokuka (Member Author) commented Mar 9, 2020

And since layering on top of QUIC/H3 needs to break resumption anyway, it seems that's what should happen. This would require a bump in the major version of the spec, however. I think supporting QUIC/H3 as a transport layer is a good reason for a major version.

@benjchristensen
Do you think we have to go to 1.0 as is and then bump to 2.x in order to break resumption?

Or could we do a 0.2 release and then finalize that as part of 0.3?

@rstoyanchev (Contributor)

Thank you so much @benjchristensen. Your comment is super useful

+1

Especially since we were using stateful servers, so the connection had to re-establish on the same sticky server in order to benefit from resumption for the most efficient behavior (restoring on a different server is slower and more costly).

I could be wrong, but in the Java implementation at least, a client can resume only by reconnecting to the same server, and only if the server hasn't restarted. So the counterbalance for the convenience is more limited applicability.

Resuming with some index at the application layer is better suited to a wider range of cases at the cost of less convenience.

Having convenient resumption out of the box is fine I guess, if it works for you, as long as that doesn't get in the way of supporting newer transports.

@OlegDokuka (Member Author) commented Mar 11, 2020

So far, what I ended up with, and the simplest thing we can do to make resumability work with keepalive, is to send ACKs for every logical stream.

Current

So right now KEEPALIVE works as follows:

KEEPALIVE[receiver queue position] -> sender discards all the frames up to that position and waits for the next keepalive
...
KEEPALIVE[receiver queue position] -> sender discards all the frames up to that position and waits for the next keepalive

So, to clarify how it works, let's consider that we have the following ordering on the sender side:

[
	{Frame 1; Stream 1},
	{Frame 2; Stream 2},
	{Frame 3; Stream 3},
	{Frame 4; Stream 1},
	{Frame 5; Stream 3},
	{Frame 6; Stream 1}
]

Once the KeepAlive Frame is received with the following content:

KEEPALIVE[receiver position = 4]

and initial receiver position on the sender side is, for instance, 0

Then the following is going to happen to the sender queue:

[
	{Frame 1; Stream 1},  // drop
	{Frame 2; Stream 2},  // drop
	{Frame 3; Stream 3},  // drop
	{Frame 4; Stream 1},  // drop
	{Frame 5; Stream 3},  // untouched
	{Frame 6; Stream 1}   // untouched
]

This works because of the strong ordering, and it will not work with weak ordering at the connection level.
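
For illustration, a minimal sketch of that sender-side trim under the current model (not the actual implementation): one position, one pass from the head of the queue.

```java
import java.util.Deque;

class ResumeQueue {
    // Frame is a stand-in for a buffered outbound frame awaiting acknowledgment.
    record Frame(long position, int streamId) {}

    // On KEEPALIVE: with strong connection-level ordering, everything up to
    // the receiver's position can be dropped blindly, because both sides
    // count the same frames in the same order.
    static void onKeepAlive(Deque<Frame> sentFrames, long receiverPosition) {
        while (!sentFrames.isEmpty()
                && sentFrames.peekFirst().position() <= receiverPosition) {
            sentFrames.pollFirst();
        }
    }
}
```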

Proposal

In order to fix all the issues related to weak ordering, KEEPALIVE can send the number of consumed elements for every stream.
For example:

KEEPALIVE[
	{Stream ID 1: Consumed 5 frames},
	{Stream ID 5: Consumed 1 frame},
	{Stream ID 2: Consumed 2 frames},
	{Stream ID 12: Consumed 2 frames}
]

In this case, the sender traverses the queue and discards the exact number of frames for each stream on its path.

For example, if we have the following ordering on the sender side

[
	{Frame 1; Stream 1},
	{Frame 2; Stream 2},
	{Frame 3; Stream 3},
	{Frame 4; Stream 1},
	{Frame 5; Stream 3},
	{Frame 6; Stream 1}
]

and the KeepAlive Frame looks like the following

KEEPALIVE[
	{Stream ID 1: Consumed 1 frame},
	{Stream ID 3: Consumed 2 frames}
]

Then the following should happen to the sender queue:

[
	{Frame 1; Stream 1},  // drop
	{Frame 2; Stream 2},  // keep
	{Frame 3; Stream 3},  // drop
	{Frame 4; Stream 1},  // keep
	{Frame 5; Stream 3},  // drop
	{Frame 6; Stream 1}   // keep
]

All frames that were not discarded go back into the deque using addFirst, in the same order they were there initially.
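
A minimal sketch of the proposed per-stream discard (my reading of the proposal above; Frame is the same stand-in record as in the earlier sketch):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

class PerStreamResumeQueue {
    // Same Frame stand-in as in the previous sketch.
    record Frame(long position, int streamId) {}

    // On KEEPALIVE with per-stream consumed counts: drop exactly the number
    // of acknowledged frames for each stream, keeping the rest in order.
    static void onKeepAlive(Deque<Frame> sentFrames,
                            Map<Integer, Integer> consumedPerStream) {
        Deque<Frame> kept = new ArrayDeque<>();
        for (Frame frame : sentFrames) {
            int remaining = consumedPerStream.getOrDefault(frame.streamId(), 0);
            if (remaining > 0) {
                consumedPerStream.put(frame.streamId(), remaining - 1); // drop
            } else {
                kept.addLast(frame); // keep, original order preserved
            }
        }
        sentFrames.clear();
        sentFrames.addAll(kept); // same result as re-adding via addFirst in reverse
    }
}
```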

What do you think, folks? @linux-china @benjchristensen @tmontgomery @nebhale @smaldini @rdegnan @rstoyanchev @stevegury

@OlegDokuka (Member Author) commented Mar 11, 2020

@rstoyanchev

I could be wrong, but in the Java implementation at least, a client can resume only by reconnecting to the same server, and only if the server hasn't restarted. So the counterbalance for the convenience is more limited applicability.

In general, you can supply custom session/frame storage, so it is not limited to hot storage (it can be cold, or semi-cold, e.g. Hazelcast).

@rstoyanchev (Contributor) commented Mar 16, 2020

@OlegDokuka, it seems the cleanup on KEEPALIVE happens in RSocket Java only if RSocketFactory#resumeCleanupStoreOnKeepAlive() is on, or otherwise space is freed up as needed when the cache fills up. How does that relate to the above, i.e. when cleanupStoreOnKeepAlive is off (the default)?

What about non-streams like FNF and RR? Technically those don't need resuming, do they?

If resume tracks at the stream level, it'd be a shame not to be able to indicate which streams need it and which don't.

Overall, is HTTP/3 the main reason to consider a change at this time? If there aren't others, then perhaps we can create an issue to re-think resumption in light of HTTP/3, and aim to address it comprehensively.

@OlegDokuka (Member Author)

@rstoyanchev

Overall, is HTTP/3 the main reason to consider a change at this time? If there aren't others, then perhaps we can create an issue to re-think resumption in light of HTTP/3, and aim to address it comprehensively.

It turned out that this ticket became one about KeepAlive as well, since the strong ordering exists only because of resumability.

it seems the cleanup on KEEPALIVE happens in RSocket Java only if RSocketFactory#resumeCleanupStoreOnKeepAlive() is on, or otherwise space is freed up as needed when the cache fills up. How does that relate to the above, i.e. when cleanupStoreOnKeepAlive is off (the default)?

Right now everything is resumed. Every frame sent over a connection is stored in the queue.

Also, it waits for the upcoming KeepAlive to understand how many of the stored frames can be dropped (in rsocket-java this is optional, but the spec clearly explains the mechanism).

What about non-streams like FNF and RR? Technically those don't need resuming, do they?

Every stream should be resumable. I can imagine an FNF fired, enqueued, flushed, and then the connection disappears, which means it is not guaranteed that the frame has been delivered. Thus, to make sure we can resume even FNF, we have to store all the frames. The same goes for RR: the request might be initiated but the response not delivered.
