
Use ByteBuffers for message bodies #421

Open
pbillen opened this issue Oct 29, 2018 · 9 comments

Comments

@pbillen

pbillen commented Oct 29, 2018

Hi all,

I would like to propose using ByteBuffer for message bodies. Currently, the API accepts a plain byte[], which forces intermediate copies of buffers if we are already working with ByteBuffer or a similar representation of a buffer.

Internally, the client implementation can use array() and arrayOffset() to get the data to be sent over the wire if hasArray() returns true, or copy the contents with get() otherwise. In pure NIO mode, we could simply write the ByteBuffer to the java.nio.Channel without any intermediate overhead.
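A minimal sketch of what that internal path could look like (the FrameWriter class and sendFrames method below are hypothetical names, not existing client code): use the backing array directly when hasArray() is true, copy with get() otherwise.

import java.nio.ByteBuffer;

abstract class FrameWriter {

    // stands in for the existing code path that writes body frames on the wire
    abstract void sendFrames(byte[] body, int offset, int length);

    // Hypothetical sketch: hand the ByteBuffer's readable window to the frame
    // writer, copying only when the buffer is not backed by an accessible array.
    void sendBody(ByteBuffer body) {
        if (body.hasArray()) {
            sendFrames(body.array(),
                       body.arrayOffset() + body.position(),
                       body.remaining());
        } else {
            // direct or read-only buffer: copy the remaining bytes
            byte[] copy = new byte[body.remaining()];
            body.duplicate().get(copy);
            sendFrames(copy, 0, copy.length);
        }
    }
}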

What do you think?

Thank you!

addendum

FWIW, although I believe native support for ByteBuffer is the preferred solution, as an alternative proposal we could also avoid the copy if the API accepted an offset into the provided array (and optionally a length):

channel.basicPublish(..., buffer.array(), buffer.arrayOffset());
channel.basicPublish(..., buffer.array(), buffer.arrayOffset(), <length>);
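For comparison, a hedged sketch of what a caller has to do today with the byte[]-only method, assuming an open Channel and an array-backed buffer (the class and method names below are illustrative):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import com.rabbitmq.client.Channel;

class PublishSlice {

    // Copies the readable window of an array-backed ByteBuffer into a fresh
    // byte[], because basicPublish currently only accepts a whole array.
    static void publish(Channel channel, String exchange, String routingKey,
                        ByteBuffer buffer) throws IOException {
        byte[] body = Arrays.copyOfRange(
                buffer.array(),
                buffer.arrayOffset() + buffer.position(),
                buffer.arrayOffset() + buffer.limit());
        channel.basicPublish(exchange, routingKey, null, body);
    }
}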
@michaelklishin michaelklishin changed the title Feature request - Support for ByteBuffer API Use ByteBuffers for message bodies Oct 29, 2018
@michaelklishin
Member

This is something that can be done for 6.0.

@pbillen
Author

pbillen commented Oct 29, 2018

@michaelklishin Thanks, that would be great. I believe it could be integrated into 5.x as well, as this should not break the current API.

Thanks, greatly appreciated!

@michaelklishin
Member

I don't see how this would not affect several interfaces. So yes, likely it will. At least I don't see any reason not to convert to byte buffers as much as we can with a thin compatibility layer. @acogoluegnes WDYT?

@acogoluegnes
Contributor

This should affect at least the Channel interface. If we only add methods, we can provide default methods that could e.g. fall back to regular byte arrays. If this is possible, it would avoid a breaking change and the feature could go into 5.x. Only a first prototype could tell.
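A minimal sketch of that idea (the interface name and the ByteBuffer overload below are hypothetical, not the current Channel API): the new method gets a copy-based default implementation, so existing implementations keep working unchanged.

import java.io.IOException;
import java.nio.ByteBuffer;
import com.rabbitmq.client.AMQP;

public interface ByteBufferPublisher {

    // stands in for the existing byte[]-based publish method
    void basicPublish(String exchange, String routingKey,
                      AMQP.BasicProperties props, byte[] body) throws IOException;

    // hypothetical new overload: the default implementation falls back to a copy
    default void basicPublish(String exchange, String routingKey,
                              AMQP.BasicProperties props, ByteBuffer body)
            throws IOException {
        byte[] copy = new byte[body.remaining()];
        body.duplicate().get(copy);
        basicPublish(exchange, routingKey, props, copy);
    }
}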

Regarding the feature itself, it's possible to fit it into the current code base, though it wouldn't be easy: it would have to work in both blocking and non-blocking IO modes, the byte buffer body would still need to be broken into several frames if it exceeds the max frame size, and the NIO mode would need to juggle between the byte buffer it owns and the byte buffer(s) from message body frames. Nothing blocking here, these are just implementation details, but it's not a trivial change.

Some random thoughts (benefits/limitations of the feature):

  • the most obvious benefit is avoiding a memory copy between the message body and the bytes sent on the wire. I think the number of use cases that would benefit from this is very small. I can only think of scenarios where the message body byte buffers come directly from other NIO channels (files, network) and are sent to RabbitMQ as is. Other scenarios will typically involve heap byte buffers, and a memory copy will eventually occur. The use case @pbillen mentioned (no longer in the original comment) could make use of byte buffers, but only because the lower layers of the client would know to start reading the buffer at arrayOffset; and there the only benefit would have been to avoid a copy, since the byte buffer was on the heap anyway.
  • no gain for the blocking mode
  • people using ByteBuffer in their application wouldn't have to use byte arrays. I think this is a minority of applications, considering ByteBuffer isn't an easy API to work with.
  • TLS: memory copy benefits are lost (there's an intermediate TLS byte buffer to encrypt the data, because the data is sent encrypted on the wire)

I still need to be convinced to make the client code base more complex for a feature that could benefit only a very limited number of use cases.

@pbillen
Author

pbillen commented Oct 29, 2018

FYI, I left out my initial, specific PostgreSQL example (which can still be looked up in the edit history) as I expected it would distract from the generic intent of the change request here. In that example, the buffer was indeed on the heap, but I believe that is a shortcoming of the implementation. In theory, PostgreSQL WAL events could be backed by a pure NIO channel from the network. I believe a ByteBuffer is returned today in preparation for a pure NIO implementation someday, which should remain an implementation detail in any case.

I can surely see that this complicates the implementation. While the benefits are non-existent in certain cases (such as TLS), I believe it could (?) be a performance boost in other use cases where data is streamed from one system to RabbitMQ.

That said, as suggested in my addendum, certain buffer copies in client code can be avoided by extending the API to let the user specify the start offset (and length) within the given byte[] array. How does this compare in terms of implementation complexity?

Thanks!

@acogoluegnes
Contributor

It's actually hard to find any benchmarks comparing heap-based byte buffers to direct byte buffers. We may need to come up with our own. Under the appropriate conditions, we could observe a performance boost (no copy) and less garbage collection activity, at least in theory. Not sure those would matter much compared to the network latency we'll always have.
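For instance, a minimal JMH-style sketch along these lines (assuming JMH is on the classpath; the actual benchmark project may look entirely different) could contrast reading bytes out of a heap buffer versus a direct buffer:

import java.nio.ByteBuffer;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ByteBufferCopyBenchmark {

    private ByteBuffer heap;
    private ByteBuffer direct;
    private byte[] sink;

    @Setup
    public void setup() {
        heap = ByteBuffer.allocate(128 * 1024);
        direct = ByteBuffer.allocateDirect(128 * 1024);
        sink = new byte[128 * 1024];
    }

    @Benchmark
    public byte[] copyFromHeapBuffer() {
        heap.duplicate().get(sink); // duplicate() keeps the source position at 0
        return sink;
    }

    @Benchmark
    public byte[] copyFromDirectBuffer() {
        direct.duplicate().get(sink);
        return sink;
    }
}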

What you suggest in the addendum could be interesting for the PostgreSQL example: it would avoid the need to copy the array (but would make the byte buffer unusable until the write happens; see the new item in my comment above). This should be doable by introducing e.g. an ArrayFragment class:

public class ArrayFragment {

  final byte[] array;
  final int start;
  final int length;

  public ArrayFragment(byte[] array, int start, int length) {
    this.array = array;
    this.start = start;
    this.length = length;
  }
}

We could add new Channel#basicPublish methods that accept an ArrayFragment; default implementations in the interface could just use Arrays#copyOfRange and delegate to the methods accepting an array. That would be a reasonable default and would allow including this in 5.x. @michaelklishin WDYT?
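A hedged sketch of that default (hypothetical interface and method names, assuming same-package access to the fields of the ArrayFragment above):

import java.io.IOException;
import java.util.Arrays;
import com.rabbitmq.client.AMQP;

public interface ArrayFragmentPublisher {

    // stands in for the existing byte[]-based publish method
    void basicPublish(String exchange, String routingKey,
                      AMQP.BasicProperties props, byte[] body) throws IOException;

    // hypothetical new overload: copy the fragment and delegate
    default void basicPublish(String exchange, String routingKey,
                              AMQP.BasicProperties props, ArrayFragment body)
            throws IOException {
        byte[] copy = Arrays.copyOfRange(
                body.array, body.start, body.start + body.length);
        basicPublish(exchange, routingKey, props, copy);
    }
}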

@pbillen
Author

pbillen commented Oct 29, 2018

That would be great. You could make the length field in ArrayFragment optional in the constructor/builder, as in most cases you are reading the full buffer starting from start, so the length is usually array.length - start.
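A small hypothetical sketch of that convenience, delegating to the three-argument constructor shown earlier:

public ArrayFragment(byte[] array, int start) {
  this(array, start, array.length - start); // read from start to the end of the array
}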

Wrt. pure ByteBuffer support, I can see your point. It's all guesstimates, meaning that it's hard to decide beforehand whether the introduced complexity is worth the trouble. I guess it is your call ;).

Thanks again!

@acogoluegnes
Contributor

@pbillen I created a follow-up issue for the fragment array part: #422.

For the ByteBuffer support, I created a benchmark project to experiment and to find out for which use cases direct byte buffers could matter.

@pbillen
Author

pbillen commented Nov 3, 2018

@acogoluegnes Thanks! I'm very interested in the outcome of that benchmark project.
