Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove or replace the concept of references in UPayload #128

Closed
gregmedd opened this issue May 1, 2024 · 4 comments
Closed

Remove or replace the concept of references in UPayload #128

gregmedd opened this issue May 1, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request question Further information is requested

Comments

@gregmedd
Copy link
Contributor

gregmedd commented May 1, 2024

The problem

In upayload.proto, the UPayload object supports two forms of data: value and reference. The value form copies the payload data into the UPayload object, while the reference form stores a raw pointer as a 64-bit integer.

There are lots of problems with the concept of passing a raw pointer in a payload. For one, it is very hard to guarantee any data safety with a raw pointer. And it would significantly complicate transport implementations - when would they need to convert references to copies and when would they not?

For data safety, we would need to be able to make these guarantees:

  • The pointer MUST remain valid for the entire life of all copies of the UPayload.
  • The pointer MUST refer to the same memory location for all recipients of the UPayload.
  • The data referenced MUST NOT be altered while being accessed through a UPayload.
  • There MUST be a signal to the data provider when all copies of UPayload are done with the pointer so the memory can be altered or released.

I see no clear way for any uProtocol implementation to provide those data safety guarantees without extending the UPayload object, and even then that is unlikely to cover all intended use cases.

Additionally, many transport implementations may have their own shared memory systems built-in that are not compatible with this separate pointer concept. For example, Zenoh can automatically use shared memory for local exchanges and multicast when publishing across the network.

For other scenarios, such as data in DMA (e.g. via dma-buf), significantly more than just a pointer would be needed in the message. There would need to be a flag that the data is host-local (or that it needs to be copied when exiting the host), and parameters for mapping the correct memory blocks would have to be added.

I've included three proposals for next steps below. Admittedly, I haven't spent a lot of time thinking on the problem, so I would appreciate alternate ideas on how to proceed.


Proposal 1: Handle references at a higher layer

Remove the reference concept and add message scoping to allow layers above uProtocol to define their own memory sharing schemes. This would involve these steps:

  • Remove the reference field from UPayload
  • Add a scope or locality field to UAttributes that allows for a message to be restricted to scopes like process or host
  • Add an optional scope/locality promotion tag to UAttributes that can be used to look up a function that can convert a message from a more restrictive scope to a less restrictive one (e.g. convert a process-scoped message to a non-scoped message by copying new data into the payload).

This would allow for arbitrary frameworks to be applied on top of uProtocol for managing shared memory objects. For example, a DMA pipeline framework could populate the payload with a data structure representing a particular dma-buf configuration for passing data between processing nodes (e.g. in a ML perception stack) while restricting the scope of those messages to uEs running on the same host. Or a device capturing camera data could define a process-local shared_ptr scheme for raw images while publishing a separate global topic containing compressed image feeds.

This wouldn't take too long to design and build, but does push work onto uP users and would require transport implementations to provide additional hooks (e.g. a "this message would be sent outside the device" callback).


Proposal 2: A full zero-copy API in uProtocol

Formalize a shared memory system into the uProtocol API. This would need to define:

  • A way for users of uP to request shared memory blocks from their uP client
  • A mechanism for locking shared memory blocks once they are populated
  • A reference counting mechanism for shared memory blocks
  • A way to clean up shared blocks when all references are released
  • A definition of policies controlling when and how shared data can be copied (e.g. for transmission outside of a host)
  • An interface for providing custom memory pools or transferring ownership of a memory resource to the uP client (e.g. to support replacing shm with dma-buf).

There are likely other things I am not thinking of 😅

Whatever API is defined would need to support a reasonable number of zero-copy and shared memory solutions, and would need to be built in such a way that it can be used on non-linux systems as well. This could impose new requirements on transport implementations (i.e. the Zenoh clients might want a way to pre-allocate shared memory buffers then pass them back to zenoh).

It doesn't seem impossible, but this would probably take some time to design.


Proposal 3: Do it later

Decide that this can not be solved right now and simply remove the reference field. Plan on a future update to uProtocol adding some shared memory / zero copy interface in a way that is a non-breaking extension to the API.

We can focus on stabilizing the existing protocol, API, and implementations for the most immediate needs right now, then spend some serious design time analyzing the problems, needs, and requirements around adding a reference system to uProtocol once that is out of the way.

NB: This also does not change the state of things very much - a user wishing to send a bare, unsafe pointer as a message could still put one in the regular payload buffer. By removing the reference field, we have the benefit of not implying that the uP client is doing something to keep those pointers safe.

@stevenhartley
Copy link
Contributor

I'm in favor of option 3 for now till we work out the shared memory details.

@Mallets
Copy link

Mallets commented May 2, 2024

I agree that passing a reference around is problematic and should be avoided at all cost.
Among all proposals, I believe Proposal 3 is the safest option since SHM has serious implications on API, system, and protocol design/implementation.

IMHO it would be wise to not add nor imply anything on SHM until a working PoC is delivered and the necessary considerations are done in uProtocol. As mentioned in the issue:

NB: This also does not change the state of things very much - a user wishing to send a bare, unsafe pointer as a message could still put one in the regular payload buffer. By removing the reference field, we have the benefit of not implying that the uP client is doing something to keep those pointers safe.

This leaves enough time to get deeper in SHM issues and design.

@stevenhartley
Copy link
Contributor

stevenhartley commented May 2, 2024

@Mallets & @gregmedd Would this be the revised UPayload then:

message UPayload {
    // Data passed by value
    bytes data = 1;

    // Serialization format of the data
    UPayloadFormat format = 4;
}

The reason why I did not make data optional is that if there is no data then UPayload is not passed

@stevenhartley stevenhartley added documentation Improvements or additions to documentation enhancement New feature or request question Further information is requested v1.5.8 labels May 2, 2024
stevenhartley added a commit that referenced this issue May 2, 2024
The following change removes the notion of passing data by reference in UPayload till we finalize the shared memory APIs.

#128
stevenhartley added a commit that referenced this issue May 4, 2024
The following change removes the notion of passing data by reference in UPayload till we finalize the shared memory APIs.

#128
@stevenhartley
Copy link
Contributor

Completed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants