Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make RaftNetwork::send_append_entries() compatible with rkyv app #479

Open
drmingdrmer opened this issue Jul 28, 2022 · 10 comments
Open

Make RaftNetwork::send_append_entries() compatible with rkyv app #479

drmingdrmer opened this issue Jul 28, 2022 · 10 comments

Comments

@drmingdrmer
Copy link
Member

drmingdrmer commented Jul 28, 2022

pub async fn append_entries(
&self,
rpc: AppendEntriesRequest<C>,
) -> Result<AppendEntriesResponse<C::NodeId>, AppendEntriesError<C::NodeId>> {

Here, AppendEntriesRequest<C> would need to be the archived version ArchivedAppendEntriesRequest<C>. And to make matters a little more complicated, the Archive would also have a lifetime or be the bytes with some way to say to the API that these bytes should be ArchivedAppendEntriesRequest<C>.
It's pretty straight forward to just add the macros to allow for creating the rkyv Archives to the data types in openraft, but it won't actually give the benefit unless the raft api allows for them to be used. Because right now you would still have to deserialize the Archive sent across the wire to use the API.

A pretty straight forward work around without any API changes is to just use the Archive (aka Vec) in the AppData/AppResponseData so that way the deserialization will be much faster than if it were a complex Struct/Enum.

Some other people have also suggested improving the storage-to-network interface.
The general idea is to let storage send serialized data directly to the network API.

The first modified append-entries API that comes into my mind is to pass in a reader so that a network implementation could drain data directly from storage layer:

struct AppendEntriesMeta<NID: NodeId> {
    vote: Vote<NID>,
    prev_log_id: Option<NID>,
    leader_commit: Option<LogId<NID>>,
}

trait RaftNetwork {
    async fn send_serialized_append_entries(
        &mut self,
        rpc_meta: AppendEntriesMeta<u64>,
        mut serialized_entries_reader: impl AsyncRead,
    ) -> Result<(), ()> {
        // application impl:
        let client = MyClient();
        client.send(rpc_meta).await?;

        loop {
            match serialized_entries_reader.read(buf).await {
                Ok(x) => x,
                Err(e) => return Ok(()),
            }

            client.send(buf).await?;
        }
    }
}

I am not quite whether a byte stream is compatible with a rkyv based application.
Or does the reader have to be a stream of a rkyv Archived Entry?

I can still finish out the PR if this is still wanted, just not sure if it makes sense to now.

Make sense.
Looks like it's a common requirement to send an entries stream(and snapshot) directly to the network layer for an application that cares about performance.

@schreter suggestions or opinions? :D

Originally posted by @zach-schoenberger in #474 (comment)

@github-actions
Copy link

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@drmingdrmer drmingdrmer changed the title @drmingdrmer So after digging through setting up tests, I've come to the conclusion that adding this support doesn't really make sense IMO. At a high level it sounds like a good idea. But when it comes to the implementation it breaks down. rkyv is a great library, and I've used it in other projects. But it introduces a model that doesn't work well with the current raft API in this project. rkyv introduces the concept of an Archive data type, which is the serialized version of the data type that the annotations are applied to. The archive is typically what is used to access fields so that way deserialization is not needed (hence the zero copy). In order to get the advantage of the Archive concept, openraft would have to accept the archive as apart of the api layer, like here: https://github.com/datafuselabs/openraft/blob/1202d1b059f7e07f0f3353b74f0b2795f063ab56/openraft/src/raft.rs#L218-L221 Make RaftNetwork::send_append_entries() compatible with rkyv app Jul 28, 2022
@zach-schoenberger
Copy link
Contributor

zach-schoenberger commented Jul 28, 2022

I am not quite whether a byte stream is compatible with a rkyv based application.
Or does the reader have to be a stream of a rkyv Archived Entry?

I believe the common case would be a stream of rkyv Archived Entry. Which are really just a custom Vec<u8> (AlignedVec). You can't really use a sub part of the the archive well. Power users might combine these buffers with size prefixes and be able to stream them, but it isn't currently supported directly by the library.

@drmingdrmer
Copy link
Member Author

I believe the common case would be a stream of rkyv Archived Entry. Which are really just a custom Vec<u8> (AlignedVec).

Then what do you expect the send_append_entries() API to look like?

@zach-schoenberger
Copy link
Contributor

Just want to preface that I'm still relatively new to this code base, so I could be entirely wrong.

But as I look through the code, I keep coming back to that the current API looks just fine for the common case. It's just more on the user to leverage the serialized byte format at the top level when defining the RaftTypeConfig. At least in terms of using openraft with rkyv.

In terms of updating append_entries in general, your solution above looks great for that. Since the Network would not need to re-serialize the data and the user could do something smart when returning the range of entries.

I'm familiar with some of the other Issue's mentioned in relation to finding a better way to hand Snapshot trait requirements. Since I, like many users, am using RocksDb as the main store, it gets interesting trying to send its data over as a snapshot. I personally planned on using one approach mentioned in a comment where the Snapshot is just a small set of meta data and the actual data transfer happens in install_snapshot.

@drmingdrmer
Copy link
Member Author

I'm familiar with some of the other Issue's mentioned in relation to finding a better way to hand Snapshot trait requirements. Since I, like many users, am using RocksDb as the main store, it gets interesting trying to send its data over as a snapshot. I personally planned on using one approach mentioned in a comment where the Snapshot is just a small set of meta data and the actual data transfer happens in install_snapshot.

There is another issue discussing about updating snapshot-API:

You're right, all these 3 API are in a metadata followed by a stream of data manner(except send_vote() does not have data)

It looks like now there are 2 preferences on the network API:

  • easy to use: deal with directly structured data types,
  • high performance: deal with serialized data if possible.

These 2 are somehow conflicting with each other.
Maybe there should be a feature to control which network API to use. 🤔

@zach-schoenberger
Copy link
Contributor

The storage api would also have to allow for this when getting the entries. It doesn't look to crazy to do looking at the code. Using a top level trait for RaftNetwork and RaftStorage with each having 2 sub traits. One for the serialized version, one that is the current version. And then different impl blocks where the api is used.

@drmingdrmer
Copy link
Member Author

The storage api would also have to allow for this when getting the entries. It doesn't look to crazy to do looking at the code. Using a top level trait for RaftNetwork and RaftStorage with each having 2 sub traits. One for the serialized version, one that is the current version. And then different impl blocks where the api is used.

Can you exhibit a piece of code of this trait definition?

@zach-schoenberger
Copy link
Contributor

After going down that rabbit hole for a bit, it would end up with a lot of duplicate code and I don't see a very straight forward way to approach it. I also checked on what it would be like to add another RaftTypeConfig type for a list of entries, and it looks more doable, but still very intrusive.
https://github.com/zach-schoenberger/openraft/tree/serde_network is where I was trying out the trait stuff.
https://github.com/zach-schoenberger/openraft/tree/append_entries is where I was checking out the RaftTypeConfig type

@drmingdrmer
Copy link
Member Author

https://github.com/zach-schoenberger/openraft/tree/serde_network is where I was trying out the trait stuff.
https://github.com/zach-schoenberger/openraft/tree/append_entries is where I was checking out the RaftTypeConfig type

@zach-schoenberger
It looks like the current framework needs some refactoring to fit such a need.
I'll have some experiments on this.

@zach-schoenberger
Copy link
Contributor

Sounds good. I'm down to help out with changes. Just not sure what the approach is that you want to take for the update. Especially with how intrusive it is with the inner workings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants