Is it possible to get the number of bytes read immediately after deserializing from a slice? #325

Open
aalekhpatel07 opened this issue Jan 25, 2023 · 3 comments


@aalekhpatel07

I'm trying to use rmp_serde to send and receive entire messages (enums or structs) via a BytesMut buffer that gets populated and emptied by a different part of the system (in this case a TcpStream).

I don't have any custom framing set up, so I'm wondering whether rmp_serde could tell me the exact size of the serialized representation of the data (i.e. the number of bytes it consumed) immediately after it has successfully parsed a portion of the stream into the specified type.

If there already exists an approach I apologize for having missed it. Please feel free to point me in the right direction.

I'm picturing an API like:

/// Deserialize a slice into a deserializable data type and return a count of the bytes deserialized if the deserialization was successful.
pub fn from_slice_with_size<'a, T>(input: &'a [u8]) -> (Result<T, Error>, Option<usize>)
where
    T: Deserialize<'a>
{
    ...
}

Here's an example usage:

use bytes::{Buf, BytesMut}; // `Buf` provides `advance`
use serde::{
    Serialize,
    Deserialize
};

#[derive(Serialize, Deserialize)]
pub enum Foo {
    Bar(String),
    Baz
}

pub struct Container {
    pub buffer: BytesMut
}

impl Container {
    ...
    fn read_foo(&mut self) -> Result<Option<Foo>, Box<dyn std::error::Error>> {
        if let Ok(foo) = rmp_serde::from_slice(&self.buffer) {
            // Currently, to get the byte count I have to serialize it again.
            // This has to be slower than keeping track of the bytes deserialized
            // while deserializing.
            let bytes_serialized = rmp_serde::encode::to_vec(&foo)?.len();
            self.buffer.advance(bytes_serialized);
            Ok(Some(foo))
        } else {
            // No complete `Foo` could be parsed from the buffer yet.
            Ok(None)
        }
    }

    /// This doesn't work because there is no `from_slice_with_size` method but 
    /// it'd be neat if there was something that keeps track 
    /// of and outputs the size of the bytes deserialized.
    fn read_foo_with_size(&mut self) -> Result<Option<Foo>, Box<dyn std::error::Error>> {
        if let (Ok(foo), Some(size)) = rmp_serde::from_slice_with_size(&self.buffer) {
            self.buffer.advance(size);
            Ok(Some(foo))
        } else {
            // No complete `Foo` could be parsed from the buffer yet.
            Ok(None)
        }
    }
    ...
    // Some other part takes a `&mut self` and populates the buffer.
    fn fill_up_buffer(&mut self) {
        stream.read_buf(&mut self.buffer).unwrap();
    }
}
@kornelski
Collaborator

No. The serde API can only return the final result once it's 100% complete.

You could use the lower-level rmp to read individual items as they come. Or you could wrap the msgpack messages in some framing of your own to split the stream into chunks (it could be as simple as sending <length><data> pieces over the stream).
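
The `<length><data>` idea can be sketched with the standard library alone. This is a minimal illustration, not part of rmp or rmp_serde: the names `write_frame` and `take_frame`, the 4-byte big-endian prefix, and the use of a plain `Vec<u8>` in place of `BytesMut` are all assumptions made for the example. The payload is treated as opaque bytes, which in the scenario above would be the output of `rmp_serde::encode::to_vec` on the sending side and the input to `rmp_serde::from_slice` on the receiving side.

```rust
use std::convert::{TryFrom, TryInto};

/// Size of the length prefix: a big-endian u32 (an assumption of this sketch).
const PREFIX_LEN: usize = 4;

/// Appends one `<length><data>` frame to `out`.
/// Fails if the payload does not fit in a u32 length prefix.
fn write_frame(out: &mut Vec<u8>, payload: &[u8]) -> Result<(), std::num::TryFromIntError> {
    let len = u32::try_from(payload.len())?;
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    Ok(())
}

/// Removes the first complete frame from `buf` and returns its payload.
/// Returns `None` and leaves `buf` untouched when more bytes are still needed,
/// which is the normal case while a stream is only partially read.
fn take_frame(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    // Not even a full length prefix yet.
    let prefix: [u8; PREFIX_LEN] = buf.get(..PREFIX_LEN)?.try_into().ok()?;
    let len = u32::from_be_bytes(prefix) as usize;
    let end = PREFIX_LEN.checked_add(len)?;
    if buf.len() < end {
        // The prefix is known, but the payload has not fully arrived.
        return None;
    }
    let payload = buf[PREFIX_LEN..end].to_vec();
    // Consume exactly the bytes that belonged to this frame.
    buf.drain(..end);
    Some(payload)
}
```

With this kind of framing the reader knows how many bytes to consume before deserializing anything, so there is no need to re-serialize a value just to learn its size. A real implementation would probably also cap `len` to guard against oversized or corrupted prefixes.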

@NChechulin

Hi!
I am serializing a series of structs, and writing them individually to a binary file.
Could you please show which function in rmp::decode I should use to get the packed struct size? I tried several, and rmp::decode::read_map_len seems to be the most suitable choice; however, it returns 1, which is clearly not the size.

And could you please elaborate on the 'using something else around msgpack messages'? Do you suggest just writing custom bytes after each struct, and then splitting the data by that divider?

Thanks in advance.
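
For context on the question above: as far as I understand, `rmp::decode::read_map_len` reports the number of key-value entries in a map header, not a byte count, which may be why it returned 1. The encoded size of a value only falls out of walking every marker in it. The sketch below does that walk by hand from the MessagePack format specification, using only the standard library; `value_len` and `be_uint` are hypothetical helpers written for this example, not functions from rmp. It recurses per nesting level, so treat it as an illustration rather than hardened code.

```rust
/// Reads an `n`-byte big-endian unsigned integer (n <= 4) starting at `pos`.
fn be_uint(buf: &[u8], pos: usize, n: usize) -> Option<usize> {
    let bytes = buf.get(pos..pos.checked_add(n)?)?;
    Some(bytes.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize))
}

/// Returns the encoded length in bytes of the single MessagePack value at the
/// start of `buf`, or `None` if the buffer is truncated or the marker is invalid.
fn value_len(buf: &[u8]) -> Option<usize> {
    let marker = *buf.first()?;
    // Length field that directly follows the marker byte.
    let len_field = |n: usize| be_uint(buf, 1, n);
    // `head`: marker + header + raw payload bytes.
    // `children`: number of nested values that follow the header.
    let (head, children): (usize, usize) = match marker {
        0x00..=0x7f | 0xe0..=0xff => (1, 0),                // fixint
        0x80..=0x8f => (1, 2 * (marker & 0x0f) as usize),   // fixmap: key + value per entry
        0x90..=0x9f => (1, (marker & 0x0f) as usize),       // fixarray
        0xa0..=0xbf => (1 + (marker & 0x1f) as usize, 0),   // fixstr
        0xc0 | 0xc2 | 0xc3 => (1, 0),                       // nil, false, true
        0xc1 => return None,                                // never used by the format
        0xc4 | 0xd9 => (len_field(1)?.checked_add(2)?, 0),  // bin8, str8
        0xc5 | 0xda => (len_field(2)?.checked_add(3)?, 0),  // bin16, str16
        0xc6 | 0xdb => (len_field(4)?.checked_add(5)?, 0),  // bin32, str32
        0xc7 => (len_field(1)?.checked_add(3)?, 0),         // ext8 (+1 type byte)
        0xc8 => (len_field(2)?.checked_add(4)?, 0),         // ext16
        0xc9 => (len_field(4)?.checked_add(6)?, 0),         // ext32
        0xcc | 0xd0 => (2, 0),                              // uint8, int8
        0xcd | 0xd1 => (3, 0),                              // uint16, int16
        0xca | 0xce | 0xd2 => (5, 0),                       // float32, uint32, int32
        0xcb | 0xcf | 0xd3 => (9, 0),                       // float64, uint64, int64
        0xd4 => (3, 0),                                     // fixext1
        0xd5 => (4, 0),                                     // fixext2
        0xd6 => (6, 0),                                     // fixext4
        0xd7 => (10, 0),                                    // fixext8
        0xd8 => (18, 0),                                    // fixext16
        0xdc => (3, len_field(2)?),                         // array16
        0xdd => (5, len_field(4)?),                         // array32
        0xde => (3, len_field(2)?.checked_mul(2)?),         // map16
        0xdf => (5, len_field(4)?.checked_mul(2)?),         // map32
    };
    if head > buf.len() {
        return None; // header or payload is cut off
    }
    let mut total = head;
    for _ in 0..children {
        // Each nested value starts where the previous one ended.
        total = total.checked_add(value_len(buf.get(total..)?)?)?;
    }
    Some(total)
}
```

For example, the two-element array `[1, "ab"]` is encoded as the bytes `92 01 a2 61 62`, so `value_len` yields 5 for it even when further records follow in the same buffer. An array or map header on its own only carries the element count (2 here), never the byte size.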

@NChechulin

UPD: as a workaround, one can simply write the size of the serialized struct before the actual data:

// encoding (pseudocode)
// a fixed-width little-endian prefix keeps the file portable across machines
writer.write_all(&(serialized_bytes.len() as u64).to_le_bytes())?;
writer.write_all(&serialized_bytes)?;

// decoding
while !buf.is_empty() {
  // first, read the size of the packed data (8-byte little-endian prefix)
  let size = u64::from_le_bytes(buf[0..8].try_into().unwrap()) as usize;
  // trim the buffer so that it starts with the actual data
  buf = &buf[8..];
  // parse the serialized record
  let record = rmp_serde::from_slice::<Record>(&buf[..size]);
  // cut out the data
  buf = &buf[size..];
}
