get_byte_buffer equivalent in bincode 2? #679

Closed
ruihe774 opened this issue Nov 4, 2023 · 4 comments
Comments

@ruihe774

ruihe774 commented Nov 4, 2023

Hi. I am interested in bincode 2 and am trying to migrate from v1. One sticking point is that I cannot find an equivalent of `BincodeRead::get_byte_buffer()` in v2, which I use for zero-copy deserialization.

You might think there is no difference anyway: with `get_byte_buffer` a `Vec<u8>` still has to be allocated and the data read into it, just like `Reader::read()` in v2, which fills a provided `&mut [u8]`. However, this is not the case when the file is memory-mapped and a custom global allocator is used that can either 1) ignore deallocations within certain heap regions and later free them in batch, or 2) perform partial deallocation. In my use case, I (unsafely) construct a `Vec<u8>` directly from slices within the ignored heap regions and pass its ownership to the deserializer in `get_byte_buffer`. That `Vec<u8>` then ends up in a `serde_bytes::ByteBuf` or a `String` in my data structures, so the whole deserialization process involves zero copies.
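To make the difference between the two reader shapes concrete, here is a minimal std-only sketch. The trait names `CopyingSource` and `OwningSource` and the `SegmentSource` type are hypothetical, not bincode APIs; the pre-allocated segments stand in for memory the custom allocator already owns (such as the mapped file).

```rust
use std::collections::VecDeque;

/// Shaped like bincode 2's `Reader::read`: the decoder supplies the buffer,
/// so the bytes have to be copied into it.
trait CopyingSource {
    fn read_into(&mut self, out: &mut [u8]) -> Result<(), &'static str>;
}

/// Shaped like bincode 1's `get_byte_buffer`: the source decides where the
/// returned `Vec<u8>` lives and hands over ownership.
trait OwningSource {
    fn take_buffer(&mut self, len: usize) -> Result<Vec<u8>, &'static str>;
}

/// Hypothetical source whose data already sits in separately allocated
/// segments, one per field (a stand-in for pre-mapped memory).
struct SegmentSource {
    segments: VecDeque<Vec<u8>>,
}

impl CopyingSource for SegmentSource {
    fn read_into(&mut self, out: &mut [u8]) -> Result<(), &'static str> {
        let seg = self.segments.pop_front().ok_or("no more data")?;
        if seg.len() != out.len() {
            return Err("length mismatch");
        }
        out.copy_from_slice(&seg); // one copy per field
        Ok(())
    }
}

impl OwningSource for SegmentSource {
    fn take_buffer(&mut self, len: usize) -> Result<Vec<u8>, &'static str> {
        let seg = self.segments.pop_front().ok_or("no more data")?;
        if seg.len() != len {
            return Err("length mismatch");
        }
        Ok(seg) // ownership moves to the caller; no bytes are copied
    }
}
```

In the `OwningSource` path the returned `Vec` is the very allocation that held the data, which is the property the zero-copy setup relies on.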

My use case may be tricky. However, once the allocator API is stabilized, it should be possible to do this in a safer way.

Would it make sense to provide a similar API in v2?

@VictorKoenders
Contributor

We're thinking of switching to the read_buf API (tracking issue, RFC) for this, once that is stabilized. Does that cover your use-case?

@ruihe774
Author

ruihe774 commented Nov 7, 2023

> We're thinking of switching to the read_buf API (tracking issue, RFC) for this, once that is stabilized. Does that cover your use-case?

Unfortunately, no. read_buf exists to avoid initializing the destination buffer, but the data is still copied into it. Maybe I did not describe my use case clearly, so here are some code samples.
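For contrast, a small std-only sketch of what any read-into-a-buffer API still does: `read_exact` fills a caller-owned buffer, so the bytes end up in a second allocation. Avoiding initialization (what `read_buf` targets) would only remove the zero-fill, not the copy. `read_field` is a hypothetical helper.

```rust
use std::io::{Cursor, Read};

/// Reads `len` bytes from `src` into a freshly allocated buffer.
fn read_field<R: Read>(src: &mut R, len: usize) -> std::io::Result<Vec<u8>> {
    // This zero-initialization is the cost that `read_buf` aims to remove...
    let mut buf = vec![0u8; len];
    // ...but this copy from the source into `buf` remains either way.
    src.read_exact(&mut buf)?;
    Ok(buf)
}
```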

For example, I have a struct:

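```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct MyData {
    // Serde treats Vec<u8> as a sequence of individual u8 elements,
    // so `serde_bytes` is needed to (de)serialize it as a byte buffer.
    // A String field would illustrate the same point.
    #[serde(with = "serde_bytes")]
    bytes: Vec<u8>,
    // ..other fields
}
```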
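For context on where the `length` argument comes from: with the fixed-width integer encoding used by bincode 1's top-level functions (such as `bincode::serialize`), a byte buffer is written as an 8-byte little-endian `u64` length followed by the raw bytes, so the decoder reads the length first and then asks the reader for exactly that many bytes. A std-only sketch of that layout, assuming fixint encoding (bincode 1's `DefaultOptions` uses varint lengths instead); `encode_bytes` and `decode_len` are hypothetical helpers.

```rust
/// Encodes a byte buffer as an 8-byte little-endian length prefix
/// followed by the raw bytes (fixed-width integer encoding).
fn encode_bytes(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + bytes.len());
    out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
    out.extend_from_slice(bytes);
    out
}

/// Splits the length prefix off `input`, returning the declared length
/// and the remaining bytes (which start with the payload).
fn decode_len(input: &[u8]) -> Option<(usize, &[u8])> {
    if input.len() < 8 {
        return None;
    }
    let (prefix, rest) = input.split_at(8);
    let mut raw = [0u8; 8];
    raw.copy_from_slice(prefix);
    // `as usize` may truncate on 32-bit targets; a real decoder would check.
    Some((u64::from_le_bytes(raw) as usize, rest))
}
```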

I memory-map a file:

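```rust
// `MmapMut::map_mut` is unsafe: the file must not be modified externally
// while it is mapped.
let mut mmap = unsafe { memmap2::MmapMut::map_mut(&file)? };
let ptr = mmap.as_mut_ptr();
let len = mmap.len();
let reader = MyReader { ptr, len };
// Leak the mapping so it stays valid for as long as the decoded data lives.
std::mem::forget(mmap);
```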

Then I implement bincode 1's `BincodeRead` trait for a custom reader:

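```rust
struct MyReader {
    ptr: *mut u8,
    len: usize,
}

impl<'a> bincode::BincodeRead<'a> for MyReader {
    fn get_byte_buffer(&mut self, length: usize) -> bincode::Result<Vec<u8>> {
        if self.len < length {
            // ...error handling stuff, e.g. report an unexpected EOF
            return Err(Box::new(bincode::ErrorKind::Io(
                std::io::ErrorKind::UnexpectedEof.into(),
            )));
        }
        // Construct the Vec directly on the mapped memory. This is only sound
        // with a global allocator that never frees this region.
        let vec = unsafe { Vec::from_raw_parts(self.ptr, length, length) };
        // Advance past the bytes just handed out.
        self.ptr = unsafe { self.ptr.add(length) };
        self.len -= length;
        Ok(vec)
    }

    // ...other methods (`forward_read_str`, `forward_read_bytes`),
    // plus the `std::io::Read` impl that `BincodeRead` requires
}
```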
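If the decoded value can borrow from the mapping instead of owning it, the same pointer/length bookkeeping can be written in safe code with no custom allocator. A minimal std-only sketch, using a hypothetical `SliceReader` that hands out subslices of one long-lived buffer (a leaked mapping could supply the `&'static [u8]`); this swaps in a borrowing technique and is not a drop-in replacement for `get_byte_buffer`. bincode 2's `BorrowReader` trait exposes a similarly shaped `take_bytes` method.

```rust
/// Hypothetical reader over a buffer that outlives the decoded data.
struct SliceReader<'a> {
    rest: &'a [u8],
}

impl<'a> SliceReader<'a> {
    /// Returns the next `length` bytes without copying and advances the cursor.
    fn take_bytes(&mut self, length: usize) -> Result<&'a [u8], String> {
        if self.rest.len() < length {
            return Err(format!("need {} bytes, only {} left", length, self.rest.len()));
        }
        let rest: &'a [u8] = self.rest;
        let (head, tail) = rest.split_at(length);
        // Advance by `length`, mirroring `ptr.add(length)` / `len -= length`.
        self.rest = tail;
        Ok(head)
    }
}
```

Because the returned slice borrows the buffer for `'a`, the compiler rather than the allocator guarantees the mapping outlives the data.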

Finally,

```rust
let data: MyData = bincode::deserialize_from_custom(reader)?;
```

This works as long as the custom global allocator ignores deallocations in the mapped region (otherwise dropping the Vec would hand memory the allocator never allocated back to it, corrupting the heap). As a result, data.bytes points directly into the mapped memory: deserialization is completely copy-free, and even the copy from the kernel's file buffer into userspace is eliminated.
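The allocator requirement can be sketched with `std::alloc::GlobalAlloc`: a wrapper around `System` that treats any pointer inside one registered address range as borrowed memory and skips freeing it. `RegionAwareAlloc`, `set_region`, and the single-region design are assumptions for illustration only; a real setup would install it with `#[global_allocator]` and register the mapping before deserializing, and it does not by itself make `Vec::from_raw_parts` on mapped memory sound.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator: delegates to `System` but never frees
/// pointers inside the registered range [start, end).
struct RegionAwareAlloc {
    start: AtomicUsize,
    end: AtomicUsize,
}

impl RegionAwareAlloc {
    const fn new() -> Self {
        Self { start: AtomicUsize::new(0), end: AtomicUsize::new(0) }
    }

    /// Registers the address range whose deallocations are ignored
    /// (for example, a leaked memory mapping).
    fn set_region(&self, start: *const u8, len: usize) {
        self.start.store(start as usize, Ordering::SeqCst);
        self.end.store(start as usize + len, Ordering::SeqCst);
    }

    fn in_region(&self, ptr: *mut u8) -> bool {
        let p = ptr as usize;
        p >= self.start.load(Ordering::SeqCst) && p < self.end.load(Ordering::SeqCst)
    }
}

unsafe impl GlobalAlloc for RegionAwareAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Memory inside the region was never handed out by `System`,
        // so it must not be returned to it.
        if !self.in_region(ptr) {
            unsafe { System.dealloc(ptr, layout) }
        }
    }
    // The default `realloc` does alloc + copy + dealloc, so a reallocated
    // region buffer moves to the heap and its old bytes are left in place.
}

// Installing it process-wide would look like:
// #[global_allocator]
// static GLOBAL: RegionAwareAlloc = RegionAwareAlloc::new();
```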

@VictorKoenders
Contributor

I think I understand what you want for your use case.

Unfortunately, the bincode reader trait can't return a Vec directly any more, because we want to make embedded systems a first-class citizen for bincode. I don't see an easy way to add a specialized function that returns a Vec only when alloc is enabled; since Cargo features are unified across the whole dependency tree, that sounds like it would break horribly when bincode sits somewhere in a complex dependency tree.

It would be nice if we could do something like

```rust
trait CustomAllocator {
    fn allocate_vec_in_place(...);
}

impl<T, A> bincode::Decode for Vec<T, A>
where
    T: bincode::Decode,
    A: CustomAllocator,
{
    // ...
}

impl<T> bincode::Decode for Vec<T, Global>
where
    T: bincode::Decode,
{
    // ...
}
```

But that

  1. sounds like it needs implementation specialization
  2. sounds very specific to your use case and not something we can implement globally.

For now I think the best solution would be to have your own Vec wrapper:

```rust
struct CustomAllocVec<T>(Vec<T, YourAllocator>);

impl<T: Decode> Decode for CustomAllocVec<T> {
    // you can do your custom logic here
}
```

But that would require having control over every Vec in your dependency tree.

@ruihe774
Author

ruihe774 commented Nov 8, 2023

> But that
>
>   1. sounds like it needs implementation specialization
>   2. sounds very specific to your use case and not something we can implement globally.

Yes, you're right. Maybe we could do it with something similar to `#[serde(with = "...")]`. It would be nice if bincode's Encode/Decode derives supported that kind of per-field proxying, e.g.:

```rust
#[derive(Encode, Decode)]
struct MyData {
    #[bincode(decode_with="zero_copy_decode_vec")]
    bytes: Vec<u8>,
}
```

This is somewhat off-topic, though, and it can already be done by implementing Decode manually for the whole struct, or by going through bincode::serde.

ruihe774 closed this as not planned on Nov 8, 2023.