Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider supporting owned Unstructureds #147

Open
fitzgen opened this issue Apr 21, 2023 · 2 comments
Open

Consider supporting owned Unstructureds #147

fitzgen opened this issue Apr 21, 2023 · 2 comments

Comments

@fitzgen
Copy link
Member

fitzgen commented Apr 21, 2023

Talking with some folks who would like to pass around an Unstructured through Cranelift to implement a "chaos mode" that does semantics-preserving random mutations like shuffle basic blocks, but don't want to thread Unstructured's lifetimes through the whole code base. Would instead like to have an owned version of Unstructured that they can attach to existing context structures.

Could do this with a wrapper struct and self borrows but would like to make sure this is UB-free and in a place where it can be reused by anyone else who has similar needs:

struct OwnedUnstructured {
    bytes: Vec<u8>,
    // self-borrow of `bytes`; needs MIRI to tell us
    // if we need `ManuallyDrop`s in here and all that
    u: Unstructured<'static>,
}

impl {
    pub fn u<'a>(&'a mut self) -> &'a mut Unstructured<'a> {
        unsafe { mem::transmute(&mut self.u) }
    }
}

// Unfortunately, `Deref[Mut]` doesn't work because we need to
// constrain the `Unstructured`'s lifetime to the `self` borrow
// but we can't do that without GATs in the `Deref` trait.
impl Deref for OwnedUnstructured {
    type Target = Unstructured<'what_lifetime_to_put_here>;

    // ...
}

Another approach could be to have a Cow<'a, [u8]> in Unstructured itself, although then you'd end up with an Unstructured<'static> but the way that Arbitrary is defined, this would let you create arbitrary &'static [u8]s which is not correct.

Not even sure if this is the right approach, might be better to do something like

struct OwnedUnstructured {
    bytes: Vec<u8>
    cursor: usize,
}

impl OwnedUnstructured {
    // Get a non-owned `Unstructured` of these bytes.
    pub fn u<'a>(&'a mut self) -> impl DerefMut<Target = Unstructured<'a>> {
        struct U<'a> {
            cursor: &'a mut usize,
            initial_len: usize,
            u: Unstructured<'a>
        }
        impl Drop for U<'_> {
            fn drop(&mut self) {
                // Advance cursor by number of bytes consumed. This is buggy
                // because it assumes bytes are only taken from the from of
                // the input, never from the back, which is not true. Don't
                // know how to work around this without having a double-borrow
                // of `self.bytes`.
                self.cursor += self.initial_len - self.u.len();
            }
        }
        impl Deref for U<'_> { /* ... */ }
        impl DerefMut for U<'_> { /* ... */ }

        let u = Unstructured::new(&self.bytes[self.cursor..]);
        U {
            cursor: &mut self.cursor,
            initial_len: u.len(),
            u
        }
    }
}

Just brainstorming at this point.

Thoughts? Ideas?

cc @cfallin

@Manishearth
Copy link
Member

The yoke crate lets you do this.

@senekor
Copy link

senekor commented Apr 22, 2023

Out of curiosity, I made a little POC in the cranelift control plane with yoke and it pretty much does exactly what we need. The diff can be found here.

The code complexity did increase in my opinion. Solving this problem within arbitrary would shift that complexity from the user to the library. I guess what the best tradeoff is depends on how common this use case is. And I think @fitzgen mentioned that it's quite niche.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants