Alternative serialization implementation that compactly stores bytes #1303

Open · wants to merge 16 commits into main
Conversation

@weikengchen (Contributor) commented Jan 9, 2024

This solves #1298. Below I explain some changes that require attention.

⚠️ Please refer to this reply, #1303 (comment), for the latest version; the description below is obsolete.

FdWriter:

  • I had to change how FdWriter works because it does not offer an interface that lets me "modify" the last written word. It now carries a single-word buffer that is only used during u8 serialization. The buffer is flushed whenever a full word has accumulated, or when the FdWriter is dropped (in Rust, a variable is dropped when it goes out of scope, e.g. at the end of main(); this works because JOURNAL processing happens after that). A sketch of this buffering idea follows below.
  • This change is clearly not ideal because it introduces new behavior: the file descriptor now works like regular file handles in Rust, which do not need to be closed or flushed manually but are flushed and closed when dropped.
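
A minimal sketch of this (later superseded) buffering idea, with hypothetical names (`WordBufferedWriter`, `write_byte`), not the actual FdWriter code: byte writes accumulate in a single-word buffer, full words are flushed as they complete, and any partially filled word is flushed, zero-padded, when the writer is dropped.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for an fd-backed writer with a one-word byte buffer.
struct WordBufferedWriter<W: Write> {
    inner: W,          // the underlying word/byte sink
    pending: [u8; 4],  // single-word buffer used only for byte writes
    filled: usize,
}

impl<W: Write> WordBufferedWriter<W> {
    fn write_byte(&mut self, b: u8) -> io::Result<()> {
        self.pending[self.filled] = b;
        self.filled += 1;
        if self.filled == 4 {
            // A full word has accumulated: write it out and clear the buffer.
            self.inner.write_all(&self.pending)?;
            self.pending = [0; 4];
            self.filled = 0;
        }
        Ok(())
    }
}

impl<W: Write> Drop for WordBufferedWriter<W> {
    fn drop(&mut self) {
        // Flush the zero-padded remainder when the variable goes out of scope.
        if self.filled > 0 {
            let _ = self.inner.write_all(&self.pending);
        }
    }
}
```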

Byte Handler:

Where the handler must be reset. The serializer's primitive entry points are:

  • bool
  • i8, i16, i32, i64, i128
  • u8, u16, u32, u64, u128
  • f32, f64
  • char
  • string
  • byte array

These calls map onto the underlying writes as follows:

bool, u8 -> u8
i8, i16, i32, i64, u16, u32, u64, f32, f64, char -> u32
string, byte array, i128, u128 -> write bytes directly (i128 calls u128 in the serializer, but is handled separately in the deserializer)

So we just need to make sure that the handler is reset for string, byte array, i128, u128, and u32.

String and byte array start their serialization by calling u32 (for the length prefix), so the handler is already reset before any byte is written; no additional reset is needed there.

As a result (sketched below):

  • we inject a reset in u32 and u128 for serialization
  • we inject a reset in u32, u128, and i128 for deserialization, because i128 does not call u128 there
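
A standalone sketch of the reset rule (hypothetical `PackingSerializer` type; the byte order and framing here are illustrative assumptions, not necessarily the PR's exact wire format): bytes are packed into a pending word, and any word-sized write first resets the handler by flushing the pending bytes with zero padding.

```rust
struct PackingSerializer {
    out: Vec<u32>,
    pending: [u8; 4],
    n_pending: usize,
}

impl PackingSerializer {
    fn new() -> Self {
        Self { out: Vec::new(), pending: [0; 4], n_pending: 0 }
    }

    // The injected reset: flush a partially filled word before any non-byte write.
    fn reset(&mut self) {
        if self.n_pending > 0 {
            self.out.push(u32::from_le_bytes(self.pending));
            self.pending = [0; 4];
            self.n_pending = 0;
        }
    }

    fn write_u8(&mut self, b: u8) {
        self.pending[self.n_pending] = b;
        self.n_pending += 1;
        if self.n_pending == 4 {
            self.reset();
        }
    }

    fn write_u32(&mut self, v: u32) {
        self.reset(); // u32 (and u128/i128) always flush packed bytes first
        self.out.push(v);
    }
}

fn main() {
    let mut s = PackingSerializer::new();
    for b in [1u8, 2, 3] {
        s.write_u8(b); // three bytes share one word
    }
    s.write_u32(7); // forces the reset, then writes its own word
    assert_eq!(s.out, vec![0x0003_0201, 7]);
}
```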


@weikengchen (Contributor, Author)

Added a third test that checks edge cases. Some of the edge cases are:

(Vec<u8>, u32) with 0, 1, 2, 3, 4, or 5 bytes
(u32, Vec<u8>) with 0, 1, 2, 3, 4, or 5 bytes
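
A hedged sketch of what such a round-trip test could look like, assuming the `risc0_zkvm::serde::{to_vec, from_slice}` helpers (exact paths and signatures may differ):

```rust
use risc0_zkvm::serde::{from_slice, to_vec};

#[test]
fn bytes_next_to_u32_roundtrip() {
    for len in 0..=5usize {
        let bytes: Vec<u8> = (0..len as u8).collect();

        // (Vec<u8>, u32): packed bytes followed by a word
        let a = (bytes.clone(), 0xdead_beefu32);
        let back: (Vec<u8>, u32) = from_slice(&to_vec(&a).unwrap()).unwrap();
        assert_eq!(a, back);

        // (u32, Vec<u8>): a word followed by packed bytes
        let b = (0xdead_beefu32, bytes);
        let back: (u32, Vec<u8>) = from_slice(&to_vec(&b).unwrap()).unwrap();
        assert_eq!(b, back);
    }
}
```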

@weikengchen (Contributor, Author)

Let me take a look at the failed tests and think about how to fix them.

@weikengchen marked this pull request as ready for review January 9, 2024 12:42
@weikengchen (Contributor, Author) commented Jan 9, 2024

@flaub ready for CI again.

My local run has one failing test that does not seem to be caused by this PR.

test tests::manager::integration_test_completed_proof_manager ... FAILED

failures:

---- tests::manager::integration_test_completed_proof_manager stdout ----
thread 'tests::manager::integration_test_completed_proof_manager' panicked at bonsai/ethereum-relay/src/tests/manager.rs:112:10:
deployment should succeed: MiddlewareError { e: MiddlewareError(JsonRpcClientError(JsonError(Error("missing field `baseFeePerGas`", line: 1, column: 39)))) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@austinabell (Contributor) left a comment


Just some initial thoughts, not a thorough review

risc0/zkvm/src/guest/env.rs (outdated review thread, resolved)
Comment on lines +100 to +112
// Depending on `status`, the buffer positions that should only ever contain
// zero padding are checked; any non-zero byte there is a deserialization error.
if self.status == 1 {
    if self.buffer[0] != 0 || self.buffer[1] != 0 || self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
} else if self.status == 2 {
    if self.buffer[1] != 0 || self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
} else if self.status == 3 {
    if self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
}
Contributor

Feels like this could just be slicing the range you're checking. Or is this specific for perf reasons?

Contributor Author

Sort of for performance, though I feel the compiler would be able to optimize it the same way. None of these cases are supposed to occur; the check is only there to detect errors.

risc0/zkvm/src/serde/serializer.rs (outdated review thread, resolved)
risc0/zkvm/src/guest/env.rs (outdated review thread, resolved)
@austinabell (Contributor) commented Jan 9, 2024

> @flaub ready for CI again.
>
> My local test has a failed test that does not seem to be due to this PR.
>
> test tests::manager::integration_test_completed_proof_manager ... FAILED
>
> failures:
>
> ---- tests::manager::integration_test_completed_proof_manager stdout ----
> thread 'tests::manager::integration_test_completed_proof_manager' panicked at bonsai/ethereum-relay/src/tests/manager.rs:112:10:
> deployment should succeed: MiddlewareError { e: MiddlewareError(JsonRpcClientError(JsonError(Error("missing field `baseFeePerGas`", line: 1, column: 39)))) }
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That's just an issue with ethers-rs. The CI uses an old version of foundry because there was a breaking change (and there is no released version of ethers that works with it), so you can ignore that. A fix would be patching ethers-rs to a recent git version, pinning the old foundry version, or just ignoring that test.

@weikengchen (Contributor, Author)

To summarize, Austin pointed out that this would have an issue if a user uses multiple FdWriters for the same resource. For example, a user can obtain a writer to JOURNAL just by calling env::journal(), and nothing in the design prevents obtaining more than one. If multiple journal writers are used, the order in which they are dropped (when their variables go out of scope) is also effectively unspecified.

This issue is real because each FdWriter may hold "leftover" bytes that have not yet been written to the stream.

A solution is to have something similar to this in the guest:

#[derive(Clone)]
pub struct PosixIo<'a> {
    pub(crate) read_fds: BTreeMap<u32, Rc<RefCell<dyn BufRead + 'a>>>,
    pub(crate) write_fds: BTreeMap<u32, Rc<RefCell<dyn Write + 'a>>>,
}

So each time a user requests an FdReader or FdWriter, they actually get a clone of an Rc<RefCell<...>> handle to the same underlying object. Ideally there would be another layer of abstraction, so that regular users do not have to operate on Rc<RefCell<...>> directly, which requires borrow or borrow_mut.
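
A minimal sketch of that sharing pattern (hypothetical `IoRegistry` type, not the actual guest API): every request for the same fd returns a clone of the same `Rc<RefCell<...>>` handle, so there is only ever one underlying writer and only one pending-byte buffer to flush.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::io::Write;
use std::rc::Rc;

struct IoRegistry<'a> {
    write_fds: BTreeMap<u32, Rc<RefCell<dyn Write + 'a>>>,
}

impl<'a> IoRegistry<'a> {
    fn writer(&mut self, fd: u32) -> Rc<RefCell<dyn Write + 'a>> {
        self.write_fds
            .entry(fd)
            .or_insert_with(|| {
                // A Vec<u8> stands in for the real fd-backed writer.
                let w: Rc<RefCell<dyn Write + 'a>> = Rc::new(RefCell::new(Vec::<u8>::new()));
                w
            })
            .clone()
    }
}

fn main() {
    let mut io = IoRegistry { write_fds: BTreeMap::new() };
    let a = io.writer(3);
    let b = io.writer(3);
    // Both handles alias the same writer; writes interleave into one buffer
    // and there is no per-handle leftover to lose on drop.
    a.borrow_mut().write_all(b"hello ").unwrap();
    b.borrow_mut().write_all(b"journal").unwrap();
}
```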

@weikengchen (Contributor, Author)

Let me experiment with that idea a bit. In general, we do not want to patch things here and there; we want to resolve this systematically.

Adding a new syscall is also not reasonable.

@weikengchen (Contributor, Author) commented Jan 9, 2024

@austinabell Circulating this to you for a third version.

This version no longer modifies how FdWriter works, and it reverts all the changes to WordWrite. All the changes are now implemented in, and only in, Serializer and Deserializer.

We want to restrict ourselves to serialization. The problem we are encountering is that there can be "buffered" bytes that have not yet been written to the stream.

First of all, we can split all types in Rust into three groups.

  • primitive types:
    • bool, u8, u16, u32, u64, u128, i8, i16, i32, i64, i128, f32, f64, char, str,
    • ()
    • unit struct aka "struct Nothing"
    • unit enum aka "enum{ A, B, C }" with no underlying variables
  • "newtype", a virtually pseudo type defined over another type, which is specifically defined for efficiency
    • Option, which can be considered as enum { None, Some(T) }
    • newtype struct, aka struct(T)
    • newtype variant, aka enum { struct1(T1), struct2(T2), ... }
  • wrapper:
    • seq, including vec![], long slice
    • tuple aka (T1, T2)
    • tuple struct aka struct(T1, T2)
    • tuple variant aka enum { struct1(T1, T2), struct2(T3, T4, T5) }
    • map
    • struct aka struct { a: A, b: B}
    • struct variant aka enum { struct1 { a: T1, b: T2 }, struct2 { a: T3, b: T4, c: T5 } }

Note that if the buffered bytes have not been fully written to the stream, the last primitive type written must have been either bool or u8 (we will treat bool as u8 in the following). No primitive type comes after it; otherwise the buffer would already have been flushed to the stream, because the byte handler would have been reset.

So there are only two possibilities for this u8.

  • the variable being serialized is directly or indirectly inside some sort of wrapper. By "indirectly" we mean that, e.g., struct { Option<struct(Option<u8>)> } also counts as "inside a wrapper".
  • the variable being serialized is not inside any wrapper. It could be a bare u8 or Option<struct(Option<u8>)>.

We handle both separately.

For the first case, we introduce a notion of "depth". When the serializer enters a wrapper, the depth is increased by one; when it leaves the wrapper, the depth is decreased by one. When the depth reaches zero, the serializer must have left the outermost meaningful wrapper, and no further bytes can follow, so any buffered bytes must be written to the stream immediately at that point. The implementation looks like this:

    #[inline]
    fn decrease_depth<W: WordWrite>(&mut self, stream: &mut W) -> Result<()> {
        self.depth -= 1;
        if self.depth == 0 && self.status != 0 {
            stream.write_words(&[self.byte_holder])?;
            self.status = 0;
        }
        Ok(())
    }

For the second case, we notice that the last u8 is at depth 0, so this u8 is put into the buffer and immediately written to the stream, i.e., it is treated like a u32.

     fn handle<W: WordWrite>(&mut self, stream: &mut W, v: u8) -> Result<()> {
        if self.depth == 0 {
            stream.write_words(&[v as u32])?;
        } else {
            ......
        }
        Ok(())
    }

Let me know what you think. In addition, @austinabell, could you approve the GitHub Actions run, so CI can help find bugs? I am currently fighting the error "incompatible client version: 0.21.0-alpha.1" locally.

@flaub (Member) commented Jan 9, 2024

I'll need to review before we run on CI, thanks.

@flaub (Member) commented Jan 9, 2024

If you want to test locally, you'll need to install the server from local source. You can do that with:

cargo install --path risc0/cargo-risczero

@weikengchen (Contributor, Author)

Just finished local testing; the local tests pass.

@austinabell (Contributor)

> @austinabell Circulating this to you for a third version.

So for context, I'm not a stakeholder for these changes; I was just reviewing to help improve the proposal, since I had thought through similar patterns and personally couldn't justify proposing to change the default. I haven't checked the semantics in depth, and it seems like a reasonable implementation, but I'm still mainly unsure about these two points:

  • Implementing specific semantics in the serialization introduces footguns and breaking changes for existing uses
    • I also don't have high confidence this would pass if (de)serialization for all types were fuzzed; previous bugs were not caught by these kinds of unit tests
    • (There still seems to be a minor race condition if you call internal functions of the deserializer with multiple readers/writers, though I haven't dug too deeply into this)
  • The cycle overhead of this logic is unclear
    • Yes, it's very minimal, but it's changing the default for users to optimize bad patterns, not correct ones

@flaub self-requested a review January 9, 2024 19:16
@weikengchen (Contributor, Author)

The first point from Austin relates to breaking existing implementations. A few thoughts here:

  • since the library has not yet had a stable release, it is still okay to change the serialization

  • but we then need to be very careful, because if a bug is discovered after the stable release and we need to revert this algorithm, it would be troublesome

  • the problem we encounter is a host working with an old ELF. This is important because we can already see that this change affects the ELF; if the ELF is hardcoded into a larger system, the programmer might have no way to update the guest ELF to the latest API

  • ideally there would be, in the future (maybe after the first stable release), a way for the executor or the prover to know which version the guest ELF was compiled against and handle that guest accordingly. This is reasonable, since the guest already contains some zkVM SDK glue code that is tied to versions. It would let the Bonsai cloud prover detect the guest's version and support an older VM version when only a minimal adapter is needed

@weikengchen (Contributor, Author)

Regarding cycle count, I have a separate question. At the moment, every call to read or write results in a syscall to sys_read or sys_write. This is where I am more concerned about the cycle count.

Would it be more performant if a user wrapped the reader/writer with, for example, BufReader and BufWriter? I just feel that the current implementation makes syscalls very frequently.
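
A minimal sketch of that buffering idea, assuming the underlying descriptor-backed writer implements `std::io::Write` (the `raw` parameter here is a generic stand-in, not a specific guest API): routing writes through a `BufWriter` turns many small `sys_write` calls into a few large ones.

```rust
use std::io::{BufWriter, Write};

fn write_output<W: Write>(raw: W, payload: &[u8]) -> std::io::Result<()> {
    // 4 KiB of userspace buffering; only flushes hit the underlying writer.
    let mut w = BufWriter::with_capacity(4096, raw);
    for chunk in payload.chunks(16) {
        w.write_all(chunk)?; // buffered, no write to `raw` per chunk
    }
    w.flush() // one (or a few) large writes instead of many small ones
}

// Usage example against a throwaway sink:
// write_output(std::io::sink(), &[0u8; 1024]).unwrap();
```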

@weikengchen deleted the alter-serde branch January 15, 2024 08:32
@weikengchen restored the alter-serde branch January 15, 2024 08:41
@weikengchen reopened this Jan 15, 2024
@flaub added this to the 0.21.0 milestone Jan 23, 2024
@nategraf (Contributor)

Reading through this, it seems the implementation is reasonable, and I can follow what the resulting codec looks like (bytes get packed, with padding whenever we transition from bytes to non-bytes). As a result, we should expect roughly a 75% reduction in the number of calls to read a word from the host (four bytes per word instead of one byte per word).

I went ahead and ran this on a couple of our examples to get samples. With the sha example and 50 kB of data, I saw no change to performance as it is written, passing a String to the guest. This is not unexpected. I then changed the example to pass in &[u8] and deserialize as Vec<u8>. This PR resulted in a 54% improvement, essentially all of which is accounted for by a reduction in the number of calls to FdReader::read_words.

I also went ahead and compared this to an implementation using env::stdin().read_to_end(&mut data). That implementation achieved a 90% reduction in cycles over the baseline of main, although it is a significantly different method, avoiding serde altogether.
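
For reference, a hedged sketch of what that comparison guest looks like, assuming the guest exposes a `std::io::Read` over stdin (e.g. `risc0_zkvm::guest::env::stdin()`; the exact path may differ): the raw bytes are read in bulk and serde is skipped entirely.

```rust
use std::io::Read;

use risc0_zkvm::guest::env;

fn read_raw_input() -> Vec<u8> {
    let mut data = Vec::new();
    // One bulk read path instead of per-element deserialization.
    env::stdin()
        .read_to_end(&mut data)
        .expect("reading stdin should succeed");
    data
}
```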

Overall, I'm a bit reluctant to merge this approach. It feels a bit "stapled on" to the core (de)serializer implementations and optimizes the single case of serializing u8, without e.g. addressing strings. This is clearly valuable for byte buffers, which are of key importance, but there are other methods of serializing byte buffers that are more efficient. What it does have over e.g. std::io::Read is that it works on byte buffers arbitrarily nested in structs without special logic.

Altogether, it feels like there is still significant work to be done, so the concern is that we will later need to come back to this and rebuild the serializer a second time. Both of these transitions introduce some pain for developers, so ideally we'd do it once. On the other hand, we are pre-1.0.

One direction we still need to explore is using alternative, off-the-shelf codecs, e.g. swapping our codec for bincode or another serde codec. Going beyond that, something like rkyv is very interesting for completely avoiding direct deserialization costs (especially in terms of copying memory), but it would probably not be a good default due to its impact on devex. Some context on the codec as it exists today: it was created for a world where reading data into the zkVM was zero-cost, since initial reads to a special non-deterministic region of memory were unconstrained, so the host would simply write data to that location without consuming any cycles.

@flaub do you have any opinions or intuitions on this given this information?

Profiles for this branch (e8d40279faf2928a6b29d37671e3a48c2f98336b), main (8650ac9) and a version of the example using std::io::Read::read_to_end are provided in the tarball below.

sha_profiles.tar.gz

@weikengchen (Contributor, Author) commented Jan 31, 2024

Here is a question: is the 90% reduction in the number of calls to "read" related to the ability to read many words at once, because it can predict how many need to be read?

If so, this calls for something like a "BufReader" to be implemented in the guest: if there are words available in the file descriptor, let it read more at once.

But a follow-up question is: how big is the overhead of read_words? Are the serde cycles mostly due to the inherent machinery of serde? Note that for many applications the serde cycles may not be the bottleneck, but serde does help users write the code.

One solution to help with stability is to allow a customized serializer and deserializer, or to implement this as a third-party dependency that wraps the existing implementation. This could be done in a way that is transparent and does not require modifications to RISC Zero itself, basically an env2::, or a #[features(experimental_serde)].

@weikengchen (Contributor, Author)

Related to the stable release, the question is whether we want to move forward with this new input spec in which consecutive bytes are packed.

Improving the performance further could then be done as subsequent, non-breaking changes.

@weikengchen (Contributor, Author)

In the meantime, I uploaded a crate that converts a data structure to and from u32 slices (this avoids the compatibility issue):
l2r0-small-serde

@flaub modified the milestones: 0.21.0, 0.22.0 Feb 16, 2024