Alternative serialization implementation that compactly stores bytes #1303

Open · wants to merge 16 commits into main
Conversation

@weikengchen (Contributor) commented Jan 9, 2024

This solves #1298. Below I explain some changes that require attention.

⚠️ Please refer to this reply, #1303 (comment), for the latest version; the description below is obsolete.

FdWriter:

  • I had to change how FdWriter works because it does not offer an interface that lets me "modify" the last written word. It now carries a single-word buffer that is only used during u8 serialization. The buffer is flushed whenever a full word has accumulated, or when the FdWriter is dropped (in Rust, a variable is dropped when it goes out of scope, e.g. at the end of main(); this works because JOURNAL processing happens after that). A sketch of this buffering idea follows below.
  • This change is clearly not ideal because it introduces new behavior: the file descriptor now works like regular file handles in Rust, which do not need to be closed or flushed manually but are flushed and closed when dropped.
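
A minimal sketch of this (later superseded) buffering idea, with hypothetical names (`WordBufferedWriter`, `write_byte`), not the actual FdWriter code: byte writes accumulate in a single-word buffer, full words are flushed as they complete, and any partially filled word is flushed, zero-padded, when the writer is dropped.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for an fd-backed writer with a one-word byte buffer.
struct WordBufferedWriter<W: Write> {
    inner: W,          // the underlying word/byte sink
    pending: [u8; 4],  // single-word buffer used only for byte writes
    filled: usize,
}

impl<W: Write> WordBufferedWriter<W> {
    fn write_byte(&mut self, b: u8) -> io::Result<()> {
        self.pending[self.filled] = b;
        self.filled += 1;
        if self.filled == 4 {
            // A full word has accumulated: write it out and clear the buffer.
            self.inner.write_all(&self.pending)?;
            self.pending = [0; 4];
            self.filled = 0;
        }
        Ok(())
    }
}

impl<W: Write> Drop for WordBufferedWriter<W> {
    fn drop(&mut self) {
        // Flush the zero-padded remainder when the variable goes out of scope.
        if self.filled > 0 {
            let _ = self.inner.write_all(&self.pending);
        }
    }
}
```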

Byte Handler:

Where the handler must be reset. The serializer's primitive entry points are:

  • bool
  • i8, i16, i32, i64, i128
  • u8, u16, u32, u64, u128
  • f32, f64
  • char
  • string
  • byte array

These calls map onto the underlying writes as follows:

bool, u8 -> u8
i8, i16, i32, i64, u16, u32, u64, f32, f64, char -> u32
string, byte array, i128, u128 -> write bytes directly (i128 calls u128 in the serializer, but is handled separately in the deserializer)

So we just need to make sure that the handler is reset for string, byte array, i128, u128, and u32.

String and byte array start their serialization by calling u32 (for the length prefix), so the handler is already reset before any byte is written; no additional reset is needed there.

As a result (sketched below):

  • we inject a reset in u32 and u128 for serialization
  • we inject a reset in u32, u128, and i128 for deserialization, because i128 does not call u128 there
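
A standalone sketch of the reset rule (hypothetical `PackingSerializer` type; the byte order and framing here are illustrative assumptions, not necessarily the PR's exact wire format): bytes are packed into a pending word, and any word-sized write first resets the handler by flushing the pending bytes with zero padding.

```rust
struct PackingSerializer {
    out: Vec<u32>,
    pending: [u8; 4],
    n_pending: usize,
}

impl PackingSerializer {
    fn new() -> Self {
        Self { out: Vec::new(), pending: [0; 4], n_pending: 0 }
    }

    // The injected reset: flush a partially filled word before any non-byte write.
    fn reset(&mut self) {
        if self.n_pending > 0 {
            self.out.push(u32::from_le_bytes(self.pending));
            self.pending = [0; 4];
            self.n_pending = 0;
        }
    }

    fn write_u8(&mut self, b: u8) {
        self.pending[self.n_pending] = b;
        self.n_pending += 1;
        if self.n_pending == 4 {
            self.reset();
        }
    }

    fn write_u32(&mut self, v: u32) {
        self.reset(); // u32 (and u128/i128) always flush packed bytes first
        self.out.push(v);
    }
}

fn main() {
    let mut s = PackingSerializer::new();
    for b in [1u8, 2, 3] {
        s.write_u8(b); // three bytes share one word
    }
    s.write_u32(7); // forces the reset, then writes its own word
    assert_eq!(s.out, vec![0x0003_0201, 7]);
}
```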


@weikengchen (Contributor, Author)

Added a third test that checks edge cases. Some of the edge cases are:

(Vec<u8>, u32) with 0, 1, 2, 3, 4, or 5 bytes
(u32, Vec<u8>) with 0, 1, 2, 3, 4, or 5 bytes
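
A hedged sketch of what such a round-trip test could look like, assuming the `risc0_zkvm::serde::{to_vec, from_slice}` helpers (exact paths and signatures may differ):

```rust
use risc0_zkvm::serde::{from_slice, to_vec};

#[test]
fn bytes_next_to_u32_roundtrip() {
    for len in 0..=5usize {
        let bytes: Vec<u8> = (0..len as u8).collect();

        // (Vec<u8>, u32): packed bytes followed by a word
        let a = (bytes.clone(), 0xdead_beefu32);
        let back: (Vec<u8>, u32) = from_slice(&to_vec(&a).unwrap()).unwrap();
        assert_eq!(a, back);

        // (u32, Vec<u8>): a word followed by packed bytes
        let b = (0xdead_beefu32, bytes);
        let back: (u32, Vec<u8>) = from_slice(&to_vec(&b).unwrap()).unwrap();
        assert_eq!(b, back);
    }
}
```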

@weikengchen (Contributor, Author)

Let me take a look at the failed tests and think about how to fix them.

@weikengchen marked this pull request as ready for review January 9, 2024 12:42
@weikengchen (Contributor, Author) commented Jan 9, 2024

@flaub ready for CI again.

My local run has one failing test that does not seem to be caused by this PR.

test tests::manager::integration_test_completed_proof_manager ... FAILED

failures:

---- tests::manager::integration_test_completed_proof_manager stdout ----
thread 'tests::manager::integration_test_completed_proof_manager' panicked at bonsai/ethereum-relay/src/tests/manager.rs:112:10:
deployment should succeed: MiddlewareError { e: MiddlewareError(JsonRpcClientError(JsonError(Error("missing field `baseFeePerGas`", line: 1, column: 39)))) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@austinabell (Contributor) left a comment


Just some initial thoughts, not a thorough review

risc0/zkvm/src/guest/env.rs (outdated review thread, resolved)
Comment on lines +100 to +112
// Depending on `status`, the buffer positions that should only ever contain
// zero padding are checked; any non-zero byte there is a deserialization error.
if self.status == 1 {
    if self.buffer[0] != 0 || self.buffer[1] != 0 || self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
} else if self.status == 2 {
    if self.buffer[1] != 0 || self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
} else if self.status == 3 {
    if self.buffer[2] != 0 {
        return Err(Error::DeserializeBadByte);
    }
}
Contributor

Feels like this could just be slicing the range you're checking. Or is this specific for perf reasons?

Contributor Author

Sort of for performance, though I feel the compiler would be able to optimize it the same way. None of these cases are supposed to occur; the check is only there to detect errors.

risc0/zkvm/src/serde/serializer.rs (outdated review thread, resolved)
risc0/zkvm/src/guest/env.rs (outdated review thread, resolved)
@austinabell (Contributor) commented Jan 9, 2024

> @flaub ready for CI again.
>
> My local test has a failed test that does not seem to be due to this PR.
>
> test tests::manager::integration_test_completed_proof_manager ... FAILED
>
> failures:
>
> ---- tests::manager::integration_test_completed_proof_manager stdout ----
> thread 'tests::manager::integration_test_completed_proof_manager' panicked at bonsai/ethereum-relay/src/tests/manager.rs:112:10:
> deployment should succeed: MiddlewareError { e: MiddlewareError(JsonRpcClientError(JsonError(Error("missing field `baseFeePerGas`", line: 1, column: 39)))) }
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That's just an issue with ethers-rs. The CI uses an old version of foundry because there was a breaking change (and there is no released version of ethers that works with it), so you can ignore that. A fix would be patching ethers-rs to a recent git version, pinning the old foundry version, or just ignoring that test.

@weikengchen (Contributor, Author)

To summarize, Austin pointed out that this would have an issue if a user uses multiple FdWriters for the same resource. For example, a user can obtain a writer to JOURNAL just by calling env::journal(), and nothing in the design prevents obtaining more than one. If multiple journal writers are used, the order in which they are dropped (when their variables go out of scope) is also effectively unspecified.

This issue is real because each FdWriter may hold "leftover" bytes that have not yet been written to the stream.

A solution is to have something similar to this in the guest:

#[derive(Clone)]
pub struct PosixIo<'a> {
    pub(crate) read_fds: BTreeMap<u32, Rc<RefCell<dyn BufRead + 'a>>>,
    pub(crate) write_fds: BTreeMap<u32, Rc<RefCell<dyn Write + 'a>>>,
}

So each time a user requests an FdReader or FdWriter, they actually get a clone of an Rc<RefCell<...>> handle to the same underlying object. Ideally there would be another layer of abstraction, so that regular users do not have to operate on Rc<RefCell<...>> directly, which requires borrow or borrow_mut.
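
A minimal sketch of that sharing pattern (hypothetical `IoRegistry` type, not the actual guest API): every request for the same fd returns a clone of the same `Rc<RefCell<...>>` handle, so there is only ever one underlying writer and only one pending-byte buffer to flush.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::io::Write;
use std::rc::Rc;

struct IoRegistry<'a> {
    write_fds: BTreeMap<u32, Rc<RefCell<dyn Write + 'a>>>,
}

impl<'a> IoRegistry<'a> {
    fn writer(&mut self, fd: u32) -> Rc<RefCell<dyn Write + 'a>> {
        self.write_fds
            .entry(fd)
            .or_insert_with(|| {
                // A Vec<u8> stands in for the real fd-backed writer.
                let w: Rc<RefCell<dyn Write + 'a>> = Rc::new(RefCell::new(Vec::<u8>::new()));
                w
            })
            .clone()
    }
}

fn main() {
    let mut io = IoRegistry { write_fds: BTreeMap::new() };
    let a = io.writer(3);
    let b = io.writer(3);
    // Both handles alias the same writer; writes interleave into one buffer
    // and there is no per-handle leftover to lose on drop.
    a.borrow_mut().write_all(b"hello ").unwrap();
    b.borrow_mut().write_all(b"journal").unwrap();
}
```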

@weikengchen (Contributor, Author)

Let me experiment with that idea a bit. In general, we do not want to patch things here and there; we want to resolve this systematically.

Adding a new syscall is also not reasonable.

@weikengchen (Contributor, Author) commented Jan 9, 2024

@austinabell Circulating this to you for a third version.

This version no longer modifies how FdWriter works, and it reverts all the changes to WordWrite. All the changes are now implemented in, and only in, Serializer and Deserializer.

We want to restrict ourselves to serialization. The problem we are encountering is that there can be "buffered" bytes that have not yet been written to the stream.

First of all, we can split all types in Rust into three groups.

  • primitive types:
    • bool, u8, u16, u32, u64, u128, i8, i16, i32, i64, i128, f32, f64, char, str,
    • ()
    • unit struct aka "struct Nothing"
    • unit enum aka "enum{ A, B, C }" with no underlying variables
  • "newtype", a virtually pseudo type defined over another type, which is specifically defined for efficiency
    • Option, which can be considered as enum { None, Some(T) }
    • newtype struct, aka struct(T)
    • newtype variant, aka enum { struct1(T1), struct2(T2), ... }
  • wrapper:
    • seq, including vec![], long slice
    • tuple aka (T1, T2)
    • tuple struct aka struct(T1, T2)
    • tuple variant aka enum { struct1(T1, T2), struct2(T3, T4, T5) }
    • map
    • struct aka struct { a: A, b: B}
    • struct variant aka enum { struct1 { a: T1, b: T2 }, struct2 { a: T3, b: T4, c: T5 } }

Note that if the buffered bytes have not been fully written to the stream, the last primitive type written must have been either bool or u8 (we will treat bool as u8 in the following). No primitive type comes after it; otherwise the buffer would already have been flushed to the stream, because the byte handler would have been reset.

So there are only two possibilities for this u8.

  • the variable being serialized is directly or indirectly inside some sort of wrapper. By "indirectly" we mean that, e.g., struct { Option<struct(Option<u8>)> } also counts as "inside a wrapper".
  • the variable being serialized is not inside any wrapper. It could be a bare u8 or Option<struct(Option<u8>)>.

We handle both separately.

For the first case, we introduce a notion of "depth". When the serializer enters a wrapper, the depth is increased by one; when it leaves the wrapper, the depth is decreased by one. When the depth reaches zero, the serializer must have left the outermost meaningful wrapper, and no further bytes can follow, so any buffered bytes must be written to the stream immediately at that point. The implementation looks like this:

    #[inline]
    fn decrease_depth<W: WordWrite>(&mut self, stream: &mut W) -> Result<()> {
        self.depth -= 1;
        if self.depth == 0 && self.status != 0 {
            stream.write_words(&[self.byte_holder])?;
            self.status = 0;
        }
        Ok(())
    }

For the second case, we notice that the last u8 is at depth 0, so this u8 is put into the buffer and immediately written to the stream, i.e., it is treated like a u32.

     fn handle<W: WordWrite>(&mut self, stream: &mut W, v: u8) -> Result<()> {
        if self.depth == 0 {
            stream.write_words(&[v as u32])?;
        } else {
            ......
        }
        Ok(())
    }

Let me know what you think. In addition, @austinabell, could you approve the GitHub Actions run, so CI can help find bugs? I am currently fighting the error "incompatible client version: 0.21.0-alpha.1" locally.

@flaub (Member) commented Jan 9, 2024

I'll need to review before we run on CI, thanks.

@flaub (Member) commented Jan 9, 2024

If you want to test locally, you'll need to install the server from local source. You can do that with:

cargo install --path risc0/cargo-risczero

@weikengchen (Contributor, Author)

Just finished local testing; the local tests pass.

@austinabell (Contributor)

> @austinabell Circulating this to you for a third version.

So for context, I'm not a stakeholder for these changes; I was just reviewing to help improve the proposal, since I had thought through similar patterns and personally couldn't justify proposing to change the default. I haven't checked the semantics in depth, and it seems like a reasonable implementation, but I'm still mainly unsure about these two points:

  • Implementing specific semantics in the serialization introduces footguns and breaking changes for existing uses
    • I also don't have high confidence this would pass if (de)serialization for all types were fuzzed; previous bugs were not caught by these kinds of unit tests
    • (There still seems to be a minor race condition if you call internal functions of the deserializer with multiple readers/writers, though I haven't dug too deeply into this)
  • The cycle overhead of this logic is unclear
    • Yes, it's very minimal, but it's changing the default for users to optimize bad patterns, not correct ones

@flaub self-requested a review January 9, 2024 19:16
@weikengchen (Contributor, Author)

The first point from Austin relates to breaking existing implementations. A few thoughts here:

  • since the library has not yet had a stable release, it is still okay to change the serialization

  • but we then need to be very careful, because if a bug is discovered after the stable release and we need to revert this algorithm, it would be troublesome

  • the problem we encounter is a host working with an old ELF. This is important because we can already see that this change affects the ELF; if the ELF is hardcoded into a larger system, the programmer might have no way to update the guest ELF to the latest API

  • ideally there would be, in the future (maybe after the first stable release), a way for the executor or the prover to know which version the guest ELF was compiled against and handle that guest accordingly. This is reasonable, since the guest already contains some zkVM SDK glue code that is tied to versions. It would let the Bonsai cloud prover detect the guest's version and support an older VM version when only a minimal adapter is needed

@weikengchen (Contributor, Author)

Regarding cycle count, I have a separate question. At the moment, every call to read or write results in a syscall to sys_read or sys_write. This is where I am more concerned about the cycle count.

Would it be more performant if a user wrapped the reader/writer with, for example, BufReader and BufWriter? I just feel that the current implementation makes syscalls very frequently.
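
A minimal sketch of that buffering idea, assuming the underlying descriptor-backed writer implements `std::io::Write` (the `raw` parameter here is a generic stand-in, not a specific guest API): routing writes through a `BufWriter` turns many small `sys_write` calls into a few large ones.

```rust
use std::io::{BufWriter, Write};

fn write_output<W: Write>(raw: W, payload: &[u8]) -> std::io::Result<()> {
    // 4 KiB of userspace buffering; only flushes hit the underlying writer.
    let mut w = BufWriter::with_capacity(4096, raw);
    for chunk in payload.chunks(16) {
        w.write_all(chunk)?; // buffered, no write to `raw` per chunk
    }
    w.flush() // one (or a few) large writes instead of many small ones
}

// Usage example against a throwaway sink:
// write_output(std::io::sink(), &[0u8; 1024]).unwrap();
```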

@weikengchen deleted the alter-serde branch January 15, 2024 08:32
@weikengchen restored the alter-serde branch January 15, 2024 08:41
@weikengchen reopened this Jan 15, 2024
@flaub added this to the 0.21.0 milestone Jan 23, 2024
@nategraf (Contributor)

Reading through this, it seems the implementation is reasonable, and I can follow what the resulting codec looks like (bytes get packed, with padding whenever we transition from bytes to non-bytes). As a result, we should expect roughly a 75% reduction in the number of calls to read a word from the host (four bytes per word instead of one byte per word).

I went ahead and ran this on a couple of our examples to get samples. With the sha example and 50 kB of data, I saw no change to performance as it is written, passing a String to the guest. This is not unexpected. I then changed the example to pass in &[u8] and deserialize as Vec<u8>. This PR resulted in a 54% improvement, essentially all of which is accounted for by a reduction in the number of calls to FdReader::read_words.

I also went ahead and compared this to an implementation using env::stdin().read_to_end(&mut data). That implementation achieved a 90% reduction in cycles over the baseline of main, although it is a significantly different method, avoiding serde altogether.
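
For reference, a hedged sketch of what that comparison guest looks like, assuming the guest exposes a `std::io::Read` over stdin (e.g. `risc0_zkvm::guest::env::stdin()`; the exact path may differ): the raw bytes are read in bulk and serde is skipped entirely.

```rust
use std::io::Read;

use risc0_zkvm::guest::env;

fn read_raw_input() -> Vec<u8> {
    let mut data = Vec::new();
    // One bulk read path instead of per-element deserialization.
    env::stdin()
        .read_to_end(&mut data)
        .expect("reading stdin should succeed");
    data
}
```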

Overall, I'm a bit reluctant to merge this approach. It feels a bit "stapled on" to the core (de)serializer implementations and optimizes the single case of serializing u8, without e.g. addressing strings. This is clearly valuable for byte buffers, which are of key importance, but there are other methods of serializing byte buffers that are more efficient. What it does have over e.g. std::io::Read is that it works on byte buffers arbitrarily nested in structs without special logic.

Altogether, it feels like there is still significant work to be done, so the concern is that we will later need to come back to this and rebuild the serializer a second time. Both of these transitions introduce some pain for developers, so ideally we'd do it once. On the other hand, we are pre-1.0.

One direction we still need to explore is using alternative, off-the-shelf codecs, e.g. swapping our codec for bincode or another serde codec. Going beyond that, something like rkyv is very interesting for completely avoiding direct deserialization costs (especially in terms of copying memory), but it would probably not be a good default due to its impact on devex. Some context on the codec as it exists today: it was created for a world where reading data into the zkVM was zero-cost, since initial reads to a special non-deterministic region of memory were unconstrained, so the host would simply write data to that location without consuming any cycles.

@flaub do you have any opinions or intuitions on this given this information?

Profiles for this branch (e8d40279faf2928a6b29d37671e3a48c2f98336b), main (8650ac9) and a version of the example using std::io::Read::read_to_end are provided in the tarball below.

sha_profiles.tar.gz

@weikengchen (Contributor, Author) commented Jan 31, 2024

Here is a question: is the 90% reduction in the number of calls to "read" related to the ability to read many words at once, because it can predict how many need to be read?

If so, this calls for something like a "BufReader" to be implemented in the guest: if there are words available in the file descriptor, let it read more at once.

But a follow-up question is: how big is the overhead of read_words? Are the serde cycles mostly due to the inherent machinery of serde? Note that for many applications the serde cycles may not be the bottleneck, but serde does help users write the code.

One solution to help with stability is to allow a customized serializer and deserializer, or to implement this as a third-party dependency that wraps the existing implementation. This could be done in a way that is transparent and does not require modifications to RISC Zero itself, basically an env2::, or a #[features(experimental_serde)].

@weikengchen (Contributor, Author)

Related to the stable release, the question is whether we want to move forward with this new input spec in which consecutive bytes are packed.

Improving the performance further could then be done as subsequent, non-breaking changes.

@weikengchen (Contributor, Author)

In the meantime, I uploaded a crate that converts a data structure to and from u32 slices (this avoids the compatibility issue):
l2r0-small-serde

@flaub modified the milestones: 0.21.0, 0.22.0 Feb 16, 2024