Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize VarInt decoding #1117

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

Matthias247
Copy link
Contributor

Upfront: This isn't the most elegant change. But given VarInts sit in the
hot path it seems worth it.

The current VarInt decoding performs some duplicate bounds check.
VarInt::decode checks for available bytes in order to return an error.
And bytes::Buf performs additional bounds checks in various helper methods.

By using Buf::chunk and Buf::advance directly we can save some of those.
We could improve even more by only support decoding from contiguous buffers,
but that would change the API contract.

In addition it seemed like the u16 be->le conversion wasn't as free
as it should be. A 2byte LE buffer improves things here.

Overall this reduces CPU time spent in VarInt decoding from about 2% to around 1%.

Upfront: This isn't the most elegant change. But given VarInts sit in the
hot path it seems worth it.

The current VarInt decoding performs some duplicate bounds check.
`VarInt::decode` checks for available bytes in order to return an error.
And `bytes::Buf` performs additional bounds checks in various helper methods.

By using `Buf::chunk` and `Buf::advance` directly we can save some of those.
We could improve even more by only support decoding from contiguous buffers,
but that would change the API contract.

In addition it seemed like the `u16` be->le conversion wasn't as free
as it should be. A 2byte LE buffer improves things here.

Overall this reduces CPU time spent in VarInt decoding from about 2% to around 1%.
@Matthias247
Copy link
Contributor Author

For anyone who is more curious. Here is a baseline flamegraph. "Matched" is the amount of samples in VarInt::decode (2.6%).
Inside the method we can also see 0.68% time spent in swap_bytes.

grafik

Here is the changed version, with 1.2% of samples in VarInt::decode:

grafik

Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a microbenchmark for this path?

// check. We already performed this one earlier on.
buf[0] = r.chunk()[0];
r.advance(1);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this specific change actually help? I wouldn't expect this to have any difference in bounds checking semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an assert! in the get_u8, not just a rust array bounds check. I don't think those are removed. And since even without the "le" change this proved to be faster, I guess practical experience shows that it helps.

@Ralith Ralith requested a review from djc May 17, 2021 00:48
Copy link
Collaborator

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, left this sitting around for a long time. @Matthias247 what do you think?

@@ -159,21 +163,35 @@ impl Codec for VarInt {
if r.remaining() < 1 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do this without unsafe if we use <&[u8; 1]>::try_from() to extract an array out of the bounds check?

Along these lines:

diff --git a/quinn-proto/src/varint.rs b/quinn-proto/src/varint.rs
index 7f57bc3c..8f4b9b7c 100644
--- a/quinn-proto/src/varint.rs
+++ b/quinn-proto/src/varint.rs
@@ -1,4 +1,4 @@
-use std::{convert::TryInto, fmt};
+use std::{convert::{TryInto, TryFrom}, fmt};
 
 use bytes::{Buf, BufMut};
 use thiserror::Error;
@@ -146,38 +146,37 @@ pub struct VarIntBoundsExceeded;
 
 impl Codec for VarInt {
     fn decode<B: Buf>(r: &mut B) -> coding::Result<Self> {
-        if !r.has_remaining() {
-            return Err(UnexpectedEnd);
-        }
         let mut buf = [0; 8];
-        buf[0] = r.get_u8();
+        let input = r.chunk();
+        buf[0] = match input.get(0) {
+            Some(&b) => b,
+            None => return Err(UnexpectedEnd),
+        };
+
         let tag = buf[0] >> 6;
         buf[0] &= 0b0011_1111;
-        let x = match tag {
-            0b00 => u64::from(buf[0]),
+        let (x, len) = match tag {
+            0b00 => (u64::from(buf[0]), 1),
             0b01 => {
-                if r.remaining() < 1 {
-                    return Err(UnexpectedEnd);
-                }
-                r.copy_to_slice(&mut buf[1..2]);
-                u64::from(u16::from_be_bytes(buf[..2].try_into().unwrap()))
+                let arr = [0, buf[0]];
+                let read = <&[u8; 1]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?;
+                let arr = [read[0], buf[0]];
+                (u64::from(u16::from_le_bytes(arr)), 2)
             }
             0b10 => {
-                if r.remaining() < 3 {
-                    return Err(UnexpectedEnd);
-                }
-                r.copy_to_slice(&mut buf[1..4]);
-                u64::from(u32::from_be_bytes(buf[..4].try_into().unwrap()))
+                let read = <&[u8; 3]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?;
+                let arr = [read[2], read[1], read[0], buf[0]];
+                (u64::from(u32::from_le_bytes(arr)), 4)
             }
             0b11 => {
-                if r.remaining() < 7 {
-                    return Err(UnexpectedEnd);
-                }
-                r.copy_to_slice(&mut buf[1..8]);
-                u64::from_be_bytes(buf)
+                let read = <&[u8; 7]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?;
+                let arr = [read[6], read[5], read[4], read[3], read[2], read[1], read[0], buf[0]];
+                (u64::from(u64::from_le_bytes(arr)), 8)
             }
             _ => unreachable!(),
         };
+
+        r.advance(len);
         Ok(VarInt(x))
     }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants