Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for skippable frames #185

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

antoyo
Copy link

@antoyo antoyo commented Jan 19, 2023

Hi.
This add supports for writing and reading skippable frames as well as skipping frames.

There is no support for doing that in a streaming way in the ZSTD C library, so that's why so part of this PR parses some stuff manually.

There are a few TODOs in the code that I'd like you to tell what you think we should do with that.

Thanks for the review.

@antoyo antoyo force-pushed the feature/skippable-frames branch 2 times, most recently from 511e7a7 to b0e5811 Compare January 26, 2023 20:02
Copy link
Owner

@gyscos gyscos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, and thanks for the PR!

Let's start with zstd-safe. I think it would be best to make these free functions, since they have nothing to do with contexts (contexts are not the only way to encode/decode zstd data).

Otherwise, with a few minor details, it looks nice!

buffer,
capacity,
magic_variant,
input.as_ptr() as *mut _,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't turn non-mut &[u8] into *mut.

Suggested change
input.as_ptr() as *mut _,
ptr_void(input),

#[cfg(feature = "experimental")]
pub fn is_skippable_frame(input: &[u8]) -> SafeResult {
unsafe {
parse_code(zstd_sys::ZSTD_isSkippableFrame(input.as_ptr() as *mut _, input.len()) as usize)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for ZSTD_isSkippableFrame does not indicate that it can return an error, and indeed in the source code we can see it just returns either 0 or 1.

So let's just return a bool from the function.

Suggested change
parse_code(zstd_sys::ZSTD_isSkippableFrame(input.as_ptr() as *mut _, input.len()) as usize)
zstd_sys::ZSTD_isSkippableFrame(ptr_void(input), input.len() > 0

Comment on lines 447 to 462
#[cfg(feature = "experimental")]
// TODO: use InBuffer?
pub fn write_skippable_frame<C: WriteBuf + ?Sized>(output: &mut OutBuffer<'_, C>, input: &[u8], magic_variant: u32) -> SafeResult {
let input_len = input.len();
unsafe {
output.dst.write_from(|buffer, capacity| {
parse_code(zstd_sys::ZSTD_writeSkippableFrame(
buffer,
capacity,
input.as_ptr() as *mut _,
input_len,
magic_variant,
))
})
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for an InBuffer here, or even really an OutBuffer, since the zstd lib either writes everything in one go (in which case no need to maintain an input cursor) or returns an error.

Suggested change
#[cfg(feature = "experimental")]
// TODO: use InBuffer?
pub fn write_skippable_frame<C: WriteBuf + ?Sized>(output: &mut OutBuffer<'_, C>, input: &[u8], magic_variant: u32) -> SafeResult {
let input_len = input.len();
unsafe {
output.dst.write_from(|buffer, capacity| {
parse_code(zstd_sys::ZSTD_writeSkippableFrame(
buffer,
capacity,
input.as_ptr() as *mut _,
input_len,
magic_variant,
))
})
}
}
pub fn write_skippable_frame<C: WriteBuf + ?Sized>(dst: &mut C, input: &[u8], magic_variant: u32) -> SafeResult {
let input_len = input.len();
unsafe {
dst.write_from(|buffer, capacity| {
parse_code(zstd_sys::ZSTD_writeSkippableFrame(
buffer,
capacity,
ptr_void(input),
input_len,
magic_variant,
))
})
}
}

Comment on lines 924 to 927
pub fn read_skippable_frame<C: WriteBuf + ?Sized>(output: &mut OutBuffer<'_, C>, magic_variant: &mut u32, input: &[u8]) -> SafeResult {
let input_len = input.len();
unsafe {
output.dst.write_from(|buffer, capacity| {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too no need for an OutBuffer:

Suggested change
pub fn read_skippable_frame<C: WriteBuf + ?Sized>(output: &mut OutBuffer<'_, C>, magic_variant: &mut u32, input: &[u8]) -> SafeResult {
let input_len = input.len();
unsafe {
output.dst.write_from(|buffer, capacity| {
pub fn read_skippable_frame<C: WriteBuf + ?Sized>(dst: &mut C, magic_variant: &mut u32, input: &[u8]) -> SafeResult {
let input_len = input.len();
unsafe {
dst.write_from(|buffer, capacity| {

@gyscos
Copy link
Owner

gyscos commented Jan 30, 2023

On the higher-level zstd-rs side, it seems that there's 2 main things:

  • Some wrappers for the zstd-safe functions.
  • Some re-implementation of read/write skippable frames in a slightly more "streamable" way - no need to buffer the entire frame twice.

The second point in particular is where most changes are required to re-implement quite a bit of logic from the zstd library. I feel like there's quite a lot of added code and complexity, maybe without being worth the added convenience.

In particular, regarding the re-implementation of read_skippable_frame:

  • Seeking back when an error occurs is a big source of copied code, either from the standard library or from the C zstd library.
  • The implementation can stream the Read source, but still buffers the entire result.

I would be more comfortable with a solution that:

  • Does not seek back on error. If an error happens, the stream is left in an unspecified state. Should simplify the implementation.
  • Does not buffer the output, but either provide a Read that will only read the skippable content (using .take(content_size) on the underlying reader) or stream the frame content to a Write given as parameter. This way, users could buffer it if needed.

@antoyo
Copy link
Author

antoyo commented Jan 30, 2023

  • Does not seek back on error. If an error happens, the stream is left in an unspecified state. Should simplify the implementation.

If you do not seek back, how can you detect that there is a skippable frame?
is_skippable_frame() requires a &[u8], so it's not streaming.

And an attempt to call read_skippable_frame() and checking for an error could leave the stream at a wrong position without being able to continue reading normal frames.

@gyscos
Copy link
Owner

gyscos commented Jan 30, 2023

is_skippable_frame only needs the &[u8] to cover the header (in practice the first 4 bytes), so it's still mostly streaming.

And if something happens in read_skippable_frame, then something is wrong, and there's no guarantee the stream is valid anyway.

@antoyo
Copy link
Author

antoyo commented Jan 30, 2023

is_skippable_frame only needs the &[u8] to cover the header (in practice the first 8 bytes), so it's still mostly streaming.

But how would you combine this function with reading frames in a streaming fashion without seeking back?

If you read the first 8 bytes (which could be too much bytes IIRC, e.g. I think you can have a frame of 5 bytes) to check if it's a skippable frame, how would you then read the frame (whether it is skippable or not)?

@gyscos
Copy link
Owner

gyscos commented Jan 30, 2023

We could rely on the BufRead own buffer to "peek" at the first 4 bytes. Though it's true there's no guarantee the BufRead will actually fill more than 1 byte. If we do want to support that, one alternative would be to store up to 3 bytes of buffered data in a new zstd::stream::zio::reader::State::Peeked, which would be read from before moving back to State::Reading - I suspect it might be simpler than the current situation.

Maybe we could go a different way and register a skippable_frame_handler: Option<Box<dyn FnMut(u32, &mut dyn Reader) -> io::Result<()>>> callback, which will be called in the regular operations if a skippable frame is found.

@antoyo-light
Copy link

Ok, so I just pushed with all the changes you asked.

@cgbur
Copy link
Contributor

cgbur commented Mar 17, 2023

Closes #13 if merged

@antoyo
Copy link
Author

antoyo commented Apr 4, 2023

Ping.

@antoyo
Copy link
Author

antoyo commented May 30, 2023

Any update on this?

@gyscos
Copy link
Owner

gyscos commented Jun 2, 2023

Hi - sorry, have had very little time to look more into this.

@antoyo
Copy link
Author

antoyo commented Aug 31, 2023

Hi.
Any update?

@ariel-miculas
Copy link
Contributor

ariel-miculas commented Nov 7, 2023

I'm also interested in this feature.
I wonder if seekable compression could be implemented in this crate, so I don't have to use https://crates.io/crates/zstd-seekable

#[cfg(feature = "experimental")]
pub fn read_skippable_frame<C: WriteBuf + ?Sized>(dst: &mut C, magic_variant: &mut u32, input: &[u8]) -> SafeResult {
let input_len = input.len();
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding SAFETY comments for the unsafe blocks and wrapping the minimum amount of code in the unsafe block. See https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks

@@ -113,12 +153,17 @@ where
loop {
match self.state {
State::Reading => {
let is_peeking = self.peeking;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should call the peeking() function here.

let (bytes_read, bytes_written) = {
// Start with a fresh pool of un-processed data.
// This is the only line that can return an interruption error.
let input = if first {
// eprintln!("First run, no input coming.");
b""
} else if self.peeking {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either check the is_peeking variable or call the peeking() function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants