
support streaming read #10

Open
vn971 opened this issue Aug 28, 2019 · 8 comments

Comments

vn971 commented Aug 28, 2019

Currently, the library provides a blocking function that reads from io::BufRead and writes to io::Write. This forces the user of the library to read all contents into memory, or into a file.

Sometimes, however, one only needs to traverse the data, not hold it all in memory at once.

This could be achieved with a function that, given an io::Read, returns something that also implements io::Read. This way, you can progressively read the compressed or decompressed stream, while the library internally reads the underlying stream. This is how the xz2 crate works, for example; see the function signature of xz2::read::XzDecoder::new. It is also very flexible and intuitive: the decompressor starts to act like a "pipe" (in Unix terminology), rather than something that writes.
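To make the proposed shape concrete, here is a minimal sketch of such a Read-to-Read adapter, mirroring the signature style of xz2::read::XzDecoder. The name LzmaReader is hypothetical, and the "decompression" is a placeholder passthrough; a real implementation would hold decoder state and translate bytes on the fly:

```rust
use std::io::{self, Read};

// Hypothetical adapter: wraps any `io::Read` and is itself an `io::Read`,
// like xz2::read::XzDecoder. The passthrough below stands in for real
// LZMA decoding, which would keep range-decoder and dictionary state here.
pub struct LzmaReader<R: Read> {
    inner: R,
}

impl<R: Read> LzmaReader<R> {
    pub fn new(inner: R) -> Self {
        LzmaReader { inner }
    }
}

impl<R: Read> Read for LzmaReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // A real decoder would pull *compressed* bytes from `self.inner`
        // and emit *decompressed* bytes into `buf`.
        self.inner.read(buf)
    }
}
```

With this shape, the decompressor composes with anything expecting Read, e.g. std::io::BufReader or a tar parser.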

Support for this in lzma-rs would be very nice, I think.
Personally, I'm raising the issue because I wanted to try this library in rua (https://github.com/vn971/rua). There, I'm using an intermediate layer of decompression for another function that accepts Read: https://github.com/vn971/rua/blob/master/src/tar_check.rs#L26 (however, the underlying library xz2 is not pure Rust, but uses bindings).

Thoughts?

gendx (Owner) commented Sep 2, 2019

This is a very interesting point!

I was wondering whether there was a generic way of transforming an io::Write into an io::Read. The opposite would be quite simple (read bytes from an io::Read and write them into an io::Write), but this looks trickier. Maybe that could be possible with async functions/generators? Or with a separate process - or simply a thread - that "writes" data to the main thread, which reads it (like with Unix pipes).
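The thread-plus-pipe idea can be sketched with only the standard library: a producer thread drives an io::Write backed by a bounded channel, and the main thread consumes the other end through an io::Read. All names here (ChannelWriter, ChannelReader, pipe) are illustrative, not part of any existing API:

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

// `Write` half: each written chunk is sent through a bounded channel,
// so the writer blocks when the reader falls behind (backpressure).
struct ChannelWriter(SyncSender<Vec<u8>>);

impl Write for ChannelWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0
            .send(buf.to_vec())
            .map_err(|e| io::Error::new(io::ErrorKind::BrokenPipe, e))?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// `Read` half: receives chunks and hands them out as requested.
struct ChannelReader {
    rx: Receiver<Vec<u8>>,
    pending: Vec<u8>,
    pos: usize,
}

impl Read for ChannelReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Refill from the channel; skip empty chunks so Ok(0) means EOF.
        while self.pos == self.pending.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.pending = chunk;
                    self.pos = 0;
                }
                Err(_) => return Ok(0), // writer thread finished: EOF
            }
        }
        let n = (self.pending.len() - self.pos).min(buf.len());
        buf[..n].copy_from_slice(&self.pending[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

// Run a writer-driven producer on its own thread, expose its output as Read.
fn pipe<F>(producer: F) -> ChannelReader
where
    F: FnOnce(&mut dyn Write) + Send + 'static,
{
    let (tx, rx) = sync_channel(4);
    thread::spawn(move || {
        let mut w = ChannelWriter(tx);
        producer(&mut w);
    });
    ChannelReader { rx, pending: Vec::new(), pos: 0 }
}
```

This is essentially the Unix-pipe approach mentioned above, at the cost of an extra thread and a copy per chunk.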

In the meantime, I think the easiest way to support streaming would be to extract the loop body of the process function (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L215) into a step function. Then, in the streaming case, use a temporary buffer as the io::Write for the current decoder; the read method of your io::Read would repeatedly call step and copy the bytes from the tmp buffer into the read buffer.
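A sketch of that step-plus-temporary-buffer approach, with a toy stepper standing in for the extracted loop body (the Step trait and all names here are hypothetical; lzma-rs has no such API today):

```rust
use std::io::{self, Read};

// Hypothetical single iteration of the decoder loop: each call decodes a
// bit more and appends output; Ok(false) means the stream is finished.
trait Step {
    fn step(&mut self, output: &mut Vec<u8>) -> io::Result<bool>;
}

// `io::Read` adapter: repeatedly calls `step` into a temporary buffer,
// then drains that buffer into the caller's `buf`.
struct StepReader<S: Step> {
    decoder: S,
    tmp: Vec<u8>,
    pos: usize,
    done: bool,
}

impl<S: Step> StepReader<S> {
    fn new(decoder: S) -> Self {
        StepReader { decoder, tmp: Vec::new(), pos: 0, done: false }
    }
}

impl<S: Step> Read for StepReader<S> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Refill the tmp buffer until it has data or the decoder is done.
        while self.pos == self.tmp.len() && !self.done {
            self.tmp.clear();
            self.pos = 0;
            self.done = !self.decoder.step(&mut self.tmp)?;
        }
        let n = (self.tmp.len() - self.pos).min(buf.len());
        buf[..n].copy_from_slice(&self.tmp[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

// Toy stepper for illustration: emits fixed chunks, one per step.
struct ToyStep {
    chunks: Vec<Vec<u8>>,
}

impl Step for ToyStep {
    fn step(&mut self, output: &mut Vec<u8>) -> io::Result<bool> {
        if self.chunks.is_empty() {
            return Ok(false);
        }
        output.extend_from_slice(&self.chunks.remove(0));
        Ok(true)
    }
}
```

The real step would of course decode from the underlying compressed input rather than replay fixed chunks.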

I probably won't have time to look at it more closely this week, but feel free to send a PR if you want to give it a try!

vn971 (Author) commented Sep 3, 2019

Thanks for the explanation!

Regarding the process function and the temporary buffer -- indeed, this is how I thought it could be done as well.

I'm not sure I'll have time in the coming days either, though. Maybe I'll come back to it later if/when I get rid of other libraries that bind to OS libraries and am otherwise on pure Rust.

@demurgos

Hi,
I am maintaining swf-parser, a library to parse SWF files. These files can be LZMA-encoded, and I am using this library to decode them. To support streaming parsing of SWF files, streaming support in the LZMA decoder is required first. A low-level API similar to the one used by the inflate crate would be nice.
With such an API, you create a stream inflater that maintains the parser's internal state (for LZMA this would correspond to dictionaries and temporary buffers). You can manually feed data to the decoder and read the result.
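A minimal sketch of such a push-style API, assuming an inflate-like shape (the name StreamDecoder and the update signature are hypothetical; a passthrough stands in for real LZMA decoding):

```rust
// Hypothetical push-style decoder: the caller feeds arbitrary slices of
// compressed input and collects whatever decoded output became available.
// Real state (dictionary window, range decoder, partial symbols) would
// persist across calls; here we only track how much input was fed.
struct StreamDecoder {
    bytes_fed: usize,
}

impl StreamDecoder {
    fn new() -> Self {
        StreamDecoder { bytes_fed: 0 }
    }

    // Feed a chunk of (compressed) input; returns the bytes decoded so far.
    // A real decoder may buffer input internally and return an empty vector
    // until a whole symbol can be decoded.
    fn update(&mut self, data: &[u8]) -> Vec<u8> {
        self.bytes_fed += data.len();
        data.to_vec() // placeholder passthrough, not actual decoding
    }
}
```

The key property for streaming parsers is that input can arrive in chunks of any size, and decoding resumes mid-stream across calls.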

@cccs-sadugas (Contributor)

I've been working on an implementation for this ticket based on the LzmaDec_TryDummy function in libhtp's port of the LZMA SDK. The main issue with executing the loop incrementally is that you may end up in a partially corrupted state if you are in the middle of a function and fail to read the next byte because it isn't available yet.

Also, I used the std::io::Write trait instead of std::io::Read to create an interface like flate2::write::DeflateDecoder.
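The write-side interface described here can be sketched as follows, shaped like flate2::write::DeflateDecoder: compressed bytes are written *into* the decoder, and decoded bytes flow *out* to the wrapped writer. The name WriteDecoder is hypothetical and the passthrough stands in for real LZMA decoding (a real version would also buffer input when a symbol spans two write calls, which is exactly the partial-state problem mentioned above):

```rust
use std::io::{self, Write};

// Hypothetical write-side decoder, mirroring flate2::write::DeflateDecoder:
// callers `write` compressed bytes in, decoded bytes go to the inner writer.
struct WriteDecoder<W: Write> {
    inner: W,
}

impl<W: Write> WriteDecoder<W> {
    fn new(inner: W) -> Self {
        WriteDecoder { inner }
    }

    // Finish the stream and recover the wrapped writer.
    fn finish(self) -> io::Result<W> {
        Ok(self.inner)
    }
}

impl<W: Write> Write for WriteDecoder<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // A real decoder would decode `buf` incrementally and write the
        // decoded output; this passthrough only shows the interface shape.
        self.inner.write_all(buf)?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

One design consequence: the caller controls how input arrives (push), while the Read-based shape lets the caller control how output is consumed (pull).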

I'll publish this soon. It will most likely be dependent on #50.

gendx (Owner) commented Jul 13, 2020

I'm now wondering whether integrating with async/await would be the way to go to implement this. Something like taking futures::io::AsyncRead as input and writing to a futures::io::AsyncWrite or a futures::stream::Stream of bytes as output.

I don't know what the performance overhead of that would be, but from a programming perspective the code should be similar to the current one (with some extra async keywords). The streaming mode would be gated by a feature flag.

cccs-sadugas mentioned this issue Jul 13, 2020
@cccs-sadugas (Contributor)

@gendx I published a PR for this if you want to have a look. I haven't really thought about implementing it with futures, but that's an interesting idea. It would add a couple of extra dependencies for those who want to use a streaming API, and possibly require a runtime. I was looking for a solution that uses an std::io::Write interface, to have an API consistent with flate2::write::DeflateDecoder for implementing a generic decoder.

@Herschel

It'd be useful if a Read interface were also provided (compare flate2, which has both read::DeflateDecoder and write::DeflateDecoder).

@soulmachine

Reading line by line is very important. For example, flate2 can read .gz files line by line:

use std::io::BufRead;

let f_in = std::fs::File::open("sample.txt.gz").unwrap();
let d = flate2::read::GzDecoder::new(f_in);
let buf_reader = std::io::BufReader::new(d);
for line in buf_reader.lines() {
    println!("{}", line.unwrap());
}
