Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lzma2_compress and zx_compress do not compress input #99

Open
holly-hacker opened this issue Apr 14, 2023 · 2 comments
Open

lzma2_compress and zx_compress do not compress input #99

holly-hacker opened this issue Apr 14, 2023 · 2 comments

Comments

@holly-hacker
Copy link

holly-hacker commented Apr 14, 2023

I was comparing compression methods to figure out what to choose for my project and noticed some strange results:

postcard: 14880

postcard deflate: 3499
postcard brotli: 2792
postcard lz4: 4821
postcard lzma: 8496
postcard lzma2: 14884
postcard xz: 14932

In this snippet, lzma, lzma2 and xz are done using lzma_compress, lzma2_compress and xz_compress respectively. It seemed weird to me that lzma2 and xz are so poor, so I stepped through the code and noticed that it's not applying compression at all for lzma2 and xz:

/// Compress data with LZMA2 and default [`Options`](compress/struct.Options.html).
pub fn lzma2_compress<R: io::BufRead, W: io::Write>(
    input: &mut R,
    output: &mut W,
) -> io::Result<()> {
    encode::lzma2::encode_stream(input, output)
}

// ...

pub fn encode_stream<R, W>(input: &mut R, output: &mut W) -> io::Result<()>
where
    R: io::BufRead,
    W: io::Write,
{
    let mut buf = vec![0u8; 0x10000];
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            // status = EOF
            output.write_u8(0)?;
            break;
        }

        // status = uncompressed reset dict
        output.write_u8(1)?;
        // unpacked size
        output.write_u16::<BigEndian>((n - 1) as u16)?;
        // contents
        output.write_all(&buf[..n])?;
    }
    Ok(())
}

This sounds like either a bug, or something that should be documented better. As I understand from the readme file, lzmr-rs is only a decoder which is confusing since these compression functions are present.

@gendx
Copy link
Owner

gendx commented Jun 12, 2023

See #9 and the "Encoder" section in the README:

For now, there is also a dumb encoder that only uses byte literals, with many hard-coded constants for code simplicity. Better encoders are welcome!

I can remove mention of "encoder" from the "About" description of this GitHub repo. Don't hesitate to send a PR with further suggestions on how to improve the wording in the README.

@holly-hacker
Copy link
Author

I was reminded of this issue by your recent blogpost. Should I close this in favor of #9, or do you want to keep it open?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants