LZMAError("Expected unpacked size of 149198 but decompressed to 483334") #11

Open
David-OConnor opened this issue Sep 9, 2019 · 8 comments

Comments

@David-OConnor

Any ideas on what could cause this? Code:

    let mut f = io::BufReader::new(fs::File::open(archive_path).unwrap());
    let mut tar: Vec<u8> = Vec::new();
    lzma_rs::xz_decompress(&mut f, &mut tar).unwrap();
@gendx
Owner

gendx commented Sep 20, 2019

An LZMA stream can include an unpacked_size hint in its header (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L61-L74), which the code then verifies to reject inconsistencies (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320).

Additionally, the LZMA2 format is a wrapper around LZMA, which can also provide an unpacked size hint on top of it (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L89-L95 and https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L161).

On top of that, XZ compresses each file with an LZMA2 stream.
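To make the header hint concrete, here is a minimal sketch of reading the size field from a classic 13-byte `.lzma` header (1 props byte, 4 bytes of dictionary size, 8 bytes of uncompressed size, all little endian). The struct and function names are made up for illustration and are not the lzma-rs API; XZ and LZMA2 carry their sizes differently.

```rust
/// Value of the 8-byte size field that means "uncompressed size unknown".
const UNKNOWN_SIZE: u64 = 0xFFFF_FFFF_FFFF_FFFF;

/// Fields of a classic `.lzma` header (hypothetical struct, not from lzma-rs).
#[derive(Debug, PartialEq)]
struct LzmaHeader {
    props: u8,
    dict_size: u32,
    /// `None` when the header stores the "unknown" sentinel.
    unpacked_size: Option<u64>,
}

/// Parses the first 13 bytes of a `.lzma` stream; returns `None` if too short.
fn parse_lzma_header(bytes: &[u8]) -> Option<LzmaHeader> {
    if bytes.len() < 13 {
        return None;
    }
    // Dictionary size: bytes 1..5, little endian.
    let dict_size = u32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // Uncompressed size: bytes 5..13, little endian.
    let mut size_bytes = [0u8; 8];
    size_bytes.copy_from_slice(&bytes[5..13]);
    let raw_size = u64::from_le_bytes(size_bytes);
    Some(LzmaHeader {
        props: bytes[0],
        dict_size,
        unpacked_size: if raw_size == UNKNOWN_SIZE {
            None
        } else {
            Some(raw_size)
        },
    })
}
```

A decoder that finds `Some(n)` here can compare `n` against the number of bytes it actually produced, which is the kind of consistency check that raises the error in this issue.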

So it looks like either your file was corrupted or there is a bug in my code due to a corner case that I didn't see before.

  • Can you comment out the error check (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320) and let me know if decompression works for your file?
  • Do you know which software created this archive?
  • Can you run your code with an environment variable set to RUST_LOG=lzma_rs=info, so that I can get a clearer idea of what is going on?
  • If the file is publicly available (or if you can reproduce the issue on a publicly available file), can you point me to it so that I can debug further?

@gendx gendx added the bug label Dec 10, 2019
@gendx
Owner

gendx commented Dec 16, 2019

Would #17 (or a variant of it) work for this use case?

@ibaryshnikov
Contributor

@gendx I created a reproduction, will it help?

use std::io::BufReader;
use lzma_rs::decompress::{Options, UnpackedSize};

const DATA: &[u8] = &[
    93, 0, 0, 1, 0, 0, 0, 111, 253, 255, 255, 163, 183, 255, 71, 62, 72, 21, 114, 57, 97, 81, 184,
    146, 40, 230, 143, 221, 66, 251, 179, 253, 113, 133, 36, 209, 157, 136, 6, 166, 184, 144, 144,
    180, 72, 27, 108, 146, 211, 153, 161, 58, 255, 52, 129, 75, 240, 91, 145, 234, 14, 20, 173, 77,
    167, 21, 218, 124, 215, 37, 87, 175, 123, 84, 42, 90, 42, 15, 40, 156, 200, 228, 82, 146, 100,
    78, 137, 120, 145, 121, 117, 60, 144, 172, 178, 50, 13, 116, 246, 17, 195, 181, 90, 136, 248,
    128, 160, 103, 203, 131, 61, 101, 79, 13, 188, 166, 86, 177, 61, 29, 24, 147, 226, 211, 42, 16,
    116, 153, 103, 9, 17, 112, 188, 159, 117, 114, 125, 209, 157, 150, 224, 44, 197, 39, 232, 193,
    190, 15, 0, 4, 130, 28, 84, 73, 91, 189, 120, 8, 69, 78, 165, 182, 187, 252, 105, 241, 61, 199,
    210, 26, 194, 15, 70, 225, 186, 144, 150, 195, 46, 150, 103, 144, 224, 196, 136, 25, 140, 45,
    169, 29, 100, 201, 225, 234, 59, 16, 254, 147, 168, 89, 240, 42, 238, 251, 69, 135, 217, 29,
    243, 218, 10, 172, 191, 192, 95, 186, 36, 117, 158, 138, 110, 8, 207, 141, 154, 9, 159, 181, 3,
    71, 95, 111, 99, 247, 247, 33, 89, 114, 7, 61, 46, 250, 138, 21, 2, 105, 135, 90, 83, 215, 223,
    60, 180, 69, 243, 112, 226, 228, 100, 144, 11, 167, 204, 83, 148, 112, 122, 31, 30, 71, 230,
    64, 211, 22, 193, 147, 121, 76, 180, 3, 79, 198, 164, 40, 176, 206, 62, 34, 200, 114, 9, 81,
    33, 129, 115, 94, 77, 166, 124, 38, 148, 20, 62, 133, 46, 21, 63, 37, 112, 202, 221, 26, 34, 4,
    13, 189, 74, 75, 162, 189, 241, 123, 154, 163, 59, 7, 148, 203, 156, 18, 125, 126, 147, 209,
    158, 105, 231, 27, 203, 191, 132, 50, 146, 226, 22, 201, 251, 40, 255, 101, 201, 255, 75, 201,
    60, 5, 36, 246, 121, 87, 144, 239, 19, 138, 52, 229, 23, 193, 207, 4, 113, 151, 154, 147, 223,
    52, 140, 114, 174, 146, 90, 0, 42, 38, 113, 62, 58, 164, 224, 122, 82, 205, 66, 43, 153, 64,
    134, 64, 140, 123, 119, 237, 154, 159, 175, 94, 254, 119, 160, 234, 217, 50, 124, 84, 137, 204,
    160, 36, 83, 32, 91, 171, 136, 100, 221, 214, 36, 161, 168, 31, 105, 199, 188, 91, 14, 248, 37,
    175, 98, 22, 164, 68, 234, 76, 175, 144, 32, 39, 10, 60, 201, 181, 100, 52, 184, 202, 194, 77,
    159, 147, 177, 98, 172, 139, 31, 185, 230, 46, 171, 105, 55, 106, 24, 254, 236, 255, 110, 189,
    247, 139, 213, 200, 241, 113, 20, 28, 232, 144, 194, 54, 188, 180, 193, 196, 73, 234, 60, 111,
    87, 228, 113, 186, 65, 174, 66, 219, 80, 167, 249, 36, 43, 57, 144, 101, 25, 188, 250, 28, 217,
    2, 203, 195, 217, 6, 52, 125, 206, 106, 211, 148, 190, 119, 126, 34, 100, 117, 218, 183, 135,
    108, 77, 244, 54, 116, 167, 24, 113, 104, 211, 29, 14, 143, 255, 124, 241, 74, 135, 140, 131,
    196, 245, 234, 245, 213, 189, 35, 139, 127, 212, 247, 0,
];
const PACKED_SIZE: u64 = 566;
const UNPACKED_SIZE: u64 = 5048;

fn main() {
    let mut input = BufReader::new(DATA);
    let mut output = vec![];
    let options = Options {
        unpacked_size: UnpackedSize::UseProvided(Some(UNPACKED_SIZE)),
    };
    let result = lzma_rs::lzma_decompress_with_options(&mut input, &mut output, &options);
    println!("The result is {:?}", result);
}

It prints: "Expected unpacked size of 5048 but decompressed to 5046".
Packed size is 566 and 5 additional bytes are props.
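For reference, the 5 props bytes at the start of `DATA` (93, 0, 0, 1, 0) can be decoded by hand. The sketch below uses the standard LZMA packing `props = (pb * 5 + lp) * 9 + lc` followed by a little-endian dictionary size; the `Props` struct and `decode_props` function are illustrative names, not part of lzma-rs.

```rust
/// Decoded LZMA properties (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
struct Props {
    lc: u32,        // literal context bits
    lp: u32,        // literal position bits
    pb: u32,        // position bits
    dict_size: u32, // dictionary size in bytes
}

/// Decodes the 5 props bytes that precede the compressed payload.
fn decode_props(header: &[u8; 5]) -> Option<Props> {
    let byte = u32::from(header[0]);
    // Valid values are below 9 * 5 * 5 = 225.
    if byte >= 225 {
        return None;
    }
    let lc = byte % 9;
    let rest = byte / 9;
    let lp = rest % 5;
    let pb = rest / 5;
    let dict_size = u32::from_le_bytes([header[1], header[2], header[3], header[4]]);
    Some(Props { lc, lp, pb, dict_size })
}
```

For the bytes above this gives lc = 3, lp = 0, pb = 2 and a 65536-byte dictionary, so the stream parameters themselves look ordinary.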

@gendx
Owner

gendx commented Jan 24, 2020

Thanks @ibaryshnikov for your example.

However, I don't see how it's not behaving as expected. You provide an expected unpacked size of 5048 bytes, but the decompressed output is only 5046 bytes. When I set the expected size to 5046 your example stream decompresses fine.

So to me this works as intended - if the decompressed size doesn't match the expected one you provided, an error should be reported instead of returning any partial and/or potentially corrupted result. If you don't know the expected size, you can use UnpackedSize::ReadFromHeader (the default decoding option) - as long as the stream header provides it - or UnpackedSize::UseProvided(None).
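As a rough model of the behaviour described above (a simplified stand-in, not the crate's actual types or code), the expected size is either taken from the header or from the caller, and a mismatch is only reported when an expected size exists. `SizePolicy`, `expected_size` and `check_size` are hypothetical names.

```rust
/// Simplified stand-in for the decoder option discussed above
/// (assumption: only these two modes are modelled).
#[derive(Debug, Clone, Copy)]
enum SizePolicy {
    /// Trust whatever the stream header says (may be unknown).
    ReadFromHeader,
    /// Ignore the header and use the caller's value (`None` = unknown).
    UseProvided(Option<u64>),
}

/// Picks the size the decoder should expect, if any.
fn expected_size(policy: SizePolicy, header_size: Option<u64>) -> Option<u64> {
    match policy {
        SizePolicy::ReadFromHeader => header_size,
        SizePolicy::UseProvided(size) => size,
    }
}

/// Reports a mismatch only when an expected size is known.
fn check_size(expected: Option<u64>, actual: u64) -> Result<(), String> {
    match expected {
        Some(size) if size != actual => Err(format!(
            "Expected unpacked size of {} but decompressed to {}",
            size, actual
        )),
        _ => Ok(()),
    }
}
```

With `UseProvided(Some(5048))` and 5046 bytes of output this model returns the error from the reproduction, while `UseProvided(None)` accepts any output length.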

@ibaryshnikov
Contributor

@gendx thanks for checking this example. It's a bit tricky to tell when the input has ended. A single code value can be iterated over several times with different ranges. In my example, the second-to-last code is 1063818487, and there are two valid ranges for it: first 2663792640, then 1320009537. Then there's a switch to the last code, which is 0. Again, we can iterate over this code using different ranges. After removing the break on

pub fn is_finished_ok(&mut self) -> io::Result<bool> {
    Ok(self.code == 0 && util::is_eof(self.stream)?)
}

I got three ranges for code 0: 2212886016, 1089365498 and 547851036 (before there was only 2212886016). That's how we can find the last two bytes, and have 5048 in total. I've compared the results with the library from another language and it seems correct.
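To illustrate the idea, here is a small model of the two stopping rules. It is only a sketch under my reading of the problem, not code from lzma-rs, and the function names are invented: the first rule stops whenever the range coder looks finished, the second lets the known unpacked size decide and falls back to the coder state only when the size is unknown.

```rust
/// Stop as soon as the range coder looks finished (code is 0 and input is at EOF).
fn stops_on_coder_state(code: u32, at_eof: bool) -> bool {
    code == 0 && at_eof
}

/// Size-aware rule: with a known unpacked size, only the output count decides;
/// without one, fall back to the coder-state check.
fn stops_with_known_size(
    unpacked_size: Option<u64>,
    produced: u64,
    code: u32,
    at_eof: bool,
) -> bool {
    match unpacked_size {
        Some(size) => produced >= size,
        None => stops_on_coder_state(code, at_eof),
    }
}
```

With 5046 bytes produced, code 0 and the input exhausted, the first rule stops two bytes short of 5048, while the second keeps decoding until the expected size is reached.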

I don't think it's related to the original issue, where the gap between the unpacked sizes is much larger (149198 vs 483334), but it may be a separate issue. @gendx what do you think?

@dragly
Contributor

dragly commented Feb 4, 2020

We are seeing the same issue, although with a very tiny difference:

LZMAError("Expected unpacked size of 116412 but decompressed to 116411")

Unfortunately, it is again in a file that I cannot share. I have also so far been unable to reproduce the issue with other files.

However, the fix in #26 works for us as well.

dragly added a commit to cognitedata/openctm-rs that referenced this issue Feb 4, 2020
larsmoa pushed a commit to cognitedata/openctm-rs that referenced this issue Feb 5, 2020
dragly added a commit to cognitedata/reveal that referenced this issue Feb 5, 2020
This fixes the error

    Expected unpacked size of x but decompressed to y

gendx/lzma-rs#11
larsmoa pushed a commit to cognitedata/reveal that referenced this issue Feb 5, 2020
This fixes the error

    Expected unpacked size of x but decompressed to y

gendx/lzma-rs#11
@dragly
Contributor

dragly commented Mar 4, 2020

As mentioned in #26 (comment), the issue can also be reproduced by compressing the tests/files/range-coder-edge-case file with options set to write the unpacked size to the header and then decompressing it:

use lzma_rs;
use std::io::prelude::*;

fn main() {
    let mut x = Vec::new();
    std::fs::File::open("tests/files/range-coder-edge-case")
        .unwrap()
        .read_to_end(&mut x)
        .unwrap();

    let encode_options = lzma_rs::compress::Options {
        unpacked_size: lzma_rs::compress::UnpackedSize::WriteToHeader(Some(x.len() as u64)),
    };
    let decode_options = lzma_rs::decompress::Options {
        unpacked_size: lzma_rs::decompress::UnpackedSize::ReadFromHeader,
    };
    let mut compressed: Vec<u8> = Vec::new();
    lzma_rs::lzma_compress_with_options(
        &mut std::io::BufReader::new(x.as_slice()),
        &mut compressed,
        &encode_options,
    )
    .unwrap();
    let mut bf = std::io::BufReader::new(compressed.as_slice());
    let mut decomp: Vec<u8> = Vec::new();
    lzma_rs::lzma_decompress_with_options(&mut bf, &mut decomp, &decode_options).unwrap();
}

bors bot added a commit that referenced this issue Mar 5, 2020
26: don't check for EOF when unpacked size is specified r=gendx a=ibaryshnikov

### Pull Request Overview

This pull request fixes the case where the last byte is 0 with known unpacked size.
Related issue is #11, in particular #11 (comment)


### Testing Strategy

This pull request was tested by...

- [x] Added relevant unit tests.
- [ ] Added relevant end-to-end tests (such as `.lzma`, `.lzma2`, `.xz` files).


### Supporting Documentation and References

The best reference I was able to find is
https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt

```
Uncompressed Size is stored as unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if and only if Uncompressed Size is unknown.
```


### TODO, help wanted

I'm unsure what self.rep[0] is checking here
```rust
if self.rep[0] == 0xFFFF_FFFF {
    if rangecoder.is_finished_ok()? {
        break;
    }
    return Err(error::Error::LZMAError(String::from(
        "Found end-of-stream marker but more bytes are available",
    )));
}
```

Co-authored-by: ibaryshnikov <baryshnikov.il@gmail.com>
@antonsmetanin

I'm having the same issue with this file:
http://beta.unity3d.com/download/d691e07d38ef/LinuxEditorInstaller/Unity.tar.xz

fn main() {
    let mut file = std::io::BufReader::new(std::fs::File::open("Unity.tar.xz").unwrap());
    let mut decomp: Vec<u8> = Vec::new();

    lzma_rs::xz_decompress(&mut file, &mut decomp).unwrap();
}

This code produces the following error:

LZMAError("Expected unpacked size of 153357 but decompressed to 779954")

5 participants