ChecksumVerificationFailed on read of many files in solid archive #31

Open
Revertron opened this issue Aug 12, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@Revertron

Revertron commented Aug 12, 2023

I have solid archives with a block size of 16 MB, and many of the files fail to read with ChecksumVerificationFailed.

Example archive: https://up.revertron.com/Memes.7z

Example code:

```rust
use std::io::Read;

use sevenz_rust::{Password, SevenZReader};

pub fn test_blocks() {
    let mut buf = Vec::new();

    let mut archive = SevenZReader::open("Memes.7z", Password::empty()).expect("Error opening 7z archive");
    let _ = archive.for_each_entries(|entry, reader| {
        println!("Reading file {}", &entry.name);
        if "FcGD7nuX0AgQNS_.jpg" == entry.name {
            println!("*** Found file {}", &entry.name);
            match reader.read_to_end(&mut buf) {
                Ok(_size) => {
                    println!("Have read file {}", &entry.name);
                    // Stop iterating once the target file has been read.
                    return Ok(false);
                }
                Err(e) => {
                    println!("Error reading file {}: {}", &entry.name, &e);
                    return Err(sevenz_rust::Error::from(e));
                }
            }
        }
        // Other entries are skipped without reading their data.
        Ok(true)
    });
    assert!(!buf.is_empty())
}
```
@dyz1990
Owner

dyz1990 commented Aug 12, 2023

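You can't skip reading the entries you don't need. In a solid archive the entries' data is stored back to back in one decompressed stream, so each entry's reader has to be read to the end before the next entry can be read correctly.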
Try this code:


```rust
use std::io::Read;

use sevenz_rust::{Password, SevenZReader};

pub fn test_blocks() {
    let mut buf = Vec::new();

    let mut archive =
        SevenZReader::open("Memes.7z", Password::empty()).expect("Error opening 7z archive");
    let _ = archive.for_each_entries(|entry, reader| {
        println!("Reading file {}", &entry.name);
        if "FcGD7nuX0AgQNS_.jpg" == entry.name {
            println!("*** Found file {}", &entry.name);
            match reader.read_to_end(&mut buf) {
                Ok(_size) => {
                    println!("Have read file {}", &entry.name);
                    return Ok(false);
                }
                Err(e) => {
                    println!("Error reading file {}: {}", &entry.name, &e);
                    return Err(sevenz_rust::Error::from(e));
                }
            }
        } else {
            // Consume the reader to skip the file, even if we don't need it
            while let Ok(n) = reader.read(&mut [0; 4096]) {
                if n == 0 {
                    break;
                }
            }
            Ok(true)
        }
    });
    assert!(!buf.is_empty())
}
```
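A toy model may help show why skipping an entry ends in ChecksumVerificationFailed. It assumes, for illustration only, that every entry in a solid block is simply the next `size` bytes of one shared decompressed stream and that each entry is verified against a stored checksum. The `Entry` type, the `read_entry` function, and the byte-sum `checksum` (a stand-in for the CRC32 that 7z actually stores) are hypothetical and are not sevenz-rust's code.

```rust
use std::io::{self, Cursor, Read};

/// Stand-in for CRC32 (hypothetical): a wrapping byte sum, for illustration only.
fn checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// Hypothetical entry record: data length plus the checksum stored in the archive.
struct Entry {
    size: u64,
    expected: u32,
}

/// Walks the entries in archive order over one shared stream and returns the
/// data of entry `wanted`. When `consume_unwanted` is false, earlier entries are
/// left unread, like a closure that returns Ok(true) without reading.
fn read_entry(
    stream: &mut Cursor<Vec<u8>>,
    entries: &[Entry],
    wanted: usize,
    consume_unwanted: bool,
) -> Result<Vec<u8>, String> {
    for (i, entry) in entries.iter().enumerate() {
        // Each entry's reader is a window starting at the *current* stream position.
        let mut window = stream.by_ref().take(entry.size);
        if i == wanted {
            let mut buf = Vec::new();
            window.read_to_end(&mut buf).map_err(|e| e.to_string())?;
            if checksum(&buf) != entry.expected {
                return Err("checksum verification failed".to_string());
            }
            return Ok(buf);
        }
        if consume_unwanted {
            // Drain the unwanted entry so the next window starts at the right offset.
            io::copy(&mut window, &mut io::sink()).map_err(|e| e.to_string())?;
        }
    }
    Err("entry not found".to_string())
}
```

With two entries "hello" and "world", asking for the second one succeeds only when the first is drained; otherwise the second window reads "hello" and the checksum comparison fails.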

@Revertron
Author

Thanks for the quick response!
This works, but it is very slow, even when I increase the buffer to 2 MB, move it out of the closure, and reuse it.

Is there any way to make it faster? :(
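One way to apply the buffer-reuse idea is a small helper that drains a reader into a caller-owned scratch buffer and, unlike the `while let Ok(n)` loop above, returns read errors instead of silently stopping. The `drain` name is hypothetical. A bigger buffer mostly saves `read` calls; the time is likely dominated by decompressing the skipped data itself, so this should not be expected to make skipping much faster.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` to EOF and discards the bytes, reusing
/// `scratch` so no new buffer is allocated per entry. Returns the byte count.
fn drain(reader: &mut dyn Read, scratch: &mut [u8]) -> io::Result<u64> {
    // An empty buffer would make every read return Ok(0) and look like EOF.
    assert!(!scratch.is_empty(), "scratch buffer must not be empty");
    let mut total = 0u64;
    loop {
        match reader.read(scratch) {
            Ok(0) => return Ok(total),
            Ok(n) => total += n as u64,
            // Retry on interrupted reads, as std::io::copy does.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Any other error is propagated to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

Inside the closure, with `let mut scratch = vec![0u8; 2 * 1024 * 1024];` declared before calling for_each_entries, the else branch could become `drain(reader, &mut scratch)?; Ok(true)`, relying on the same From<std::io::Error> conversion the code above already uses via `sevenz_rust::Error::from(e)`. If buffer reuse doesn't matter, `std::io::copy(reader, &mut std::io::sink())?` does the same job with an internal buffer.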

@Revertron
Author

I've gone through the reader code, and I think we need to change all those R: Read bounds to R: Read + Seek and then simply seek past the unread bytes.
But a trait object can't combine more than one non-auto trait (error E0225): https://doc.rust-lang.org/error_codes/E0225.html
So we would need to define a combined trait, like this:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=66c772a420cb50c0fa78ab3d91bda052
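The playground link's contents aren't reproduced here, but the usual workaround for E0225 is a supertrait with a blanket impl, so that one trait object can offer both Read and Seek. The `ReadSeek` trait and `read_after_skip` function are illustrative names, not part of sevenz-rust. Seeking only helps where the skipped bytes can be located without decoding them, which the next reply addresses.

```rust
use std::io::{Read, Seek, SeekFrom};

// `dyn Read + Seek` is rejected with E0225 because only one non-auto trait may
// appear in a trait object. A supertrait with a blanket impl sidesteps this:
// every type that is Read + Seek automatically implements ReadSeek.
trait ReadSeek: Read + Seek {}
impl<T: Read + Seek> ReadSeek for T {}

/// Skips `n` bytes by seeking instead of reading them, then reads the rest.
fn read_after_skip(src: &mut dyn ReadSeek, n: u64) -> std::io::Result<Vec<u8>> {
    src.seek(SeekFrom::Current(n as i64))?;
    let mut out = Vec::new();
    src.read_to_end(&mut out)?;
    Ok(out)
}
```

Because of the blanket impl, a `Cursor`, a `File`, or a `Box<dyn ReadSeek>` can all be passed as `&mut dyn ReadSeek`.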

@dyz1990
Owner

dyz1990 commented Aug 14, 2023

@Revertron Because decompressing each part of the stream depends on the data that comes before it, you cannot skip the earlier data and decompress only the part you need. This is why the reader does not implement the Seek trait.
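The dependency can be seen in a much simpler codec. Delta coding, used here only as a toy stand-in for LZMA, stores each byte as the difference from the previous one, so decoding any byte needs the decoder state left behind by all the bytes before it. In LZMA the corresponding state is the window of previously decoded output (plus the range-coder state), which is why a reader over the decompressed data cannot jump ahead. The functions below are illustrative.

```rust
// Toy illustration (delta coding, not LZMA): each decoded byte depends on the
// one before it, so a decoder cannot correctly start in the middle of the stream.
fn delta_encode(data: &[u8]) -> Vec<u8> {
    let mut prev = 0u8;
    data.iter()
        .map(|&b| {
            let d = b.wrapping_sub(prev);
            prev = b;
            d
        })
        .collect()
}

/// Decodes `encoded` starting from decoder state `prev`
/// (the last byte that was decoded before this slice).
fn delta_decode(encoded: &[u8], mut prev: u8) -> Vec<u8> {
    encoded
        .iter()
        .map(|&d| {
            prev = prev.wrapping_add(d);
            prev
        })
        .collect()
}
```

Decoding `delta_encode(b"HELLO")[2..]` from a fresh state (`prev = 0`) yields the wrong bytes; only with the state left after "HE" (`prev = b'E'`) does it produce "LLO", and that state is only available by decoding "HE" first.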

@Revertron
Author

But the 7-Zip app definitely skips all blocks before the block that contains the file being extracted. Is it possible to implement this?

@dyz1990
Owner

dyz1990 commented Aug 15, 2023

> But the 7-Zip app definitely skips all blocks before the block that contains the file being extracted. Is it possible to implement this?

It's not easy, but I'll give it a try.

dyz1990 added the enhancement (New feature or request) label Aug 15, 2023
@dyz1990
Owner

dyz1990 commented Aug 15, 2023

@Revertron I noticed that the file "Memes.7z" contains more than one solid stream, so you can speed up decompression by skipping the streams that don't contain the files you need.

You can check out the example forder_dec.rs,
and the example mt_decompress.rs if you want to use multiple threads.
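The idea of skipping whole solid streams can be sketched independently of the library. In 7z terms each solid block is a "folder" that is compressed on its own, so decoding can start at any folder boundary; within a folder, though, every entry before the wanted one still has to be decoded. The sketch assumes an archive-order list of entries tagged with their folder index; `EntryInfo` and `plan` are hypothetical, and in a real archive this mapping comes from the archive header.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-entry metadata, in archive order: the entry name and the
/// index of the folder (solid block) that stores its data.
struct EntryInfo<'a> {
    name: &'a str,
    folder: usize,
}

/// For every folder that holds at least one wanted file, returns how many of
/// that folder's entries must be decoded: all of them up to and including the
/// last wanted one. Folders missing from the map can be skipped entirely.
fn plan(entries: &[EntryInfo<'_>], wanted: &[&str]) -> BTreeMap<usize, usize> {
    let mut seen_in_folder: BTreeMap<usize, usize> = BTreeMap::new();
    let mut needed = BTreeMap::new();
    for e in entries {
        // 1-based position of this entry inside its folder.
        let pos = seen_in_folder.entry(e.folder).or_insert(0);
        *pos += 1;
        if wanted.iter().any(|w| *w == e.name) {
            needed.insert(e.folder, *pos);
        }
    }
    needed
}
```

For entries a and b in folder 0, c and d in folder 1, and e in folder 2, a request for d plans to decode only the first two entries of folder 1; folders 0 and 2 are never touched.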

@pavpen

pavpen commented Nov 28, 2023

I think you should at least document this behavior in the description of for_each_entries and related functions. I spent a day debugging my code before ending up here.

@dyz1990
Owner

dyz1990 commented Dec 5, 2023

@pavpen Sorry about that. I'll add documentation for the method.
