
To get TINF_DONE signalling, need to pass output buffer longer than the expected decompressed len #34

Open
Christian-Sander opened this issue Nov 17, 2020 · 6 comments

Christian-Sander commented Nov 17, 2020

To decompress a gzip compressed file I had to provide a buffer that is one byte larger than indicated by the size field in the trailer of the file. Otherwise the library would return TINF_OK instead of TINF_DONE. The decompressed file size matched the trailer size field and the last byte was basically unneeded as far as I can tell. My code, including the size calculation, is based on the tgunzip code, using a chunk size of 1.

I have not looked into the source code to trace it, just wanted to let you know.

pfalcon (Owner) commented Nov 17, 2020

Thanks for the report! Please see https://www.chiark.greenend.org.uk/~sgtatham/bugs.html, in particular that part where you need to provide specific steps/code/data to reproduce the issue.

Christian-Sander (Author) commented Nov 17, 2020

I'm working on an embedded project so I can't really give you the code used to reproduce the issue as it relies on the hardware.

Nonetheless, here's the function I used for decompression:

bool uncompress_gzip(const uint8_t* source, size_t source_len, uint8_t* dest, size_t& dest_len)
{
    //Decompressed size is stored as little endian uint32_t in the last four bytes of the compressed file
    size_t decompressed_len =                 source[source_len - 1];
    decompressed_len = 256*decompressed_len + source[source_len - 2];
    decompressed_len = 256*decompressed_len + source[source_len - 3];
    decompressed_len = 256*decompressed_len + source[source_len - 4];

    if(decompressed_len > dest_len)
        return false;

    struct uzlib_uncomp d;
    uzlib_uncompress_init(&d, NULL, 0);

    /* all 3 fields below must be initialized by user */
    d.source = source;
    d.source_limit = source + source_len - 4; // Remove size trailer from source buffer
    d.source_read_cb = NULL;

    int res = uzlib_gzip_parse_header(&d);
    if (res != TINF_OK) {
        log_error("Error parsing header: %d\n", res);
        return false;
    }

    d.dest_start = d.dest = dest;

    while (dest_len) {
        unsigned int chunk_len = dest_len < 1 ? dest_len : 1;
        d.dest_limit = d.dest + chunk_len;
        res = uzlib_uncompress_chksum(&d);
        dest_len -= chunk_len;
        if (res != TINF_OK) {
            break;
        }
    }

    if (res != TINF_DONE) {
        log_error("Error during decompression: %d\n", res);
        return false;
    }

    dest_len = d.dest - dest;
    log_info("decompressed %zu b\n", dest_len); // %zu for size_t
    return true;
}

I need to provide a buffer which is 1 byte larger than the expected decompressed output. When I find some time I'll try to come up with a minimal working example using a file I can release publicly. The library code seems to imply that not all files might be affected, since there are different code paths that can return TINF_OK.

pfalcon (Owner) commented Nov 24, 2020

Sorry for the delay here, busy with other projects.

I'm working on an embedded project so I can't really give you the code used to reproduce the issue as it relies on the hardware.

If the issue depends on the hardware, then this ticket is invalid ;-). But you said it depends on uzlib?

Thanks for posting some code. But most likely, the issue would need to be debugged on existing known-working code, like the included "tgunzip" tool. So, someone would need to prepare a patch for it, with associated steps on how to reproduce the issue. (Indeed, a good question is why nobody noticed this issue before.)

I have some idea of what might be happening: the check for dest buffer exhaustion likely happens after a byte is written to it. Once that happens, TINF_OK is returned, without considering that the next compressed symbol could be EOF, in which case TINF_DONE should be returned instead. The fix would be to test for buffer exhaustion before a byte is written. That could be a very simple or a rather tangled fix; the likely median is that it's moderately unbeautiful and grows the code size (in a project whose whole purpose is to be as small in code size as possible). So, the viability of a fix vs. workaround(s) should be considered.

Christian-Sander (Author) commented:

I don't think the issue itself depends on the hardware, only my current code for reproducing it does.
There might also be a dependence on the data being decompressed, since there are multiple paths in the decompression algorithm that depend on the data.

Unfortunately I haven't yet been able to work on a standalone example (I need to do this in my free time). With some luck I'll manage to next weekend.

In my mind this isn't much of a problem, but it should at least be documented in the function description so users of the library can account for it without having to debug.

Christian-Sander (Author) commented:

I think I can resolve this now. I initially based my code on your dev branch, wrongly thinking it was more recent. On the master branch you actually have a dlen++ line in tgunzip with an explanatory comment, which is missing on the dev branch. Because of this I provided a buffer matching the expected output size exactly. Sorry again for the confusion, but maybe you could add some explanation to the function documentation pointing out this behaviour, to avoid such misunderstandings in the future?

pfalcon (Owner) commented Nov 28, 2020

Good to know that someone (like me) hit something similar before ;-). However, I agree that something like "you need to provide a longer buffer than needed to finish decompression in one go" sounds somewhat unfortunate. While the workaround may be "be prepared to call uzlib_uncompress*() more than once", I'd first like to be sure there's no other easy solution. So I'd prefer to keep this ticket open. Thanks.

@pfalcon pfalcon reopened this Nov 28, 2020
@pfalcon pfalcon changed the title Off-by-one bug in decompression? To get TINF_DONE signalling, need to pass output buffer longer than the expected decompressed len Apr 8, 2021
@pfalcon pfalcon added the triaged label Apr 8, 2021