
To get TINF_DONE signalling, need to pass output buffer longer than the expected decompressed len #34

Open
Christian-Sander opened this issue Nov 17, 2020 · 6 comments

Christian-Sander commented Nov 17, 2020

To decompress a gzip compressed file I had to provide a buffer that is one byte larger than indicated by the size field in the trailer of the file. Otherwise the library would return TINF_OK instead of TINF_DONE. The decompressed file size matched the trailer size field and the last byte was basically unneeded as far as I can tell. My code, including the size calculation, is based on the tgunzip code, using a chunk size of 1.

I have not looked into the source code to trace it, just wanted to let you know.

pfalcon (Owner) commented Nov 17, 2020

Thanks for the report! Please see https://www.chiark.greenend.org.uk/~sgtatham/bugs.html, in particular that part where you need to provide specific steps/code/data to reproduce the issue.

Christian-Sander (Author) commented Nov 17, 2020

I'm working on an embedded project so I can't really give you the code used to reproduce the issue as it relies on the hardware.

Nonetheless, here's the function I used for decompression:

bool uncompress_gzip(const uint8_t* source, size_t source_len, uint8_t* dest, size_t& dest_len)
{
    //Decompressed size is stored as little endian uint32_t in the last four bytes of the compressed file
    size_t decompressed_len =                 source[source_len - 1];
    decompressed_len = 256*decompressed_len + source[source_len - 2];
    decompressed_len = 256*decompressed_len + source[source_len - 3];
    decompressed_len = 256*decompressed_len + source[source_len - 4];

    if(decompressed_len > dest_len)
        return false;

    struct uzlib_uncomp d;
    uzlib_uncompress_init(&d, NULL, 0);

    /* all 3 fields below must be initialized by user */
    d.source = source;
    d.source_limit = source + source_len - 4; // Remove size trailer from source buffer
    d.source_read_cb = NULL;

    int res = uzlib_gzip_parse_header(&d);
    if (res != TINF_OK) {
        log_error("Error parsing header: %d\n", res);
        return false;
    }

    d.dest_start = d.dest = dest;

    while (dest_len) {
        unsigned int chunk_len = dest_len < 1 ? dest_len : 1;
        d.dest_limit = d.dest + chunk_len;
        res = uzlib_uncompress_chksum(&d);
        dest_len -= chunk_len;
        if (res != TINF_OK) {
            break;
        }
    }

    if (res != TINF_DONE) {
        log_error("Error during decompression: %d\n", res);
        return false;
    }

    dest_len = d.dest - dest;
    log_info("decompressed %zu b\n", dest_len); // %zu for size_t
    return true;
}

I need to provide a buffer which is 1 byte larger than the expected decompressed output. When I find some time I'll try to come up with a minimal working example using a file I can release publicly. The library code seems to imply that not all files might be affected, since there are different code paths that can return TINF_OK.

pfalcon (Owner) commented Nov 24, 2020

Sorry for the delay here, busy with other projects.

I'm working on an embedded project so I can't really give you the code used to reproduce the issue as it relies on the hardware.

If the issue depends on the hardware, then this ticket is invalid ;-). But you said it depends on uzlib?

Thanks for posting some code. But most likely, the issue would need to be debugged on existing known-working code, like the included "tgunzip" tool. So, someone would need to prepare a patch for it, with associated steps on how to reproduce the issue. (Indeed, a good question is why nobody noticed this issue before.)

I have some idea of what might be happening: the check for dest buffer exhaustion likely happens after a byte is written to it. Once that happens, TINF_OK is returned, without considering that the next compressed symbol could be EOF, in which case TINF_DONE should be returned instead. The fix would be to test for buffer exhaustion before a byte is written. That could be a very simple or a rather tangled fix; the likely median is that it's moderately unbeautiful and grows the code size (in a project whose whole purpose is to be as small in code size as possible). So, the viability of a fix vs. workaround(s) should be considered.

Christian-Sander (Author) commented:

I don't think the issue itself depends on the hardware, only my current code for reproducing it does.
There might also be a dependence on the data being decompressed, since there are multiple paths in the decompression algorithm that depend on the data.

Unfortunately I haven't yet been able to work on a standalone example (I need to do this in my free time). With some luck I'll manage to next weekend.

In my mind this isn't much of a problem, but it should at least be documented in the function description so users of the library can account for it without having to debug.

Christian-Sander (Author) commented:

I think I can resolve this now. I initially based my code on your dev branch, wrongly thinking it was more recent. On the master branch you actually have a dlen++ line in tgunzip with an explanatory comment, which is missing on the dev branch. Because of this I provided a buffer matching the expected output size exactly. Sorry again for the confusion, but maybe you could add some explanation to the function documentation pointing out this behaviour, to avoid such misunderstandings in the future?

pfalcon (Owner) commented Nov 28, 2020

Good to know that someone (like me) hit something similar before ;-). However, I agree that something like "you need to provide a longer buffer than needed to finish decompression in one go" sounds somewhat unfortunate. While the workaround may be "be prepared to call uzlib_uncompress*() more than once", I'd first like to be sure there's no other easy solution. So I'd prefer to keep this ticket open. Thanks.

@pfalcon pfalcon reopened this Nov 28, 2020
@pfalcon pfalcon changed the title Off-by-one bug in decompression? To get TINF_DONE signalling, need to pass output buffer longer than the expected decompressed len Apr 8, 2021
@pfalcon pfalcon added the triaged label Apr 8, 2021