Avoid generating empty frame? #2298

ghost · 2020-09-08T05:45:49Z

Run this code:

#include <stdio.h>     // printf
#include <zstd.h>      // presumes zstd library is installed

// Flushes and then ends a stream that never receives any input,
// printing the cumulative output size after each call.
void compress(void)
{
    char buffer[256];
    ZSTD_inBuffer in;
    ZSTD_outBuffer out;
    ZSTD_CCtx *cctx;

    in.src = &in;      // placeholder non-NULL pointer; in.size is 0
    in.size = 0;
    in.pos = 0;

    out.dst = buffer;
    out.size = sizeof(buffer);
    out.pos = 0;

    cctx = ZSTD_createCCtx();

    // out.pos is a size_t, so it is printed with %zu
    ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_flush);
    printf("ZSTD_e_flush, total output size: %zu\n", out.pos);

    ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_flush);
    printf("ZSTD_e_flush, total output size: %zu\n", out.pos);

    ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    printf("ZSTD_e_end, total output size: %zu\n", out.pos);

    ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    printf("ZSTD_e_end, total output size: %zu\n", out.pos);

    ZSTD_freeCCtx(cctx);
}

int main(void)
{
    compress();
    return 0;
}

Output:

ZSTD_e_flush, total output size: 0
ZSTD_e_flush, total output size: 0
ZSTD_e_end, total output size: 9
ZSTD_e_end, total output size: 18

It seems that empty blocks are not generated but empty frames are, which looks a bit inconsistent.
If the current frame has no content, can it not generate an empty frame?

terrelln · 2020-09-08T17:32:52Z

> If the current frame has no content, can it not generate an empty frame?

No, zstd cannot omit the frame, even when it is empty. Imagine the case where someone has concatenated 3 zstd frames, and the second frame is empty. If the 2nd frame were absent because it was empty, we would interpret the 3rd frame as the 2nd frame, and conclude that the 3rd frame is empty. That would be incorrect.
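The positional argument above can be illustrated with a toy framing scheme. This is a sketch only: the 1-byte length prefix is an assumption made for brevity and is not the zstd frame format, and every function name here is hypothetical. It serializes three payloads, once keeping the empty frame and once dropping it, so the shift in frame positions becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy framing, NOT the zstd format: each frame is a 1-byte length
 * followed by that many payload bytes. An empty frame is the byte 0. */

/* Appends one frame to `out` at offset `pos` and returns the new offset.
 * The caller must provide a buffer with enough room.
 * With keep_empty == 0, an empty payload produces no bytes at all. */
static size_t toy_write_frame(unsigned char *out, size_t pos,
                              const char *payload, int keep_empty)
{
    size_t len = strlen(payload);
    assert(len <= 255);
    if (len == 0 && !keep_empty)
        return pos;                 /* the policy asked about: emit nothing */
    out[pos++] = (unsigned char)len;
    memcpy(out + pos, payload, len);
    return pos + len;
}

/* Serializes three payloads back to back and returns the total size. */
static size_t toy_write_three(unsigned char *out, const char *a,
                              const char *b, const char *c, int keep_empty)
{
    size_t pos = 0;
    pos = toy_write_frame(out, pos, a, keep_empty);
    pos = toy_write_frame(out, pos, b, keep_empty);
    pos = toy_write_frame(out, pos, c, keep_empty);
    return pos;
}

/* Returns the payload length of frame number `index` (0-based),
 * or -1 if the stream is truncated or holds fewer frames than that. */
static int toy_frame_length(const unsigned char *in, size_t size, size_t index)
{
    size_t pos = 0;
    for (size_t i = 0; pos < size; i++) {
        size_t len = in[pos];
        if (pos + 1 + len > size)
            return -1;              /* truncated frame */
        if (i == index)
            return (int)len;
        pos += 1 + len;
    }
    return -1;
}
```

With the payloads "aa", "" and "bbb", keeping the empty frame lets a reader find an empty 2nd frame and a 3-byte 3rd frame. Dropping it makes the 2nd frame look 3 bytes long and the 3rd frame disappear.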

Zstd always generates a valid frame, and a valid frame is never zero bytes long.

If you want, you could add some code on top of zstd which maps the empty input to the empty output. But zstd can't do that.
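A minimal sketch of such a layer follows. It is not part of zstd: the `codec_fn` callback type, the `CODEC_ERROR` value and all function names are hypothetical, and the stub codec exists only so the wrapper can be exercised without linking the library. In real use the callbacks would be thin adapters over the zstd compression and decompression calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CODEC_ERROR ((size_t)-1)

/* Hypothetical callback: transforms src into dst and returns the number
 * of bytes written, or CODEC_ERROR on failure. */
typedef size_t (*codec_fn)(void *dst, size_t dst_cap,
                           const void *src, size_t src_size);

/* Stand-in codec for demonstration: prefixes the payload with 'Z'.
 * Like zstd, it produces non-empty output even for empty input. */
static size_t stub_compress(void *dst, size_t dst_cap,
                            const void *src, size_t src_size)
{
    unsigned char *d = dst;
    if (dst_cap < src_size + 1)
        return CODEC_ERROR;
    d[0] = 'Z';
    memcpy(d + 1, src, src_size);
    return src_size + 1;
}

static size_t stub_decompress(void *dst, size_t dst_cap,
                              const void *src, size_t src_size)
{
    const unsigned char *s = src;
    if (src_size < 1 || s[0] != 'Z' || dst_cap < src_size - 1)
        return CODEC_ERROR;
    memcpy(dst, s + 1, src_size - 1);
    return src_size - 1;
}

/* Empty input maps to zero bytes of output; the codec is never called. */
static size_t compress_skip_empty(codec_fn compress, void *dst, size_t dst_cap,
                                  const void *src, size_t src_size)
{
    assert(compress != NULL);
    if (src_size == 0)
        return 0;
    return compress(dst, dst_cap, src, src_size);
}

/* Zero bytes of input map back to empty content; the codec is never called. */
static size_t decompress_skip_empty(codec_fn decompress, void *dst,
                                    size_t dst_cap, const void *src,
                                    size_t src_size)
{
    assert(decompress != NULL);
    if (src_size == 0)
        return 0;
    return decompress(dst, dst_cap, src, src_size);
}
```

The trade-off of this design is that the receiving side can no longer tell "empty content was sent" apart from "nothing arrived", so it only suits pipelines where that distinction does not matter.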

Cyan4973 · 2020-09-08T20:02:53Z

The current implementation of streaming zstd doesn't generate an empty block when instructed to flush() with no content to send. As a consequence, it never generates empty blocks.

If there is a need for this capability, the reference implementation could be updated to add it, likely as an optional parameter of the streaming operation.

However, there is another catch: older variants of the zstd decoder have a bug which makes them unable to deal with raw empty blocks. This bug is now fixed, but we have a large enough user base that we can't assume the fix is deployed everywhere. Therefore, for maximum compatibility, no zstd implementation should generate raw empty blocks.
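For readers unfamiliar with the term, a raw empty block is a block header that declares the Raw type and a size of 0, with no payload after it. The sketch below assumes the 3-byte little-endian block header layout from the zstd format specification (bit 0: last-block flag, bits 1-2: block type, bits 3-23: block size). The function and enum names are hypothetical and do not come from the zstd API.

```c
#include <assert.h>
#include <stdint.h>

/* Block type values as given in the zstd format specification. */
enum { BLOCK_RAW = 0, BLOCK_RLE = 1, BLOCK_COMPRESSED = 2 };

/* Packs the three header fields into 3 little-endian bytes. */
static void block_header_encode(uint8_t out[3], int last_block,
                                int block_type, uint32_t block_size)
{
    assert(block_type >= BLOCK_RAW && block_type <= BLOCK_COMPRESSED);
    assert(block_size < (1u << 21));        /* the size field is 21 bits */
    uint32_t v = (uint32_t)(last_block != 0)
               | ((uint32_t)block_type << 1)
               | (block_size << 3);
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
}

/* Returns 1 if the header describes a Raw block of size 0, else 0. */
static int block_is_raw_empty(const uint8_t hdr[3])
{
    uint32_t v = (uint32_t)hdr[0]
               | ((uint32_t)hdr[1] << 8)
               | ((uint32_t)hdr[2] << 16);
    return ((v >> 1) & 3u) == BLOCK_RAW && (v >> 3) == 0;
}
```

A check like `block_is_raw_empty` is the kind of test an encoder could run before emitting a block in order to respect the compatibility constraint described above.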

A workaround could be to send compressed empty blocks, which ironically are 1 byte larger. If empty block generation is requested, this technique would work and remain compatible with all deployed decoders.

Anyway, without a good enough reason (an important scenario to support), we have no incentive to make the reference implementation more complex, and will therefore stick by default to the current policy of not generating empty blocks.

This is different from empty frames.
We received tons of requests for the ability to generate a zstd frame with null content, well before v1.0 was out.
This is understandable from a pipeline perspective.
The usual flow is input -> zstd compression -> zstd format -> transmit -> assume zstd format -> zstd decompress -> regenerated content.
Input can be anything, including empty. Simplified pipelines prefer to avoid special cases, and therefore prefer sending everything to zstd without their own branching for corner cases.
Therefore the above pipeline must be compatible with null content, which mandates an ability to generate and decode empty frames.

If the question is rather why the zstd decoder doesn't interpret a null input as a request to generate a null output,
then it's more complex to answer. There are many reasons why we want to strongly differentiate error cases and absence of signal from valid transmission or storage of empty content, on top of various traffic analysis constraints. Making null input a valid decompression entry would interfere with these objectives.
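The distinction between "no signal" and "valid empty content" can be sketched on the receiving side as follows. This is an illustration, not zstd's own logic: the enum and function names are hypothetical, and the only format fact assumed is the regular frame magic number 0xFD2FB528 stored little-endian. The check looks at the first 4 bytes only, so it does not validate the rest of the frame and does not recognize skippable frames.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    INPUT_ABSENT,       /* zero bytes: nothing was received        */
    INPUT_NOT_ZSTD,     /* bytes present, but no zstd frame magic  */
    INPUT_ZSTD_FRAME    /* starts like a zstd frame, maybe empty   */
} input_kind;

/* Classifies a received buffer before it is handed to a decoder. */
static input_kind classify_input(const unsigned char *src, size_t size)
{
    /* 0xFD2FB528 in little-endian byte order */
    static const unsigned char magic[4] = { 0x28, 0xB5, 0x2F, 0xFD };

    if (size == 0)
        return INPUT_ABSENT;
    assert(src != NULL);
    if (size < sizeof magic || memcmp(src, magic, sizeof magic) != 0)
        return INPUT_NOT_ZSTD;
    return INPUT_ZSTD_FRAME;
}
```

Because an empty frame still carries its header, it lands in `INPUT_ZSTD_FRAME` and decodes to zero bytes, while a zero-length buffer is reported separately and can be treated as an error or a missing transmission.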

ghost · 2020-09-09T01:59:50Z

Thank you very much for your explanation.

> If you want, you could add some code on top of zstd which maps the empty input to the empty output.

OK, for people with such a need, this code is not difficult to implement.

Cyan4973 added the question label Sep 8, 2020

felixhandte closed this as completed Sep 17, 2020
