Avoid generating empty frame? #2298

Closed
ghost opened this issue Sep 8, 2020 · 3 comments

ghost commented Sep 8, 2020

Run this code:

#include <stdio.h>     // printf
#include <zstd.h>      // presumes zstd library is installed

// Call ZSTD_compressStream2() with the given directive and report how many
// bytes have been written to the output buffer so far.
static void step(ZSTD_CCtx *cctx, ZSTD_outBuffer *out, ZSTD_inBuffer *in,
                 ZSTD_EndDirective mode, const char *label)
{
    size_t const ret = ZSTD_compressStream2(cctx, out, in, mode);
    if (ZSTD_isError(ret)) {
        printf("%s, error: %s\n", label, ZSTD_getErrorName(ret));
        return;
    }
    printf("%s, total output size: %zu\n", label, out->pos);
}

static void compress(void)
{
    char buffer[256];
    ZSTD_inBuffer in = { "", 0, 0 };                      // empty input: valid pointer, size 0
    ZSTD_outBuffer out = { buffer, sizeof(buffer), 0 };
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (cctx == NULL)
        return;

    step(cctx, &out, &in, ZSTD_e_flush, "ZSTD_e_flush");  // no pending input: nothing emitted
    step(cctx, &out, &in, ZSTD_e_flush, "ZSTD_e_flush");
    step(cctx, &out, &in, ZSTD_e_end, "ZSTD_e_end");      // closes a frame, even an empty one
    step(cctx, &out, &in, ZSTD_e_end, "ZSTD_e_end");      // starts and closes a second empty frame

    ZSTD_freeCCtx(cctx);
}

int main(void)
{
    compress();
    return 0;
}

Output:

ZSTD_e_flush, total output size: 0
ZSTD_e_flush, total output size: 0
ZSTD_e_end, total output size: 9
ZSTD_e_end, total output size: 18

It seems empty blocks are not generated, but empty frames are, which looks a bit inconsistent.
If the current frame has no content, can it not generate an empty frame?
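
The 9 bytes produced by each `ZSTD_e_end` call are consistent with a minimal frame as laid out in the zstd format specification (RFC 8878): a 4-byte magic number, a 1-byte frame header descriptor, a 1-byte content size, and a 3-byte header for a single empty last block. The sketch below assembles such a frame by hand. `build_empty_frame` and `put_le` are hypothetical helpers, not libzstd functions, and the layout assumes default parameters (no dictionary ID, no content checksum).

```c
#include <stddef.h>
#include <stdint.h>

/* Write the n low-order bytes of v to p in little-endian order. */
static void put_le(uint8_t *p, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: assemble an empty zstd frame by hand, assuming no
   dictionary ID, no content checksum and a Frame_Content_Size of 0.
   dst must hold at least 9 bytes; returns the number of bytes written. */
static size_t build_empty_frame(uint8_t *dst)
{
    size_t pos = 0;

    put_le(dst + pos, 0xFD2FB528u, 4);      /* Magic_Number (4 bytes) */
    pos += 4;

    /* Frame_Header_Descriptor: only Single_Segment_flag (bit 5) is set, so
       there is no Window_Descriptor and Frame_Content_Size takes 1 byte. */
    dst[pos++] = 1u << 5;
    dst[pos++] = 0;                         /* Frame_Content_Size = 0 */

    /* Block_Header (3 bytes): Last_Block = 1, Block_Type = Raw (0),
       Block_Size = 0, so no block content follows. */
    put_le(dst + pos, 1u | (0u << 1) | (0u << 3), 3);
    pos += 3;

    return pos;                             /* 4 + 1 + 1 + 3 = 9 */
}
```

Two such frames back to back account for the 18 bytes reported after the second `ZSTD_e_end`. Whether libzstd emits exactly these bytes depends on its parameters; for instance, enabling `ZSTD_c_checksumFlag` would append a 4-byte content checksum to each frame.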

terrelln (Contributor) commented Sep 8, 2020

> If the current frame has no content, can it not generate an empty frame?

Correct: zstd cannot skip generating the empty frame. Imagine that someone has concatenated 3 zstd frames, and the second frame is empty. If the 2nd frame were omitted because it was empty, we would interpret the 3rd frame as the 2nd one and assume that the 3rd frame is empty. But that would be incorrect.
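
The concatenation argument can be made concrete with a frame walker. The sketch below steps through concatenated frames using only the header fields described in the zstd format specification (RFC 8878). `frame_size` and `count_frames` are hypothetical helpers; skippable frames are not handled, and block contents are skipped over rather than decoded. Because an empty frame still carries a frame header and a last-block header, it occupies its own bytes (9 in the reproduction above) and is counted as a frame in its own right.

```c
#include <stddef.h>
#include <stdint.h>

/* Read n little-endian bytes starting at p. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical helper: return the compressed size of the zstd frame at src,
   or 0 if it is truncated, not a zstd frame, or uses the reserved block
   type. Skippable frames are not handled. */
static size_t frame_size(const uint8_t *src, size_t len)
{
    static const size_t dict_id_bytes[4] = {0, 1, 2, 4};
    static const size_t fcs_bytes[4] = {0, 2, 4, 8};

    if (len < 5 || get_le(src, 4) != 0xFD2FB528u)
        return 0;

    uint8_t fhd = src[4];                   /* Frame_Header_Descriptor */
    int single_segment = (fhd >> 5) & 1;
    int checksum = (fhd >> 2) & 1;
    size_t fcs = fcs_bytes[fhd >> 6];
    if (single_segment && fcs == 0)
        fcs = 1;                            /* single-segment frames store the size in 1 byte */

    size_t pos = 5 + (single_segment ? 0 : 1)   /* Window_Descriptor */
                   + dict_id_bytes[fhd & 3] + fcs;

    for (;;) {                              /* walk the blocks */
        if (pos + 3 > len)
            return 0;
        uint32_t bh = (uint32_t)get_le(src + pos, 3);
        int last = bh & 1;
        int type = (bh >> 1) & 3;
        size_t bsize = bh >> 3;
        pos += 3;
        if (type == 3)
            return 0;                       /* reserved block type */
        pos += (type == 1) ? 1 : bsize;     /* RLE blocks carry a single byte */
        if (pos > len)
            return 0;
        if (last)
            break;
    }
    if (checksum)
        pos += 4;                           /* Content_Checksum */
    return pos <= len ? pos : 0;
}

/* Count the frames in a buffer of concatenated frames; returns -1 on error. */
static int count_frames(const uint8_t *src, size_t len)
{
    int frames = 0;
    while (len > 0) {
        size_t n = frame_size(src, len);
        if (n == 0)
            return -1;
        src += n;
        len -= n;
        frames++;
    }
    return frames;
}
```

For example, a buffer holding a frame for `abc`, an empty frame, and a frame for `xy` yields 3 frames; dropping the empty frame yields 2, and a reader counting frames positionally would then pair the `xy` content with the wrong slot.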

Zstd always generates a valid frame, and a valid frame is never zero bytes long, even when its content is empty.

If you want, you could add some code on top of zstd which maps the empty input to the empty output. But zstd can't do that.

Cyan4973 (Contributor) commented Sep 8, 2020

The current implementation of streaming zstd doesn't generate an empty block when instructed to flush() with no content to send. As a consequence, it never generates empty blocks.

If there is a need for this capability, the reference implementation could be updated to add it, likely as an optional parameter of the streaming operation.

However, there is another catch: older versions of the zstd decoder have a bug that makes them unable to handle raw empty blocks. This bug has since been fixed, but our user base is large enough that we can't take the fix for granted. Therefore, for maximum compatibility, no zstd implementation should generate raw empty blocks.

A work-around could be to send compressed empty blocks, which, ironically, are 1 byte larger. If empty-block generation is requested, this technique would work and remain compatible with all deployed decoders.

Anyway, without a good enough reason (an important scenario to support), we have no incentive to make the reference implementation more complex, and will therefore stick to the current default policy of not generating empty blocks.

This is different from empty frames.
We received tons of requests for the ability to generate a zstd frame with empty (NULL) content, well before v1.0 was out.
This is understandable from a pipeline perspective.
The usual flow is: input -> zstd compression -> zstd format -> transmit -> assume zstd format -> zstd decompression -> regenerated content.
Input can be anything, including empty. Simple pipelines prefer to avoid special cases, and therefore send everything to zstd without their own branching for corner cases.
Therefore the above pipeline must be compatible with empty content, which requires the ability to generate and decode empty frames.

If the question is rather why the zstd decoder doesn't interpret an empty input to decompress as producing an empty output,
then the answer is more complex. There are many situations where we want to strongly differentiate error cases and absence of signal from the valid transmission or storage of empty content, on top of various traffic-analysis constraints. Making an empty input a valid decompression input would interfere with these objectives.

ghost (Author) commented Sep 9, 2020

Thank you very much for your explanation.

> If you want, you could add some code on top of zstd which maps the empty input to the empty output.

OK, for people with such a need, this code is not difficult to implement.
