New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Does srcSize passed to lz4_decompress_safe_continue need to equal compressed block size? #1172
Comments
Indeed, The scenario you are looking for, replacing While |
Hey, so I've looked at I did try passing In other words, what I've found so far is writing this code compressedSize = -3 - LZ4_decompress_safe_partial_usingDict(
inPtr, outPtr,
inBufferLen,
// Lie about the decompressed size (add 1) to get lz4 to error out and return us the compressed size
decompressedSize + 1, decompressedSize + 1,
dictPtr, dictSize); Wondering if that's the sort of thing you had in mind, or if you had something a little more straightforward in mind. Thanks! I really don't think I would be comfortable committing the magic number |
Okay, I'm pretty sure the Still looking for a good way to use the API to decode a known decompressed size and return the compressed size that was used. Thanks! Edit: I'm also working with the folks at work to see if we can alter our data format to move off of the deprecated API, but I'd still like to find a way to use the new set of functions with our current data format if possible. |
@Cyan4973 Sorry to bug you again, but are you sure this use case is still supported with the current API? Thanks! |
If we are talking about the first scenario presented at the top of the issue, then there is another important thing to ask for : is the decompressed size of each block known ? |
Correct. Multiple compressed blocks stacked back to back, total compressed size known, and decompressed size for each block known. |
Ah, and since you are using the |
No, during compression we call |
OK, |
To answer your question, the code was like that when I got here. I'm just reporting what it says, however I agree that it would make more sense not to use the My whole intent here is just not to use deprecated functions if possible. It looks like |
To be more concrete, here's some example code. It compresses some data into a series of blocks and returns the compressed data (and also the number of blocks). The decompress function discovers the compressed size of each block from the return value of However, #include <iostream>
#include <string>
#include <vector>
#include "lz4.h"
// Returns number of compressed blocks and compressed data
// Takes decompressed block size and data to compress
std::pair<size_t, std::vector<char>>
compress(size_t decompressed_block_size, std::string str) {
std::vector<char> out_buf(1024); // big enough
size_t block_count = 0;
auto offset = 0;
while (block_count * decompressed_block_size < str.size()) {
offset += LZ4_compress_default(
str.data() + block_count * decompressed_block_size,
out_buf.data() + offset,
decompressed_block_size,
out_buf.size() - offset
);
block_count++;
}
out_buf.resize(offset);
return {block_count, out_buf};
}
// Returns decompressed data
// Takes decompressed block size, number of compressed blocks, and compressed data (which includes the total size of the compressed data)
std::string
decompress(size_t decompressed_block_size, std::pair<size_t, std::vector<char>> compressed) {
std::string result;
size_t offset = 0;
for (size_t i = 0; i < compressed.first; i++) {
std::string decompressed_block(decompressed_block_size, '\0');
offset += LZ4_decompress_fast(
compressed.second.data() + offset,
decompressed_block.data(),
decompressed_block_size
);
result += decompressed_block;
}
return result;
}
int main() {
constexpr size_t decompressed_block_size = 128;
std::string str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Amet volutpat consequat mauris nunc congue nisi vitae. Tortor at auctor urna nunc id. Eu ultrices vitae auctor eu augue ut lectus arcu. Arcu odio ut sem nulla pharetra diam. Dui id ornare arcu odio ut sem nulla pharetra diam. Nulla aliquet enim tortor at auctor urna nunc. Odio ut sem nulla pharetra diam. Viverra tellus in hac habitasse. Purus sit amet volutpat consequat. Viverra vitae congue eu consequat ac felis donec et. Ut placerat orci nulla pellentesque dignissim enim sit amet. Orci a scelerisque purus semper eget duis at tellus. Venenatis a condimentum vitae sapien pellentesque habitant morbi tristique. Elementum facilisis leo vel fringilla est ullamcorper. Posuere ac ut consequat semper viverra.";
// Round string length up to multiple of decompressed block size
if (auto mod = str.size() % decompressed_block_size) {
str.resize(str.size() + (decompressed_block_size - mod), '\0');
}
auto compressed = compress(decompressed_block_size, str);
auto decompressed = decompress(decompressed_block_size, compressed);
std::cout << str << std::endl;
std::cout << decompressed << std::endl;
} |
Sure. The way it works :
Maybe I could also update the documentation to make that clearer. edit : |
Having looked at the API again, The main problem is returning the nb of bytes read from I presume restoring this capability will require creating a new entry point in the library. |
I see how you have managed to make At this point, my recommendation would be to keep using |
Okay, thank you so much for your time! |
I'm updating some code for work, and we're using the deprecated lz4_decompress_fast_continue. This function takes a pointer to a compressed block, and returns the number of bytes read from the source buffer.
I have two questions about lz4_decompress_safe_continue.
(1) Can srcSize be larger than the compressed block?
(2) What is the return value of lz4_decompress_safe_continue? (It is not documented.)
I peeked at lz4.c, and it seems like the answer to (2) is that the return value is the decompressed size. From this, I reason that the answer to (1) is no, since there would be no way to tell how much data lz4_decompress_safe_continue read from the source buffer.
The reason I ask is that we happen to be compressing a series of blocks and saving the total compressed size, but not the block boundaries. We are then relying on the behavior of lz4_decompress_fast_continue to stop decompressing at the end of a block and return the amount of data decompressed, in order for us to decompress the blocks in a loop. It seems like the safe API is incompatible with our current data format.
To be specific, we are decompressing a memory-contiguous sequence of blocks in a loop. I understand that
lz4_decompress_*_continue
must be used in this situation, becauselz4_decompress_*
assumes that one block is being decompressed, and will give incorrect results if it is run on a sequence of blocks. Is this true, or is e.g.lz4_decompress_safe
intended to decompress multiple contiguous blocks?The one action item I'd suggest from this issue is to document (1) and (2) in the comments to lz4_decompress_safe_continue.
Thanks!
The text was updated successfully, but these errors were encountered: