
Feature request: increase max block size from 1M to 16M #217

Open
cgm999 opened this issue Jan 21, 2023 · 10 comments


cgm999 commented Jan 21, 2023

Hi,

I am using a patch to increase the block size because it gives better compression. Could the limit be raised in a future version?
I forget what the maximum block size is that doesn't break the binary format because of some marker bit.

cgm999 changed the title from "increase max block size from 1M to 16M or 32M" to "Feature request: increase max block size from 1M to 16M or 32M" on Jan 24, 2023
plougher self-assigned this on Jan 29, 2023
plougher added this to the Undecided milestone on Jan 29, 2023

maiziyi commented May 5, 2023

Hi,

I am trying to test this feature as well. I modified squashfs_fs.h (which exists in both squashfs-tools and the kernel), simply changing SQUASHFS_FILE_MAX_SIZE and SQUASHFS_FILE_MAX_LOG (though I only raised the block size to 8M). This patch really does reduce the size of the squashfs image, but I worry about the impact on read performance.
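
For reference, a minimal sketch of the kind of change described above, assuming the stock definitions in squashfs_fs.h (values quoted from memory, so treat the exact formatting as approximate):

/* squashfs_fs.h -- stock limits for the 1 MiB maximum block size */
#define SQUASHFS_FILE_MAX_SIZE  1048576   /* 1 << 20 */
#define SQUASHFS_FILE_MAX_LOG   20

/* patched as described above for an 8 MiB maximum block size */
#define SQUASHFS_FILE_MAX_SIZE  8388608   /* 1 << 23 */
#define SQUASHFS_FILE_MAX_LOG   23

Both squashfs-tools and the kernel have to be built with the same pair of values, since the writer and the reader must agree on the limit.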

mgord9518 commented

Assuming SquashFS readers don't discard the 3 bits between 1 << 20 (current max block size) and 1 << 24 (compressed bit), it could be increased to 8MiB without breaking anything. With some small changes to readers (a couple lines of code), the block size could be increased to 16MiB. Either choice should probably increment the SquashFS minor version number since existing tooling can currently make the assumption that blocks will be no larger than 1MiB.

This code snippet shows the layout of a data block reference:

pub const DataEntry = packed struct {
    // Maximum SquashFS block size is 1MiB, which can be
    // represented by a u21
    size: u21,

    // If we use these, 8MiB can now be represented
    UNUSED: u3 = undefined,

    // Bit 24: set when the block is stored uncompressed
    is_uncompressed: bool,
    // Bits 25..31: currently unused
    UNUSED2: u7 = undefined,
};

Technically speaking, the upper 7 bits could even be utilized, but that would be hacky and complicated; they might also be better used for something else.
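
In C terms, readers decode that 32-bit size word with masks along these lines (a sketch mirroring, from memory, the SQUASHFS_COMPRESSED_*_BLOCK macros in squashfs_fs.h):

#define SQUASHFS_COMPRESSED_BIT_BLOCK      (1 << 24)

/* on-disk byte count of the block */
#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B)  ((B) & ~SQUASHFS_COMPRESSED_BIT_BLOCK)

/* the bit marks *uncompressed* blocks, so compressed means the bit is clear */
#define SQUASHFS_COMPRESSED_BLOCK(B)       (!((B) & SQUASHFS_COMPRESSED_BIT_BLOCK))

Nothing in these masks cares whether the size uses 21 or all 24 low bits, which is why readers that keep the full low 24 bits should already cope with blocks up to 8 MiB.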

Increasing the block size might be good for future-proofing but really impacts random access performance on today's computers.

cgm999 changed the title from "Feature request: increase max block size from 1M to 16M or 32M" to "Feature request: increase max block size from 1M to 16M" on May 2, 2024
cgm999 (Author) commented May 2, 2024

For me, 16MB blocks work fine and make the squashfs image smaller. I attached my patch in case someone finds it useful.
I also use zstd for good decompression speed.

EDIT: The patch is broken if data is not compressible, so I removed it to avoid any issues with data loss.

To mount images I use https://github.com/vasi/squashfuse, which does not require any patch.

voidpointertonull commented

> Increasing the block size might be good for future-proofing but really impacts random access performance on today's computers.

It could be for more than just future-proofing. Surely it's not really the goal of the project, but with a bit of compression improvement and better tooling (I guess mostly libarchive support for "transparent" I/O), squashfs could become a viable replacement for a bunch of tar use cases.

Random access time for tar files is pretty much a worst-case scenario, so even hundreds-of-MiB block sizes would be an improvement there. Tar files just have a compression advantage from being compressed as a single stream, plus broad support for all the related formats: even the not-too-old tzst is handled by most file managers.

Not sure if this kind of archive-file use case is ever planned to be "supported", but I've definitely "abused" squashfs as such in a few cases: a large tar file isn't feasible to browse, and 7zip is too dumb to deal with even just symbolic links, while squashfs makes a proper archive that can be browsed (with FUSE mounting at least). It's just not as well compressed as the other options.


mgord9518 commented May 2, 2024

@cgm999 squashfuse (as well as the Linux kernel) actually would require a patch for this. If a block doesn't compress, the compressed bit will be set, and additional logic is needed to correctly get the block size. Of course you won't run into this situation often, but once you do it'll make for some hard-to-track bugs and corrupted reads.

This code already exists in squashfuse; it would just need to be moved to the sqfs_data_header function, which happens to be directly below the function it's currently in.
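
A minimal sketch of the ambiguity and the kind of reader-side fix being described (a hypothetical helper, not squashfuse's actual code), assuming a 16 MiB maximum block size:

#include <stdbool.h>
#include <stdint.h>

#define SQUASHFS_COMPRESSED_BIT_BLOCK (1u << 24)

/* Decode a data-block size word. A full 16 MiB uncompressed block would
 * store size 1 << 24, which is exactly the marker bit, so the masked
 * size reads back as 0 and naive readers misinterpret the block. */
static uint32_t decode_block_size(uint32_t word, uint32_t block_size,
                                  bool *compressed)
{
    uint32_t size = word & ~SQUASHFS_COMPRESSED_BIT_BLOCK;

    *compressed = !(word & SQUASHFS_COMPRESSED_BIT_BLOCK);

    /* the couple-of-lines change: "uncompressed, size 0" = full block */
    if (!*compressed && size == 0)
        size = block_size;

    return size;
}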


mgord9518 commented May 2, 2024

@voidpointertonull Yeah, I'm not gonna argue that. It could be useful for long-term backups where random access speed isn't super important.

Although once you break 16MiB, SquashFS would need some major format changes, which obviously wouldn't be compatible with current tooling.

cgm999 (Author) commented May 2, 2024

@mgord9518 Ah yes, I guess I never hit the case where a block doesn't compress and is stored as-is, and I have used this patch for years (it's always the same type of data, which I guess explains why I never hit the issue; I also compare the source with the mounted squashfs via FUSE before removing the source).

plougher (Owner) commented May 2, 2024

@mgord9518 The compressed bit was set at 1 << 24 to allow for increases in the maximum block size, if required. The max block size of 1M was chosen in 2009 because the only compression algorithm at the time (in the kernel) was gzip, and that can't make good use of 1M blocks anyway, because its window size (32 KiB for DEFLATE) is too small. xz/zstd can, and increasing the maximum block size is already planned for the next major release (4.7).

mgord9518 commented

@plougher Nice, I'm excited to mess with larger block sizes. Why was 1<<24 chosen over 1<<31?

plougher (Owner) commented May 2, 2024

> @plougher Nice, I'm excited to mess with larger block sizes. Why was 1<<24 chosen over 1<<31?

It left the upper bits free for other uses.
