
Out-of-memory when mksquashfs'ing 200M files #238

Open
nh2 opened this issue Apr 1, 2023 · 2 comments

nh2 commented Apr 1, 2023

Hi,

I'm having trouble finding concrete information on whether squashfs is designed to handle packing and unpacking large numbers of files with low/constant RAM usage.

I ran mksquashfs on a directory with 200 million files, around 20 TB total size.

I used the flags -no-duplicates -no-hardlinks, with mksquashfs version 4.5.1 (2022/03/17) on Linux x86_64.
It OOM'ed with 53 GB resident memory usage.
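
A sketch of the invocation (the source and destination paths here are illustrative, not my real ones; the flags are the ones I actually passed):

```sh
mksquashfs /data/source archive.sqfs -no-duplicates -no-hardlinks
```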

Should mksquashfs handle this? If yes, I guess the OOM should be considered a bug.

Otherwise, I'd file this as a feature request, as it would be very nice to have a tool that can handle this.

@plougher plougher self-assigned this Apr 4, 2023
@plougher plougher added this to the Undecided milestone Apr 4, 2023
plougher (Owner) commented Apr 5, 2023

This is an interesting request. Back in the early days of Squashfs (from 2002 to about 2006), Mksquashfs did a single pass over the source filesystem, creating the Squashfs filesystem as it went. This did not require caching any of the source filesystem, so it was very light on memory use.

Unfortunately, adding features such as real inode numbers, hard-link support (including inode nlink counts), and "." and ".." directories (the first two versions of Squashfs had none of these) requires fully scanning the source filesystem to build an in-memory representation.

That representation takes memory, so 53 GB is probably correct for around 200 million files; the OOM is expected behaviour rather than a bug.
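
As a rough back-of-envelope check (the per-file figure is an estimate, not a measured value):

```sh
# 53 GB resident across ~200 million files works out to roughly
# 265 bytes of in-memory state per file, a plausible size for an
# in-memory inode plus directory-entry bookkeeping.
echo $(( 53000000000 / 200000000 ))   # prints 265
```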

But if someone were happy to forgo hard-link detection and advanced support such as pseudo files and actions, it may be possible to reduce the in-memory representation and move back towards the original single pass in a "memory-light mode".

I'll add it to the list of enhancements, and see if priorities allow it to be looked at for the next release.

nh2 (Author) commented Apr 8, 2023

> if someone were happy to forgo hard-link detection and advanced support such as pseudo files and actions, it may be possible

@plougher Yes, that's exactly what I'm after.

This is also why I tried -no-duplicates -no-hardlinks: those features already sound like they need O(number of files) memory.

squashfs is sometimes recommended as a good alternative to tar/zip, since it supports modern compression and random access.

Constant memory use seems to be the most critical thing missing for it to truly replace them.
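
For context, the kind of workflow I mean, sketched with illustrative paths (squashfuse is a separate project, not part of squashfs-tools; zstd support depends on how mksquashfs was built):

```sh
mksquashfs /data archive.sqfs -comp zstd            # pack with a modern compressor
unsquashfs -d out archive.sqfs some/dir/file.txt    # random access: extract a single file
squashfuse archive.sqfs /mnt/archive                # or mount read-only and read files lazily
```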
