
[FEATURE] hole-punching support #516

Open
asomers opened this issue Dec 20, 2022 · 3 comments
Labels
feature Idea of a new feature to make MooseFS even better! :)

Comments


asomers commented Dec 20, 2022

Have you read through the available documentation, open GitHub issues, and GitHub Ideas Discussions?

Yes

System information

Your moosefs version and its origin (moosefs.com, packaged by distro, built from source, ...).

3.0.116, from the FreeBSD package

Operating system (distribution) and kernel version.

FreeBSD 13.1, amd64. But the feature request applies to all platforms.

Hardware / network configuration, and underlying filesystems on master, chunkservers, and clients.

All.

How much data is tracked by moosefs master (order of magnitude)?

Irrelevant

Describe your request:

What new feature would you like to see implemented in MooseFS?

Hole-punching support. That is, the client should be able to deallocate space in a file (using fspacectl on FreeBSD, or fallocate with FALLOC_FL_PUNCH_HOLE on Linux), and this deallocation should be passed to the chunkservers, which should deallocate the space on disk using the same syscalls. In libfuse, this operation is called fallocate in the low-level API.

Why this feature? Is it a necessity or a nice to have? Is this feature related to any other features or problems in the open issues?

This feature can save tons of space when storing VM images on MooseFS. In my experience with a large collection of VMs backed by NFS, I found that about half of the disk space was actually unused by the VM. If that setup had supported hole-punching, then the unused space could've been reclaimed on the storage servers.


inkdot7 commented Dec 28, 2022

These holes would typically be rather small? More like disk sectors than entire chunks?

That is, after the holes are punched, the data on the chunkserver's disk could become rather fragmented?

When hole-punching support is not available, the VM would resort to writing zeros to disk, correct?

Such zeros would compress efficiently, so the ability of the chunkservers to compress data (see #7) would give a similar space saving, even without explicit hole-punching.


asomers commented Dec 28, 2022

The size of the holes would depend strongly on what file system the VM is using, and what its workload is like. With any file system, deleting a large unfragmented file should result in punching a large hole.
If hole-punching support is not available, then the VM might write zeros, or might do nothing. Either would be technically correct. If it does write zeros, then those will eventually compress (especially if the chunkserver runs ZFS, as mine does), but even so writing zeros takes far more network and CPU resources than punching holes.

chogata (Member) commented Jan 11, 2023

MooseFS does not write zeros at the beginnings and ends of chunks, so if a really big hole is "punched" in the VM's file, the corresponding chunk will become shorter at the beginning/end, or even be deleted completely. But we do not support holes in the middle of chunks. We could, theoretically, but I'm not sure what impact it would have on performance. Something to look into and consider.

@chogata chogata added the feature Idea of a new feature to make MooseFS even better! :) label Jan 11, 2023