Expose additional compression levels #4728

Open
ebattenberg opened this issue Mar 11, 2024 · 3 comments

ebattenberg commented Mar 11, 2024

Output of restic version

restic 0.16.4 compiled with go1.21.6 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

It would be great if restic exposed the additional compression levels offered by the klauspost/compress library.

What are you trying to do? What problem would this solve?

The current 'auto' and 'max' levels map to the SpeedDefault and SpeedBestCompression levels in the klauspost/compress library, respectively. While these are great to have, the 'max' setting occupies a fairly extreme point on the compression-vs-speed tradeoff curve: it's useful for remote repos when upstream bandwidth is limited (<10 Mbps), but it can significantly slow down transfers over faster connections. The klauspost/compress library offers an intermediate mode (SpeedBetterCompression) that might be more useful to people who want a bit more compression than SpeedDefault but don't want to go all the way to the orders-of-magnitude-slower SpeedBestCompression. Additionally, SpeedFastest might be a desirable configuration for people doing local disk backups who care most about speed but still want some compression.

I've benchmarked the various zstd compression levels (1-22) on my (dinky) Celeron N5095 and noticed the following (a rough sketch for reproducing these measurements follows the list):

  • Level 1 (SpeedFastest) runs at around 500-2000 MB/s.
  • Level 3 (SpeedDefault / 'auto') runs at around 300-1500 MB/s and achieves approximately the same compression ratios as Level 1.
  • Levels 7-8 (SpeedBetterCompression) run at around 50-100 MB/s and achieve 10-30% better compression ratios than Level 1.
  • Level 19 (I think this is equivalent to SpeedBestCompression / 'max') runs at around 0.8-3 MB/s and achieves 20-70% better compression ratios on compressible files than Level 1.
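
For anyone who wants to reproduce ballpark numbers like these against the klauspost/compress encoder that restic actually uses, here's a minimal Go sketch. To be clear, this is an illustration rather than the exact methodology behind the numbers above; the sample file name is an arbitrary placeholder, and single-pass timing only gives rough figures.

```go
// Rough throughput/ratio check for the klauspost/compress encoder levels that
// restic uses (or could use). "sample.bin" is an arbitrary test file; pick any
// reasonably large, partly compressible file. Single-pass timing only, so
// treat the output as ballpark numbers, not a proper benchmark.
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/klauspost/compress/zstd"
)

func main() {
	data, err := os.ReadFile("sample.bin")
	if err != nil {
		panic(err)
	}

	levels := []zstd.EncoderLevel{
		zstd.SpeedFastest,           // roughly zstd level 1
		zstd.SpeedDefault,           // restic 'auto' (roughly zstd level 3)
		zstd.SpeedBetterCompression, // roughly zstd levels 7-8
		zstd.SpeedBestCompression,   // restic 'max'
	}

	for _, level := range levels {
		enc, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(level))
		if err != nil {
			panic(err)
		}

		start := time.Now()
		compressed := enc.EncodeAll(data, nil)
		elapsed := time.Since(start)
		enc.Close()

		mbPerSec := float64(len(data)) / (1 << 20) / elapsed.Seconds()
		ratio := float64(len(data)) / float64(len(compressed))
		fmt.Printf("%-24v %8.1f MB/s  ratio %.2f\n", level, mbPerSec, ratio)
	}
}
```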

These are all pretty significant differences, especially with respect to speed, and since people use restic with a variety of backends on a variety of hardware, it seems like more control over the compression levels would be useful.

I tested the two extra modes ("better" and "fastest") inside restic by adding the appropriate enums and switch cases in internal/repository/repository.go, and they seem to produce backup speeds and repo sizes in line with what the zstd benchmarks above would predict.
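
For the curious, here's a minimal, self-contained sketch of that idea. It is not restic's actual code: the CompressionMode type, the 'fastest'/'better' names, and the encoderLevel helper are illustrative assumptions, while the zstd.Speed* constants are the real levels exposed by klauspost/compress.

```go
// Minimal sketch (not restic's internals): map user-facing compression mode
// names onto klauspost/compress zstd encoder levels. CompressionMode and the
// "fastest"/"better" names are hypothetical; the zstd.Speed* constants exist.
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

type CompressionMode string

const (
	CompressionOff     CompressionMode = "off"
	CompressionFastest CompressionMode = "fastest" // proposed: zstd.SpeedFastest
	CompressionAuto    CompressionMode = "auto"    // current:  zstd.SpeedDefault
	CompressionBetter  CompressionMode = "better"  // proposed: zstd.SpeedBetterCompression
	CompressionMax     CompressionMode = "max"     // current:  zstd.SpeedBestCompression
)

// encoderLevel returns the zstd level for a mode, or false for "off".
func encoderLevel(m CompressionMode) (zstd.EncoderLevel, bool) {
	switch m {
	case CompressionFastest:
		return zstd.SpeedFastest, true
	case CompressionAuto:
		return zstd.SpeedDefault, true
	case CompressionBetter:
		return zstd.SpeedBetterCompression, true
	case CompressionMax:
		return zstd.SpeedBestCompression, true
	default:
		return 0, false
	}
}

func main() {
	level, ok := encoderLevel(CompressionBetter)
	if !ok {
		fmt.Println("compression disabled")
		return
	}

	var buf bytes.Buffer
	enc, err := zstd.NewWriter(&buf, zstd.WithEncoderLevel(level))
	if err != nil {
		panic(err)
	}
	if _, err := enc.Write([]byte("example payload example payload")); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil {
		panic(err)
	}
	fmt.Printf("compressed to %d bytes with %v\n", buf.Len(), level)
}
```

In restic itself the switch would of course live next to the existing 'auto'/'max' handling rather than in a standalone program.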

Would exposing these additional compression levels be useful to the restic community? Does doing so jibe with the general philosophy of the project? I'm sure this was considered at some point when compression was initially added, but I had trouble finding relevant discussions.

This is the first time I've touched golang or restic code. If there's interest, I'm happy to take a stab at a proper PR, but I wanted to get a sense for whether this was worth pursuing first. I imagine this is something an experienced restic contributor could knock out in 20 minutes (which is why I think there might be philosophical issues with the proposed changes or something else I'm not considering).

Did restic help you today? Did it make you happy in any way?

I'm pretty new to restic, but I'm currently pretty pumped about it and I especially like using it with resticprofile to automate and organize all of my backups. It also did my taxes for me today and I even saw it help an old lady cross the street.

konidev20 (Contributor) commented

Hey @ebattenberg, did you run tests on an SSD or HDD? Also, it would be great if you shared the benchmark results over here to get some interest from the community.

ebattenberg commented Mar 18, 2024

I'm happy to provide more info on the rough benchmarks. I thought my original post was getting long enough, so I just provided the raw zstd benchmarks, which don't depend on any particular restic repo, thinking that those very noticeable differences would be enough to motivate additional compression levels.

I also ran my hacked-up fork with the four compression levels against a variety of restic backends. Here are some more details on those results.

  • Source: ~300MB of compressible and incompressible files from one of my home directories.
  • Repo backends: Local Disk (SSD), 1Gbps SFTP, 10Mbps Backblaze B2 (S3 API).
  • For the Local Disk and SFTP backends, I ran each configuration twice and recorded the minimum of the two times. I only did a single run for the Backblaze backend because it was much slower.
  • CPU: Intel Celeron N5095

I'll share the specific results via a Google spreadsheet if that's cool: Restic Compression Levels Bench

Comments:

  • Reminder that these results are for one specific source directory makeup, on one specific CPU, over three specific backends. Results will vary widely for different setups, as will user preferences for speed vs. backup size.
  • For local disk backups, using no compression is fastest, but the 'fastest' level causes barely any slowdown, and you get almost all of the compression of the higher levels basically for free. I think this is a pretty good argument for including the 'fastest' level. Anyone who is currently using no compression for speed reasons would surely want to switch to 'fastest' unless they know for sure that all of the files they're backing up are incompressible (already compressed).
  • For 1Gbps SFTP, the 'fastest' compression level runs much faster than no compression due to having to send much less data over ethernet. 'default / auto' also provides some speedup. 'best / max' results in significant slowdown and would probably be overkill, but 'better' takes about the same amount of time as no compression and achieves strong compression ratios.
  • Backblaze B2 over a 10Mbps upstream internet connection clearly favors the 'best / max' compression level in my setup, both for speed and for backup size.
  • These results don't make a strong case for the 'better' compression level given that it barely beats the compression ratio of 'default / auto', but I think that's specific to the particular test data I used. The raw zstd benchmarks I did with a variety of files definitely favored 'better' more.
  • 'better' could be the sweet spot when you want to squeeze out as much compression as possible without significantly slowing down the backup on faster backends ('best / max' took about twice as long on the two local backends).

I'm sure the two additional compression levels would be useful for some subset of restic users given the variety of user preferences, use cases, and system configurations out there.

I'd be glad to share a cleaned up version of my fork or take a stab at a proper PR if anyone is interested.

ebattenberg commented Mar 18, 2024

I should add that I became aware of these additional zstd compression levels when I was testing out kopia. Kopia provides a compression benchmarking tool for all of the different compression algorithms it supports, and its compression docs make the simple point that

"As soon as the throughput of compression is higher than I/O, compression is no longer the bottleneck. Therefore, any higher compression basically comes as free."

In this way, providing additional compression levels allows users to configure higher compression within the "free" throughput range for a wider variety of CPU/backend combinations.
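
To make that concrete with the rough numbers from earlier in the thread: a 10 Mbps uplink moves about 1.25 MB/s, so even 'best / max' at roughly 0.8-3 MB/s on my CPU keeps pace with the link, and the extra compression is essentially free. Gigabit ethernet tops out around 125 MB/s, so on the same CPU only 'fastest' and 'default / auto' clearly stay ahead of the pipe, while 'better' at 50-100 MB/s sits right at the boundary, which is exactly the kind of middle ground that's currently missing.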
