[public repo] images should use high compression #194

Closed
unclejack opened this issue Mar 26, 2013 · 13 comments

@unclejack
Contributor

The images from the public repository should be using bzip2 -9 compression to speed up downloads and reduce traffic.

gzip compressed image:
pybuilder 288 MB

bzip2 -9 compressed image:
pybuilder.bz2 268 MB

savings: 20 MB, about 7%

@shykes
Contributor

shykes commented Mar 31, 2013

While we're at it, we might as well enable high compression whenever docker creates a tarball, e.g. `docker push` and `docker export`.

@unclejack
Contributor Author

Using higher compression for all operations which create tarballs would be really nice.

Over time, the savings in S3 storage and network traffic add up significantly.

I think only these changes are required to start using high compression for all new images on the registry:
registry.go:

```diff
 // FIXME: Don't do this :D. Check the S3 requierement and implement chunks of 5MB
 // FIXME2: I won't stress it enough, DON'T DO THIS! very high priority
- layerData2, err := Tar(path.Join(graph.Root, img.Id, "layer"), Gzip)
- layerData, err := Tar(path.Join(graph.Root, img.Id, "layer"), Gzip)
+ layerData2, err := Tar(path.Join(graph.Root, img.Id, "layer"), Bzip2)
+ layerData, err := Tar(path.Join(graph.Root, img.Id, "layer"), Bzip2)
```
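The `Gzip`/`Bzip2` argument in this diff selects how a layer is compressed. One hypothetical way to model it, assuming `Tar` shells out to bsdtar as discussed later in this thread, is an enum whose values map to tar's standard short flags (`-z` gzip, `-j` bzip2, `-J` xz). `Compression`, `Flag` and `tarCommand` are illustrative names and are not confirmed against the docker source.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Compression selects the compression applied to a layer tarball.
// Hypothetical type; Xz is included because xz is proposed in this thread.
type Compression int

const (
	Uncompressed Compression = iota
	Gzip
	Bzip2
	Xz
)

// Flag returns the tar/bsdtar short option for this compression:
// "z" (gzip), "j" (bzip2), "J" (xz), or "" for none.
func (c Compression) Flag() string {
	switch c {
	case Gzip:
		return "z"
	case Bzip2:
		return "j"
	case Xz:
		return "J"
	}
	return ""
}

// tarCommand builds (but does not run) a bsdtar invocation that archives
// dir to stdout with the chosen compression. Hypothetical helper.
func tarCommand(dir string, c Compression) *exec.Cmd {
	return exec.Command("bsdtar", "-f", "-", "-C", dir, "-c"+c.Flag(), ".")
}

func main() {
	// The directory is a placeholder for illustration.
	cmd := tarCommand("/tmp/example-layer", Bzip2)
	fmt.Println(cmd.Args) // [bsdtar -f - -C /tmp/example-layer -cj .]
}
```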

@jpetazzo
Contributor

Don't use bzip2; use lzma/xz instead. xz is faster than bzip2 and achieves better compression, and it even has an "extreme" mode that compresses even tighter, at the cost of speed.

See e.g. http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

Also, I couldn't check the source, but we should make sure that the layer hash is computed on the uncompressed tar.

@shykes
Contributor

shykes commented Mar 31, 2013

There is no layer hash right now; image IDs are generated randomly. I want to bring back content-generated IDs, and yes, that requires a tar-aware checksum.

@unclejack
Contributor Author

I've done some tests with the pybuilder image:

| File | Size | Share of original |
|---|---|---|
| pybuilder.tar | 468 MB | uncompressed |
| pybuilder.tar.gz | 221 MB | 47% |
| pybuilder.tar.bz2 | 208 MB | 44% |
| pybuilder.tar.xz | 180 MB | 38% |

The xz (LZMA2) compressed image is about 14% smaller than the bzip2 compressed image.

Other images show a similar decrease in size. Some even go down to 30% of the original size.
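The percentages above can be rechecked with a small Go calculation. The sizes are the reported pybuilder figures; `shareOf` and `reduction` are hypothetical helper names.

```go
package main

import "fmt"

// shareOf returns the compressed size as a percentage of the original size.
func shareOf(compressed, original float64) float64 {
	return 100 * compressed / original
}

// reduction returns how much smaller `smaller` is than `larger`, in percent.
func reduction(smaller, larger float64) float64 {
	return 100 * (1 - smaller/larger)
}

func main() {
	const original = 468.0 // MB, uncompressed pybuilder.tar
	sizes := []struct {
		format string
		mb     float64
	}{{"gzip", 221}, {"bzip2", 208}, {"xz", 180}}
	for _, s := range sizes {
		fmt.Printf("%-5s %.1f%% of original\n", s.format, shareOf(s.mb, original))
	}
	fmt.Printf("xz vs bzip2: %.1f%% smaller\n", reduction(180, 208))
}
```

This prints 47.2%, 44.4% and 38.5% of the original, and xz works out to about 13.5% smaller than bzip2, which rounds to the 14% quoted.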

ghost assigned unclejack Apr 1, 2013
@jpetazzo
Contributor

jpetazzo commented Apr 1, 2013

Note: we would also have to update docker dependencies and installation instructions to tell people to install xz.

Bonus points if docker checks that xz is installed at startup, so users get an informative error message rather than a cryptic tar error (should that be a separate issue?).

@shykes
Contributor

shykes commented Apr 1, 2013

The extra dependency is definitely a -1. Is it really worth the trouble compared to bzip2 -9?

@unclejack
Contributor Author

xz isn't required. bsdtar has native support for xz compression and needs neither the xz binary from xz-utils nor anything else.

I've just verified this: I compressed an archive in xz format with bsdtar, ran xz to confirm it wasn't installed, and then installed xz-utils to extract the archive. Everything worked.

So the only thing worth warning about is bsdtar's absence.
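If docker is to report a missing bsdtar with an informative message, as suggested above, a startup check along these lines would do it. `checkDependencies` is a hypothetical helper; `exec.LookPath` is the standard library call that searches `PATH` for an executable.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// checkDependencies fails fast with a readable message when a required
// external binary is missing, instead of surfacing a cryptic error from
// the first tar invocation.
func checkDependencies(bins ...string) error {
	for _, b := range bins {
		if _, err := exec.LookPath(b); err != nil {
			return fmt.Errorf("%s not found in PATH; install it before starting docker: %w", b, err)
		}
	}
	return nil
}

func main() {
	if err := checkDependencies("bsdtar"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("all dependencies present")
}
```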

@shykes
Contributor

shykes commented Apr 1, 2013

Awesome.

@titanous
Contributor

titanous commented Apr 1, 2013

If we want to drop the bsdtar dependency we can. Just swap in archive/tar and http://godoc.org/code.google.com/p/lzma instead of shelling out. It may make sense to wait until the registry supports streaming upload to do this, but it's not required.

@shykes
Contributor

shykes commented Apr 1, 2013

archive/tar doesn't support actually tarring and untarring files on the filesystem, only parsing and encoding the tar stream itself.

There is also auto-detection of compression, which is a really useful feature.

@titanous
Contributor

titanous commented Apr 1, 2013

@shykes Yeah, it would require essentially reimplementing the file walking and tar header building that tar/bsdtar does.

@unclejack
Contributor Author

The changes mentioned in this issue were made by pull request #308.

Another issue was created to add the hashing for layer contents and parent id when creating the image id. The issue is #310.

@unclejack unclejack removed their assignment Jul 24, 2014