Info about tar-split.json.gz #47446
-
While going through Docker's internal files, I came across a file named …
-
When using the graph-driver storage drivers, the Docker daemon pulls images and their layers. Image layers are distributed as compressed tar archives. After downloading a layer, the daemon verifies the digest of the compressed layer, extracts the layer, and stores the files it contains.

The (compressed) layer archives are only needed to distribute the image (through a registry), not to run it: running the image only needs the extracted content. To reduce storage consumption, the downloaded archives are therefore deleted, and only the extracted files are kept.

Although the archives are deleted, the daemon preserves the tar headers. Every file in a tar archive is preceded by a header holding that file's information (its name and properties such as modification time and ownership). This header information is "split" from the tar archive and stored in a tar-split file.

In most cases, those files are not needed. When pushing an image to a registry that already has a layer, there is no need to push that layer again, so nothing is uploaded. However, if the layer is not present in the registry (for example, if you're pushing the image to a different registry), the daemon has to re-create the layer archive.

This is where the tar-split file is used. When re-creating the layer (tar archive), all files that were part of the originally downloaded layer must be included, with their file properties exactly as they were in the original layer. The tar-split file serves as an index to locate the extracted files in local storage: each file is added to the archive together with the metadata (file name, file properties) from the original archive. Changes to the metadata of the extracted files are ignored (in case files were modified locally), and so are files that were not part of the original layer (e.g. say a …).

So, basically: …
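To make the split/reassemble cycle concrete, here is a minimal sketch using only Python's standard `tarfile` module. It is not how Docker or tar-split are implemented: the functions `pull_layer` and `push_layer` and the gzip-compressed JSON-lines index are hypothetical stand-ins for the daemon and for `tar-split.json.gz`, and the index stores only parsed header fields. The real tar-split records the raw header bytes (and padding) so that the rebuilt tar stream is byte-for-byte identical to the original; this sketch only produces an equivalent archive.

```python
import gzip
import hashlib
import io
import json
import os
import tarfile

# Hypothetical index format: one JSON object per tar entry, gzip-compressed.
# These header fields are recorded besides the entry's name and type flag.
# (The real tar-split format stores raw header bytes instead.)
FIELDS = ("size", "mode", "uid", "gid", "uname", "gname", "mtime", "linkname")


def pull_layer(compressed: bytes, digest: str, root: str) -> bytes:
    """Verify a .tar.gz layer, extract regular files under `root`, and return
    a gzip-compressed JSON-lines index of every header (hypothetical format)."""
    actual = "sha256:" + hashlib.sha256(compressed).hexdigest()
    if actual != digest:
        raise ValueError(f"digest mismatch: got {actual}, want {digest}")

    index = []
    with tarfile.open(fileobj=io.BytesIO(compressed), mode="r:gz") as tar:
        for member in tar:
            if member.name.startswith("/") or ".." in member.name.split("/"):
                raise ValueError(f"unsafe path in layer: {member.name}")
            entry = {"name": member.name, "type": member.type.decode("ascii")}
            entry.update({f: getattr(member, f) for f in FIELDS})
            index.append(entry)
            if member.isfile():
                # Only file contents go to local storage; directories and
                # links are fully described by their header in the index.
                path = os.path.join(root, member.name)
                os.makedirs(os.path.dirname(path), exist_ok=True)
                with open(path, "wb") as out:
                    out.write(tar.extractfile(member).read())
    # The compressed archive itself is not kept; only the index is returned.
    return gzip.compress("\n".join(json.dumps(e) for e in index).encode())


def push_layer(index_gz: bytes, root: str) -> bytes:
    """Rebuild an uncompressed layer tar from the index and the local files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for line in gzip.decompress(index_gz).decode().splitlines():
            entry = json.loads(line)
            info = tarfile.TarInfo(entry["name"])
            info.type = entry["type"].encode("ascii")
            # Metadata always comes from the index, never from the local file.
            for f in FIELDS:
                setattr(info, f, entry[f])
            if info.isfile():
                with open(os.path.join(root, info.name), "rb") as src:
                    tar.addfile(info, src)  # copies exactly info.size bytes
            else:
                tar.addfile(info)
    return buf.getvalue()
```

Iterating over the index, rather than over the extracted directory, is what makes files added locally after extraction drop out of the rebuilt layer. Copying `mode`, `mtime`, and ownership from the index is what makes local metadata changes irrelevant.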
You can read more about tar-split in the GitHub repository that implements it: https://github.com/vbatts/tar-split