Correctly Backup/Restore Hard Links #183

Open
maurerdietmar opened this issue Feb 12, 2019 · 5 comments
@maurerdietmar (Contributor)

It is important that a restore produces an exact copy of the backed-up data (especially when doing container backups). It seems the current catar implementation does not correctly restore hardlinks, and instead creates multiple copies.

@poettering (Member)

Yupp. Note that casync can optionally generate hardlinks when extracting, if this is requested (similar to how it normally generates reflinks when it can). Hence, if hardlink support is added, care needs to be taken that hardlinks are either included in the image or used as an optimization technique when extracting, but not both.

Including hardlinks in the image is not entirely trivial. That's because casync's serialization format puts an emphasis on being "composable", i.e. the serialization of a directory is strictly the concatenation of everything it contains plus metadata about itself. But that means a hardlink cannot be encoded inside each individual dir, but must be encoded in the closest enclosing directory, i.e. the common parent of each pair of hardlinks pointing to the same inode. Moreover, each occurrence of the same inode needs to be serialized with full payload, so that things remain entirely composable. This sounds wasteful, but is actually not that bad, given that the usual chunking logic should recognize this anyway, and hence dedup it again.

@poettering added the RFE label Feb 12, 2019
@maurerdietmar (Contributor, Author)

> Including hardlinks in the image is not entirely trivial. That's because casync's serialization format puts an emphasis on being "composable", i.e. the serialization of a directory is strictly the concatenation of everything it contains plus metadata about itself.

I can't see why this property is a hard requirement. I would simply store hardlinks like symlinks, using an extra 'HARDLINK' tag. This is easy to implement, and I cannot see any real drawbacks.

@poettering (Member)

> I can't see why this property is a hard requirement. I would simply store hardlinks like symlinks, using an extra 'HARDLINK' tag. This is easy to implement, and I cannot see any real drawbacks.

Well, the design focusses on composability and reproducibility, and that means that if you serialize a directory /foo/a, then a directory /foo/b, and then /foo as a whole, then /foo should really reference the same blocks as /foo/a and /foo/b...

I mean, if you handle hardlinks (as you propose) the algorithm would be something like this:

Serialization: For each inode we encounter with st_nlink > 1 we'd create a db entry somewhere, with a lookup key of ino_t+dev_t and a lookup value of the fs path. If the db entry is newly created, you'd serialize things as a regular file; if it already exists, you'd instead serialize things as your newly proposed HARDLINK object.

Deserialization: whenever HARDLINK is encountered, you just call link(), trivially.
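Purely for illustration, here is a minimal Python sketch of that bookkeeping; serialize_regular_file() and emit_hardlink_entry() are made-up stand-ins, not casync API, and only the (st_dev, st_ino) tracking is the point:

```python
import os

# Hypothetical stand-ins for the real serializer.
def serialize_regular_file(path):
    print(f"FILE     {path}")

def emit_hardlink_entry(path, target):
    print(f"HARDLINK {path} -> {target}")

def serialize_tree(root):
    seen = {}  # (st_dev, st_ino) -> first path the inode was serialized under
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    # second and later occurrences become HARDLINK objects
                    emit_hardlink_entry(path, target=seen[key])
                    continue
                seen[key] = path
            serialize_regular_file(path)

def restore_hardlink(target, path):
    # deserialization of a HARDLINK object is just a plain link()
    os.link(target, path)
```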

Now, if we retain the composability the algorithm would be changed like this:

Serialization: As before, for each inode with st_nlink > 1 we create a db entry somewhere. Regardless of whether the db entry already exists or not, we'd serialize the file normally (i.e. as a regular file, with no "HARDLINK" object involved here). This time the db would use ino_t+dev_t as key (as above), but the value would not be a single path, but a set of paths, i.e. all paths we discovered the inode under. Then, as we continue serializing, whenever we go one recursion level up (i.e. return from one file/dir to the dir it is contained in), we check whether any of the db entries have two or more paths with a prefix that matches our dir. If so, we generate a HARDLINK object with all listed paths.

Deserialization: we deserialize normally. Whenever we hit a HARDLINK object, we replace all the listed objects with a hardlink to the first one listed.
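Again just to illustrate the difference, a rough Python sketch of the composable variant; write_hardlink_object() and serialize_regular_file() are made-up placeholders, and links that also occur outside the emitted directory are not handled:

```python
import os

def write_hardlink_object(dirpath, paths):
    # placeholder for emitting the proposed HARDLINK object into the archive
    print(f"HARDLINK object in {dirpath}: {paths}")

def serialize_regular_file(path):
    # placeholder: in this variant every occurrence keeps its full payload
    print(f"FILE     {path}")

def serialize_dir(dirpath, inodes):
    # inodes: (st_dev, st_ino) -> all paths the inode has been found under
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        st = os.lstat(path)
        if os.path.isdir(path) and not os.path.islink(path):
            serialize_dir(path, inodes)
        else:
            if st.st_nlink > 1:
                inodes.setdefault((st.st_dev, st.st_ino), []).append(path)
            # every occurrence is still serialized with its full payload,
            # so each subtree remains independently serializable
            serialize_regular_file(path)
    # on the way back up: emit a HARDLINK object in the closest directory
    # that contains two or more paths of the same inode
    for key, paths in list(inodes.items()):
        inside = [p for p in paths if p.startswith(dirpath + os.sep)]
        if len(inside) >= 2:
            write_hardlink_object(dirpath, inside)
            del inodes[key]  # simplification: drops occurrences outside dirpath

def restore_hardlinks(paths):
    # deserialization: replace every listed path with a hardlink to the first
    first, rest = paths[0], paths[1:]
    for p in rest:
        os.remove(p)
        os.link(first, p)
```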

So, yes, this is a bit more complex to implement, but I really like the concept of composability, and thankfully the chunking should remove the redundancy this introduces again.

The composability is not just a theoretic nicety btw: it also means that searching for a file inside an archive (the way the fuse logic in casync does it on each file access) is conceptually a bisection, as we search for it strictly within a constantly shrinking search window, if you so will...

@maurerdietmar (Contributor, Author)

This basically stores hardlinked content multiple times, which makes the .catar format quite inefficient if you have many hardlinks. The deduplication layer may reduce the wasted space in some cases, but it is a serious drawback if you use the .catar format without the dedup layer above it...

@Silvanoc commented Jun 5, 2020

I also stumbled upon this issue while looking at casync for container images. Therefore I can understand that it's no fun watching the root filesystem of a busybox container image explode from 1.3M WITH hardlinks to 431M WITHOUT hardlinks (equivalent to what you get using the .catar format).

I understand that what @poettering proposes is helpful for the .caidx format, but not for the .catar format. The rest of this comment is based on this assumption.

I wonder why @maurerdietmar wants to use .catar for container images. What does it bring compared to the "standard" layers format? If you use neither layers nor a casync store, then you don't have any cross-image deduplication at all.

I'm looking at casync with the .caidx format to get at least file-based deduplication (much better than layer deduplication for base images, and chunk deduplication makes it even better). The storage footprint of the base images debian:stable, debian:testing and ubuntu:latest gets reduced from 304M with layer deduplication to 171M with .caidx deduplication. So if my assumption is right, then the proposal would suffice for my use case.
