
Commit

Add file/folder illustration of the repo format

The DESIGN document helped me a lot to understand how bup works. However, from reading "git for computer scientists" alone, I did not realize that git trees and blobs are mapped 1-to-1 to directories and files on `git checkout`. I searched for a way to make git alone merge several blobs into one file, which it can't do. The reason I went looking was this sentence:

> ... so you can use git to manipulate the bup repository if you want, and you
> probably won't break anything.  It's also a comfort to know you can squeeze
> data out using git, just in case bup fails you, and as a developer, git offers
> some nice tools ...

where I interpreted "squeeze out" as a simple "git checkout", which does not work, because bup
splits files. The added text makes this more concrete and prepares the reader for the
parts about hashsplitting and fanout.

Signed-off-by: Moritz Lell <mlell08@gmail.com>
mlell committed Apr 18, 2024
1 parent 75826b1 commit 9ecc7c4
Showing 1 changed file with 14 additions and 5 deletions.
DESIGN: 19 changes (14 additions, 5 deletions)
@@ -104,14 +104,23 @@ tools (like 'git rev-list' and 'git log' and 'git diff' and 'git show' and
 so on) that allow you to explore your repository and help debug when things
 go wrong.
 
-Now, bup does use these tools a little bit differently than plain git. We
+Now, bup does use these tools a little bit differently than plain git. We
 need to do this in order to address three deficiencies in git when used for
 large backups, namely a) git bogs down and crashes if you give it really
 large files; b) git is too slow when you give it too many files; and c) git
-doesn't store detailed filesystem metadata.
-
-Let's talk about each of those problems in turn.
-
+doesn't store detailed filesystem metadata. We'll talk about each of those
+problems in turn.
+
+But first, as a very quick overview, let's pretend bup has vanished from the
+face of the earth and you have to resort to a 'git checkout' to obtain a
+large file from your backup. You would see that instead of the file, a folder
+appears at its location. Inside, nested in several levels of subfolders, you
+would find multiple smaller files. These contain chunks of the original file,
+so to restore your file, you need to concatenate them in the order of their
+file names (for example, with a command like
+`find path/to/your/file.bup -type f | sort | xargs cat > output/file`). These
+chunks are the product of *hashsplitting*. Now we can review this and the
+other techniques bup uses to make git ready for large backups.
 
 Handling large files (cmd/split, hashsplit.split_to_blob_or_tree)
 --------------------
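The manual restore described in the added DESIGN text can also be sketched in Python, the language bup itself is written in. This is a minimal sketch, not code from bup: the helper names `iter_chunk_paths` and `reassemble` are hypothetical, and it assumes, as the text says, that chunk order matches the order of the file names at every directory level of the checked-out `.bup` folder. Sorting each directory with Python's `sorted()` compares code points, so the result does not depend on the locale collation that `sort` may apply.

```python
import os
import shutil


def iter_chunk_paths(chunk_dir):
    """Yield chunk files under chunk_dir, depth-first, in file-name order.

    Assumption: the chunk order of the original file equals the sorted
    order of the names at each level of the checked-out tree.
    """
    for name in sorted(os.listdir(chunk_dir)):
        path = os.path.join(chunk_dir, name)
        if os.path.isdir(path):
            # A subfolder holds a consecutive run of chunks: descend into it.
            yield from iter_chunk_paths(path)
        else:
            yield path


def reassemble(chunk_dir, output_path):
    """Concatenate every chunk under chunk_dir into output_path.

    Returns the number of bytes written, i.e. the restored file's size.
    """
    with open(output_path, "wb") as out:
        for path in iter_chunk_paths(chunk_dir):
            with open(path, "rb") as chunk:
                # Stream each chunk so large files never sit fully in memory.
                shutil.copyfileobj(chunk, out)
        return out.tell()
```

For example, `reassemble("path/to/your/file.bup", "output/file")` plays the role of the `find | sort | xargs cat` pipeline. Sorting per directory rather than by full path string also avoids surprises where one name is a prefix of another, since characters such as `-` sort before `/`.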