speed #5

Open
ThomasWaldmann opened this issue Sep 6, 2022 · 6 comments


@ThomasWaldmann
Contributor

ThomasWaldmann commented Sep 6, 2022

... depends on a lot of things and might be hard to compare.

just a few insights (from borg development):

  • borg 1.x does some information gathering based primarily on the filename, assuming that same filename means same file.
  • borg 2.0 will play a bit more on the safe side considering race conditions due to changing file systems, so it opens the file to get a file descriptor (fd) and then does the information gathering using the fd. the fd will always refer to the same fs object.
  • borg >= 1.2 checks if a file has changed while it was backed up.
  • these are a few reasons why more recent borg versions got a bit slower than older ones, especially on NFS, because open() and stat() are slow there.
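To illustrate the difference between name-based and fd-based information gathering, here is a minimal Python sketch. It is not borg's code; the function name `demo_name_vs_fd` and the file names are made up for illustration, and it assumes a POSIX file system where a file can be replaced while it is open.

```python
import os
import tempfile


def demo_name_vs_fd():
    """Show that an fd keeps pointing at the same fs object after a rename."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        with open(path, "wb") as f:
            f.write(b"old")

        # Open first, then gather information through the descriptor.
        fd = os.open(path, os.O_RDONLY)
        try:
            # Simulate a race: another process replaces the file under the same name.
            other = os.path.join(d, "other")
            with open(other, "wb") as f:
                f.write(b"new!")
            os.replace(other, path)

            by_fd = os.fstat(fd)      # still describes the originally opened object
            by_name = os.stat(path)   # resolves the name again, sees the replacement
            content = os.read(fd, 100)
        finally:
            os.close(fd)
        return by_fd.st_size, by_name.st_size, content
```

The price is the extra `open()` per file, which is the overhead mentioned above for slow file systems like NFS.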

So, sometimes speed == quick & dirty and slower == better / safer.

The less you do, the faster you get. The question is then if you still do enough / all that is needed.

@deajan
Owner

deajan commented Sep 7, 2022

I can definitely add the file descriptor part to the README section.

I don't get the part where borg >= 1.2 checks if a file has changed while it was backed up.
Does it compare the backed-up file to its last state while doing backups? What if the file changes continuously?

So, sometimes speed == quick & dirty and slower == better / safer.

This statement is something I can live by, except for parked files like qcow files with external snapshots, which will never change while being backed up (actual thing I do with borg as of today).

@ThomasWaldmann
Contributor Author

The "changed while backup" check only detects that there might be a problem; it does not avoid it (as a snapshot would).

In some cases, it might not be an issue (e.g. a log file growing a line at the end), but in other cases it might warn the user of an issue (e.g. if you back up some sort of database and the file changes internally while you back it up, the file as read by borg could then be internally inconsistent).
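A rough sketch of what "detect, but not avoid" can look like: stat the open file before and after reading it and compare the results. This is only an illustration, not borg's implementation; which stat fields borg actually compares is not stated in this thread, so size and mtime are an assumption here, and `during_read` is a hypothetical hook that simulates a concurrent writer.

```python
import os


def read_and_check(path, during_read=None):
    """Read a file fully and report whether it changed while being read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        before = os.fstat(fd)
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
            if during_read is not None:
                during_read()        # simulate another process writing to the file
                during_read = None   # only once
        after = os.fstat(fd)
    finally:
        os.close(fd)
    # Assumption: a change in size or mtime means "changed while we read it".
    changed = (before.st_size, before.st_mtime_ns) != (after.st_size, after.st_mtime_ns)
    return b"".join(chunks), changed
```

Note that the data is returned either way: the check can only flag the file as suspicious, it cannot make the read consistent.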

@deajan
Owner

deajan commented Sep 8, 2022

Thanks for the clarification. This makes me think of pre-freeze and post-thaw scripts for databases ;)

I'll add a "backup coherence" entry in the table which I can link to this discussion.

Just a side question: when using the borg CLI, will there be a specific exit code in those cases, or must the output be parsed to find out whether a file changed while being backed up?

@ThomasWaldmann
Contributor Author

Currently there are only a few exit codes and also it is hard to map warnings to exit codes (because there can be multiple different warnings), so one currently needs to read the log output.
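For scripting around this, a small sketch of "read the log output" could look like the following. It relies on borg 1.x using exit code 1 for "finished with warnings". The `marker` text is an assumption about the wording of the warning and should be checked against the log output of the borg version actually in use; the function name is made up.

```python
BORG_EXIT_WARNING = 1  # borg 1.x: 0 = success, 1 = warning, 2 = error


def find_changed_file_warnings(returncode, log_text,
                               marker="file changed while we backed it up"):
    """Return the log lines that look like 'changed while backup' warnings.

    `marker` is an assumed substring of the warning, not a stable interface.
    """
    if returncode != BORG_EXIT_WARNING:
        # No warnings at all (0) or a real error (2): nothing to map here.
        return []
    return [line.strip() for line in log_text.splitlines() if marker in line]
```

The function takes the return code and the captured log text of a `borg create` run, so it can be used with whatever way the wrapper script already runs borg.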

@deajan
Owner

deajan commented Oct 2, 2022

I added a new benchmark with qemu disk images (see the latest README.md file).
I noticed that borg performs quite well for that use case, whereas backing up the Linux kernel source files is not that great in terms of speed.
Is that explained by your statements above about open() and stat()?

@ThomasWaldmann
Contributor Author

Could be, because if you have a lot of small files, the per-file overhead has a much bigger effect than for a few big files.
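A back-of-the-envelope model makes this visible. All numbers below are made-up assumptions (1 ms per-file overhead for open/stat, 100 MB/s throughput), not measurements of borg or of the benchmark in the README.

```python
def estimated_seconds(n_files, total_bytes, per_file_overhead_ms, throughput_bytes_per_s):
    """Very rough cost model: fixed per-file cost plus time to stream the data."""
    per_file_cost = n_files * per_file_overhead_ms / 1000
    streaming_cost = total_bytes / throughput_bytes_per_s
    return per_file_cost + streaming_cost


# Same amount of data (1 GB), hypothetical 1 ms overhead per file, 100 MB/s:
one_big_image = estimated_seconds(1, 1_000_000_000, 1, 100_000_000)
many_small_files = estimated_seconds(100_000, 1_000_000_000, 1, 100_000_000)
```

In this model the single big file is dominated by streaming the data, while for 100,000 small files the per-file term is ten times larger than the streaming term. On NFS the per-file overhead would be higher still.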
