Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensuring cache presence before starting borg create. Fixes #7450 #7475

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Michael-Girma
Copy link
Contributor

@Michael-Girma Michael-Girma commented Mar 25, 2023

@ThomasWaldmann These are the changes for #7450. Right now, it recreates the config-cache if not found, and recreates the chunk-cache and the config-cache(to remove integrity data), if the chunk-cache is not found. Think there is a better/more efficient way to do this?

@codecov-commenter
Copy link

codecov-commenter commented Mar 25, 2023

Codecov Report

Merging #7475 (6ab641a) into master (d25465c) will increase coverage by 0.03%.
The diff coverage is 95.45%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #7475      +/-   ##
==========================================
+ Coverage   83.90%   83.93%   +0.03%     
==========================================
  Files          67       67              
  Lines       11740    11762      +22     
  Branches     2139     2146       +7     
==========================================
+ Hits         9850     9872      +22     
+ Misses       1321     1319       -2     
- Partials      569      571       +2     
Impacted Files Coverage Δ
src/borg/cache.py 85.73% <95.45%> (+0.27%) ⬆️

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR - some stuff i noticed.

@@ -900,6 +909,13 @@ def create_master_idx(chunk_idx):
self.do_cache = os.path.isdir(archive_path)
self.chunks = create_master_idx(self.chunks)

def check_cache_integrity(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only tests existence, not integrity.

how about:

return all(os.path.exists(os.path.join(self.path, file)) for file in "chunks", "config")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also: only called from one place.

maybe rather do the joins only once and have the rest in-line?

@@ -514,6 +514,15 @@ def __init__(
self.txn_active = False

self.path = cache_dir(self.repository, path)
if os.path.exists(self.path) and not self.check_cache_integrity():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the first term redundant as the check call would also return False in that case?

@@ -514,6 +514,15 @@ def __init__(
self.txn_active = False

self.path = cache_dir(self.repository, path)
if os.path.exists(self.path) and not self.check_cache_integrity():
logger.warning("Rebuilding cache as some cache-files have gone missing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this should be more precise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also: no "-" in english.

Comment on lines +521 to +522
self.chunks = ChunkIndex()
self.chunks.write(os.path.join(self.path, "chunks"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will create an empty (and thus incorrect) chunks index on disk, how is that helpful?

also creation of that file is done in a rather unsafe way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was assuming that creating an empty chunks cache should be enough for borg create to continue and re-populate it as it normally would.

  1. Should we be entirely rebuilding it here?
  2. How else would be creating the file. Is there another piece of code that I can take a look at?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

borg create needs the chunks cache in a valid state (refcounts, sizes for all chunks in the repo) and updates refcounts on that basis.

@@ -921,6 +937,15 @@ def wipe_cache(self):
self.chunks = ChunkIndex()
with SaveFile(os.path.join(self.path, files_cache_name()), binary=True):
pass # empty file
self.recreate_cache_config()

def recreate_cache_config(self, delete_existing=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call this create... - it only mutates to the more special recreate_... in you give delete_existing=True.

info = json.loads(self.cmd(f"--repo={self.repository_location}", "rinfo", "--json"))
repository = info["repository"]
cache_dir = os.path.join(get_cache_dir(), repository["id"])
config_cache_path = os.path.join(cache_dir, "config")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the cache config, not the config cache.

repository = info["repository"]
cache_dir = os.path.join(get_cache_dir(), repository["id"])
config_cache_path = os.path.join(cache_dir, "config")
chunk_cache_path = os.path.join(cache_dir, "chunks")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunks_cache_path

self.cmd(f"--repo={self.repository_location}", "create", "archive1", "input")
info = json.loads(self.cmd(f"--repo={self.repository_location}", "rinfo", "--json"))
repository = info["repository"]
cache_dir = os.path.join(get_cache_dir(), repository["id"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repo_cache_dir

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants