New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: reduce rereading cached files during init #641
base: main
Are you sure you want to change the base?
feat: reduce rereading cached files during init #641
Conversation
…d add unit tests. (#354)
The checksum of the file. | ||
When the filepath doesn't exist or the filepath isn't a file, None. | ||
""" | ||
if not filepath.exists() or not filepath.is_file(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you chose to return None
instead of raising an error?
return FileChecksumCalculator._calculate_sha256_checksum(filepath) | ||
|
||
@staticmethod | ||
def _calculate_sha256_checksum(filepath: Path) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't a MD5 checksum be enough to check if there are differences? MD5 is faster and we do not need the cryptographic security SHA256 offers :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed. We will change it to MD5.
CachedPathFileModelData provides a simple structure to keep the Filemodel and checksum together. | ||
""" | ||
|
||
_model: "FileModel" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think these attribute declarations are required
""" "Checksum of the file the filemodel is based on.""" | ||
return self._checksum | ||
|
||
def __init__(self, model: "FileModel", checksum: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation
@@ -403,6 +404,29 @@ def pop_last_parent(self) -> None: | |||
self._anchors.pop() | |||
|
|||
|
|||
class CachedPathFileModelData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name confuses me. Maybe just use CachedFileModel
.
@@ -422,7 +446,10 @@ def retrieve_model(self, path: Path) -> Optional["FileModel"]: | |||
The FileModel associated with the Path if it has been registered | |||
before, otherwise None. | |||
""" | |||
return self._cache_dict.get(path, None) | |||
file_model = self._cache_dict.get(path, None) | |||
if file_model is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are redundant. If the given path
is not in the cache, file_model
is None
and will be returned on line 452.
@@ -441,6 +469,38 @@ def is_empty(self) -> bool: | |||
""" | |||
return not any(self._cache_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also work: return not self._cache_dict
. Knowing that, this function may not be needed since the statement is so simple.
@@ -441,6 +469,38 @@ def is_empty(self) -> bool: | |||
""" | |||
return not any(self._cache_dict) | |||
|
|||
def exists(self, path: Path) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be made private.
checksum = self._get_checksum(path) | ||
return checksum != self._cache_dict.get(path).checksum | ||
|
||
def _get_checksum(self, path: Path) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide to make calculate_checksum()
return an optional string, then also make this return value an Optional[str]
hydrolib/core/basemodel.py
Outdated
if context.is_content_changed(filepath): | ||
data = self._load(loading_path) | ||
context.register_model(filepath, self) | ||
data["filepath"] = filepath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the filepath be set regardless of whether the data has been cached or not? Another question: If we find data in the cache, do we still need to do all the other steps below? If a file is referenced multiple times, do we need separate instances of models, or should they refer to the same instance?
If context.retrieve_model(filepath)
returns something (i.e. the filepath is in the cache), data
is a FileModel
. If the model is not cached, data
will be a dictionary.
I think we should discuss this.
Quality Gate passedIssues Measures |
#354
Note for reviewer
Im not fully sure how the @contextmanager functions.
It seems to create new instance for caching for each model which is initialized.
I could not setup a good test to test these steps:
Reading a file --> updating and changing the model it is read to --> saving the file --> reading the same file, changed data is read, since the file has been updated while the application is running.
Since there is not any other way to read the file as a user except for via the ctor.
The _load method could be used, but this is "private" and should not be used by the user.
This method also does not make use of the caching.