Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: reduce rereading cached files during init #641

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

MRVermeulenDeltares
Copy link
Contributor

@MRVermeulenDeltares MRVermeulenDeltares commented May 8, 2024

#354

  • The existing caching should be expanded to also be used with initialization reading, instead of only newing objects.
  • The existing caching should be expanded to verify if a file has changed before returning the cache data.
  • When the file has changed a new cache of the file should be made.

Note for reviewer

Im not fully sure how the @contextmanager functions.
It seems to create new instance for caching for each model which is initialized.

I could not setup a good test to test these steps:
Reading a file --> updating and changing the model it is read to --> saving the file --> reading the same file, changed data is read, since the file has been updated while the application is running.
Since there is not any other way to read the file as a user except for via the ctor.
The _load method could be used, but this is "private" and should not be used by the user.
This method also does not make use of the caching.

@MRVermeulenDeltares MRVermeulenDeltares changed the title reduce rereading cached files during init feat: reduce rereading cached files during init May 8, 2024
@MRVermeulenDeltares MRVermeulenDeltares linked an issue May 8, 2024 that may be closed by this pull request
The checksum of the file.
When the filepath doesn't exist or the filepath isn't a file, None.
"""
if not filepath.exists() or not filepath.is_file():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you chose to return None instead of raising an error?

return FileChecksumCalculator._calculate_sha256_checksum(filepath)

@staticmethod
def _calculate_sha256_checksum(filepath: Path) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't a MD5 checksum be enough to check if there are differences? MD5 is faster and we do not need the cryptographic security SHA256 offers :p

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed. We will change it to MD5.

CachedPathFileModelData provides a simple structure to keep the Filemodel and checksum together.
"""

_model: "FileModel"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think these attribute declarations are required

""" "Checksum of the file the filemodel is based on."""
return self._checksum

def __init__(self, model: "FileModel", checksum: str) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation

@@ -403,6 +404,29 @@ def pop_last_parent(self) -> None:
self._anchors.pop()


class CachedPathFileModelData:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name confuses me. Maybe just use CachedFileModel.

@@ -422,7 +446,10 @@ def retrieve_model(self, path: Path) -> Optional["FileModel"]:
The FileModel associated with the Path if it has been registered
before, otherwise None.
"""
return self._cache_dict.get(path, None)
file_model = self._cache_dict.get(path, None)
if file_model is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are redundant. If the given path is not in the cache, file_model is None and will be returned on line 452.

@@ -441,6 +469,38 @@ def is_empty(self) -> bool:
"""
return not any(self._cache_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also work: return not self._cache_dict. Knowing that, this function may not be needed since the statement is so simple.

@@ -441,6 +469,38 @@ def is_empty(self) -> bool:
"""
return not any(self._cache_dict)

def exists(self, path: Path) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be made private.

checksum = self._get_checksum(path)
return checksum != self._cache_dict.get(path).checksum

def _get_checksum(self, path: Path) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you decide to make calculate_checksum() return an optional string, then also make this return value an Optional[str]

if context.is_content_changed(filepath):
data = self._load(loading_path)
context.register_model(filepath, self)
data["filepath"] = filepath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the filepath be set regardless of whether the data has been cached or not? Another question: If we find data in the cache, do we still need to do all the other steps below? If a file is referenced multiple times, do we need separate instances of models, or should they refer to the same instance?

If context.retrieve_model(filepath) returns something (i.e. the filepath is in the cache), data is a FileModel. If the model is not cached, data will be a dictionary.

I think we should discuss this.

Copy link

sonarcloud bot commented May 23, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ExtModel reads some forcings multiple times
2 participants