feat: reduce rereading cached files during init #641

MRVermeulenDeltares · 2024-05-08T14:10:13Z

The existing caching should be extended so that it is also used when files are read during initialization, not only when new objects are constructed.
The cache should also verify whether a file has changed before returning the cached data.
When the file has changed, a new cache entry should be created for it.
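
The intended behaviour can be illustrated with a minimal sketch. It is not the hydrolib-core implementation: `ChecksumVerifiedCache`, `load`, `parse_count` and `demo` are hypothetical names, and upper-casing the text stands in for the real parsing step. SHA-256 is used only because the first revision of this PR does so.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Tuple


class ChecksumVerifiedCache:
    """Hypothetical cache: maps a path to (parsed data, checksum at parse time)."""

    def __init__(self) -> None:
        self._entries: Dict[Path, Tuple[str, str]] = {}
        self.parse_count = 0  # counts how often a file was actually parsed

    @staticmethod
    def _checksum(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def load(self, path: Path) -> str:
        checksum = self._checksum(path)
        entry = self._entries.get(path)
        if entry is not None and entry[1] == checksum:
            return entry[0]  # file unchanged: reuse the cached data
        data = path.read_text().upper()  # stand-in for expensive parsing
        self.parse_count += 1
        self._entries[path] = (data, checksum)  # replace the stale entry
        return data


def demo() -> Tuple[str, str, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "forcing.txt"
        path.write_text("a")
        cache = ChecksumVerifiedCache()
        first = cache.load(path)
        cache.load(path)  # unchanged file: served from the cache
        path.write_text("b")  # the file changes on disk
        second = cache.load(path)  # checksum differs: parsed again
        return first, second, cache.parse_count
```

Note that the checksum still requires reading the file once per lookup, so the saving comes from skipping the parsing step, not the disk read.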

Note for reviewer

I'm not fully sure how the @contextmanager functions.
It seems to create a new caching instance for each model that is initialized.
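
As a possible explanation, here is a generic sketch of a `@contextmanager` function backed by a `ContextVar`. It is an assumption about the general pattern, not a copy of the code in basemodel.py, and `LoadContext`, `load_context` and `_current` are hypothetical names. In this pattern a nested call reuses the context that is already active, while each new top-level call creates a fresh one, which would match the observation above.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Dict, Iterator, Optional


class LoadContext:
    """Hypothetical stand-in for a context object that owns a cache."""

    def __init__(self) -> None:
        self.cache: Dict[str, object] = {}


_current: ContextVar[Optional[LoadContext]] = ContextVar("_current", default=None)


@contextmanager
def load_context() -> Iterator[LoadContext]:
    context = _current.get()
    token = None
    if context is None:
        # Outermost call: create a new context (and therefore a new cache).
        context = LoadContext()
        token = _current.set(context)
    try:
        yield context
    finally:
        if token is not None:
            # Only the call that created the context clears it again.
            _current.reset(token)


def demo() -> bool:
    with load_context() as outer:
        with load_context() as inner:
            shared = inner is outer  # nested calls share one context
    with load_context() as later:
        fresh = later is not outer  # a new top-level call starts empty
    return shared and fresh and _current.get() is None
```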

I could not set up a good test for the following steps:
Read a file --> update the model it was read into --> save the file --> read the same file again. The changed data should be read, since the file was updated while the application was running.
The only way for a user to read a file is via the constructor.
The _load method could be used, but it is "private" and should not be called by the user.
It also does not make use of the caching.

tim-vd-aardweg · 2024-05-13T07:55:00Z

hydrolib/core/utils.py

+                The checksum of the file.
+                When the filepath doesn't exist or the filepath isn't a file, None.
+        """
+        if not filepath.exists() or not filepath.is_file():


Is there a reason you chose to return None instead of raising an error?

tim-vd-aardweg · 2024-05-13T07:56:38Z

hydrolib/core/utils.py

+        return FileChecksumCalculator._calculate_sha256_checksum(filepath)
+
+    @staticmethod
+    def _calculate_sha256_checksum(filepath: Path) -> str:


Shouldn't an MD5 checksum be enough to check if there are differences? MD5 is faster and we do not need the cryptographic security SHA256 offers :p

Discussed. We will change it to MD5.
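
A minimal sketch of what an MD5-based variant could look like, using only the standard library. The function name `calculate_md5_checksum` and the `chunk_size` parameter are assumptions for illustration, not the names used in utils.py. Reading in chunks keeps memory use bounded for large files.

```python
import hashlib
import tempfile
from pathlib import Path


def calculate_md5_checksum(filepath: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with filepath.open("rb") as stream:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo() -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt"
        path.write_bytes(b"hello")
        # A tiny chunk size forces several update() calls.
        chunked = calculate_md5_checksum(path, chunk_size=2)
        return chunked == hashlib.md5(b"hello").hexdigest()
```

The digest is used here purely for change detection, so the known collision weaknesses of MD5 are not a concern in this setting.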

tim-vd-aardweg · 2024-05-13T08:03:11Z

hydrolib/core/basemodel.py

+    CachedPathFileModelData provides a simple structure to keep the Filemodel and checksum together.
+    """
+
+    _model: "FileModel"


I don't think these attribute declarations are required.

tim-vd-aardweg · 2024-05-13T08:03:48Z

hydrolib/core/basemodel.py

+        """ "Checksum of the file the filemodel is based on."""
+        return self._checksum
+
+    def __init__(self, model: "FileModel", checksum: str) -> None:


Missing documentation

tim-vd-aardweg · 2024-05-13T08:05:59Z

hydrolib/core/basemodel.py

@@ -403,6 +404,29 @@ def pop_last_parent(self) -> None:
            self._anchors.pop()


+class CachedPathFileModelData:


This name confuses me. Maybe just use CachedFileModel.

tim-vd-aardweg · 2024-05-13T08:14:33Z

hydrolib/core/basemodel.py

@@ -422,7 +446,10 @@ def retrieve_model(self, path: Path) -> Optional["FileModel"]:
                The FileModel associated with the Path if it has been registered
                before, otherwise None.
        """
-        return self._cache_dict.get(path, None)
+        file_model = self._cache_dict.get(path, None)
+        if file_model is None:


These two lines are redundant. If the given path is not in the cache, file_model is None and will be returned on line 452.

tim-vd-aardweg · 2024-05-13T08:20:17Z

hydrolib/core/basemodel.py

@@ -441,6 +469,38 @@ def is_empty(self) -> bool:
        """
        return not any(self._cache_dict)


This should also work: return not self._cache_dict. Knowing that, this function may not be needed since the statement is so simple.
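
A small sketch comparing the two expressions; the helper names are hypothetical. `any()` on a dict tests the truthiness of its keys, so the two forms only differ when a key is falsy. `Path` objects are always truthy, so for a cache keyed by `Path` both give the same answer, and `not cache` states the intent more directly.

```python
from pathlib import Path
from typing import Any, Dict


def is_empty_via_any(cache: Dict[Any, Any]) -> bool:
    # Checks whether any *key* is truthy, not whether the dict has entries.
    return not any(cache)


def is_empty_via_truthiness(cache: Dict[Any, Any]) -> bool:
    # An empty dict is falsy, a non-empty dict is truthy.
    return not cache


path_cache = {Path("model.ext"): "cached model"}
falsy_key_cache = {"": "cached model"}  # one entry, but its key is falsy
```

With these inputs both helpers return `False` for `path_cache` and `True` for `{}`, but only `is_empty_via_any` wrongly reports `falsy_key_cache` as empty.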

tim-vd-aardweg · 2024-05-13T08:21:58Z

hydrolib/core/basemodel.py

@@ -441,6 +469,38 @@ def is_empty(self) -> bool:
        """
        return not any(self._cache_dict)

+    def exists(self, path: Path) -> bool:


Can be made private.

tim-vd-aardweg · 2024-05-13T08:24:08Z

hydrolib/core/basemodel.py

+        checksum = self._get_checksum(path)
+        return checksum != self._cache_dict.get(path).checksum
+
+    def _get_checksum(self, path: Path) -> str:


If you decide to make calculate_checksum() return an optional string, then also make this return type Optional[str].

tim-vd-aardweg · 2024-05-13T09:39:44Z

hydrolib/core/basemodel.py

+            if context.is_content_changed(filepath):
+                data = self._load(loading_path)
+                context.register_model(filepath, self)
+                data["filepath"] = filepath


Shouldn't the filepath be set regardless of whether the data has been cached or not? Another question: If we find data in the cache, do we still need to do all the other steps below? If a file is referenced multiple times, do we need separate instances of models, or should they refer to the same instance?

If context.retrieve_model(filepath) returns something (i.e. the filepath is in the cache), data is a FileModel. If the model is not cached, data will be a dictionary.

I think we should discuss this.

sonarcloud · 2024-05-23T14:12:41Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

MRVermeulenDeltares and others added 5 commits May 8, 2024 10:18

feat: update caching to check if file has already cached to reduce loading stress. (#354)

a06570c

feat: update caching module with verifying if the file has changed and add unit tests. (#354)

be0324d

autoformat: isort & black

0d00cac

feat: Add tests TestFileLoadContextReusingCachedFilesDuringInit (#354)

7476c4b

autoformat: isort & black

70e6d99

MRVermeulenDeltares changed the title ~~reduce rereading cached files during init~~ feat: reduce rereading cached files during init May 8, 2024

MRVermeulenDeltares and others added 3 commits May 8, 2024 16:59

feat: update documentation (#354)

9af6db5

autoformat: isort & black

b285b55

feat: Resolve codesmell (#354)

3180329

MRVermeulenDeltares linked an issue May 8, 2024 that may be closed by this pull request

ExtModel reads some forcings multiple times #354

Open

tim-vd-aardweg reviewed May 13, 2024

MRVermeulenDeltares and others added 3 commits May 23, 2024 16:09

feat: Update basemodel to use caching via new instead of caching in init. (#354)

e5ab9e1

feat: Update basemodel to use caching via new instead of caching in init. (#354)

652959d

autoformat: isort & black

a5feedb
