Auto registering of module dependencies #1338

mnjowe · 2024-05-13T10:04:07Z

@tbhallett , @matt-graham , @tamuri . This PR aims at resolving issue #1325.

In a separate issue and PR, I think we could also discuss using the newly added resourcefilepath argument (on the Simulation object) in the read_parameters method of disease modules, rather than re-creating a resourcefilepath in each disease module.

matt-graham

Thanks for getting started on this @mnjowe, this looks like a good first pass! I've made some suggested changes. A lot of these are just minor formatting tweaks and a few cases where I think using different variable names would improve readability.

There are a series of superfluous whitespace changes in test_analysis.py that I have suggested reverting. I'm not sure if your editor is set to use a different default indent level; we generally use 4 spaces. As I think the added test would be better placed in the tests/test_module_dependencies.py test module, it may be better to just move the new test function there and then revert tests/test_analysis.py to its previous state so that it is untouched by this PR.

I think it's also worth us thinking about whether we want to continue to have both data_folder and resourcefilepath as argument names referring to the same thing. Both are currently used in code, but as the data_folder argument to read_parameters has not been used in practice, I think just sticking to resourcefilepath everywhere for consistency might be better, as that will make it clearer to everyone that the argument to Simulation.register has the same role as the current resourcefilepath argument to module initialisers. @tbhallett any thoughts on this?

matt-graham · 2024-05-13T10:45:11Z

src/tlo/dependencies.py

+    get_dependencies: DependencyGetter = get_init_dependencies, data_folder: Path = None, auto_register_modules: bool =
+        False


Suggested change

-    get_dependencies: DependencyGetter = get_init_dependencies, data_folder: Path = None, auto_register_modules: bool =
-        False
+    get_dependencies: DependencyGetter = get_init_dependencies,
+    data_folder: Optional[Path] = None,
+    auto_register_modules: bool = False,

For consistency it is better to put a line break after each argument and to avoid introducing a break between the auto_register_modules argument name and its default value, as that makes it difficult to quickly see that the two are linked.

Also if an argument is allowed to accept a value of None the type hint should allow for this by using Optional (or equivalently specifying Union[..., None] where ... is the original type hint).
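As a small illustration of the type-hint point above, the sketch below uses a hypothetical function, describe_data_folder, which is not part of tlo and exists only to show the layout: one argument per line, each default next to its name, and Optional[Path] for an argument that may be None.

```python
from pathlib import Path
from typing import Optional


def describe_data_folder(
    data_folder: Optional[Path] = None,
    auto_register_modules: bool = False,
) -> str:
    """Hypothetical helper, used only to illustrate the signature layout."""
    if data_folder is None:
        # None is a legal value here, which is why the hint is Optional[Path].
        return "no data folder given"
    return f"data folder: {data_folder.as_posix()}"
```

Optional[Path] is equivalent to Union[Path, None], so either spelling would tell a type checker that None is an accepted value.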

matt-graham · 2024-05-13T10:49:16Z

src/tlo/dependencies.py

+    :param data_folder: resource files folder
+    :param auto_register_modules: whether to register missing modules or not


Suggested change

-    :param data_folder: resource files folder
-    :param auto_register_modules: whether to register missing modules or not
+    :param data_folder: Resource files folder.
+    :param auto_register_modules: Whether to register missing modules or not. Any missing
+        modules will be registered with default values for their initialiser arguments.

Argument descriptions in the docstring should ideally be full sentences. I also think it's worth adding a note about the values used for module arguments to make this explicit.

matt-graham · 2024-05-13T11:17:32Z

src/tlo/dependencies.py

+                        module_class = get_module_class_map(set())[dependency](resourcefilepath=data_folder)
+                        module_instance_map.update({dependency: module_class})


Suggested change

-                        module_class = get_module_class_map(set())[dependency](resourcefilepath=data_folder)
-                        module_instance_map.update({dependency: module_class})
+                        module_instance = get_module_class_map(set())[dependency](resourcefilepath=data_folder)
+                        module_instance_map[dependency] = module_instance

While get_module_class_map(set())[dependency] evaluates to a module class, the value returned by calling its initialiser method is a module instance so I think module_instance would be a more descriptive name here and more consistent with naming of module_instance_map.

Also to add a new entry to a dictionary, the more idiomatic way is just to use an indexed assignment statement rather than the update method (which is typically used to add multiple values from another dictionary at once).

To avoid repeatedly calling get_module_class_map here, it would also be better to call this once outside of the for dependency in sorted(dependencies): loop and assign to a variable module_class_map and then reuse this in the snippet above as it doesn't depend on the value of dependency.

jkumwenda · 2024-05-24

On the third point, get_module_class_map is called once in the for dependency in sorted(dependencies): loop, and that is done within:

    if dependency not in module_instance_map:
        if auto_register_modules:

This makes it not redundant. It would have been redundant if it had been placed just above the if statement. What do you think, @matt-graham @mnjowe?

mnjowe · 2024-05-28

Hi @jkumwenda. Even inside that if statement it is still redundant, because the class map will be rebuilt for every dependency in the for loop that is not in the module instance map. @matt-graham is right: we need to build it once outside the for loop and reuse it within that if statement block.

jkumwenda · 2024-05-28

OK, thank you for the clarification. We will resolve that and do a quick debug before committing.
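The three points in this thread (naming the instance, indexed assignment, and building the class map once) can be sketched as below. This is a minimal stand-alone illustration, not the tlo code: the module classes, the zero-argument get_module_class_map and the register_missing function are all hypothetical stand-ins.

```python
class _StandInModule:
    """Hypothetical base class standing in for a real module class."""

    def __init__(self, resourcefilepath=None):
        self.resourcefilepath = resourcefilepath


class Demography(_StandInModule):
    pass


class Lifestyle(_StandInModule):
    pass


def get_module_class_map():
    """Hypothetical stand-in returning a module name -> module class mapping."""
    return {"Demography": Demography, "Lifestyle": Lifestyle}


def register_missing(module_instance_map, dependencies, data_folder=None):
    """Add an instance for every dependency not already in the map."""
    # Build the class map once, outside the loop: it does not depend on
    # the value of `dependency`.
    module_class_map = get_module_class_map()
    for dependency in sorted(dependencies):
        if dependency not in module_instance_map:
            # Calling the class returns an instance, hence the name.
            module_instance = module_class_map[dependency](
                resourcefilepath=data_folder
            )
            # Indexed assignment is the idiomatic way to add a single entry.
            module_instance_map[dependency] = module_instance
    return module_instance_map
```

With this layout the map is built once per call, however many dependencies turn out to be missing.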

matt-graham · 2024-05-13T11:26:48Z

tests/test_analysis.py

@@ -368,7 +369,7 @@ def test_get_parameter_functions(seed):

                # Check that the parameter identified exists in the simulation
                assert (
-                    name in sim.modules[module].parameters
+                        name in sim.modules[module].parameters


Suggested change

-                        name in sim.modules[module].parameters
+                    name in sim.modules[module].parameters

Don't think this whitespace change is needed (4 space visual indent is what we would usually use) so suggesting reverting.

matt-graham · 2024-05-13T11:27:18Z

tests/test_analysis.py

@@ -88,7 +89,7 @@ def initialise_simulation(self, sim: Simulation):

    # At INFO level
    assert (
-        len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2
+            len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2


Suggested change

-            len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2
+        len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2

Don't think this whitespace change is needed (4 space visual indent is what we would usually use) so suggesting reverting.

matt-graham · 2024-05-13T12:54:13Z

tests/test_analysis.py

+                fullmodel(resourcefilepath=resourcefilepath)
+                + [
+                    ImprovedHealthSystemAndCareSeekingScenarioSwitcher(
+                        resourcefilepath=resourcefilepath
+                    ),
+                    DummyModule(),
+                ]


Suggested change

fullmodel(resourcefilepath=resourcefilepath)

+ [

ImprovedHealthSystemAndCareSeekingScenarioSwitcher(

resourcefilepath=resourcefilepath

),

DummyModule(),

]

fullmodel(resourcefilepath=resourcefilepath)

+ [

ImprovedHealthSystemAndCareSeekingScenarioSwitcher(

resourcefilepath=resourcefilepath

),

DummyModule(),

]

Don't think this whitespace change is needed so suggesting reverting.

matt-graham · 2024-05-13T12:54:31Z

tests/test_analysis.py

+            "ImprovedHealthSystemAndCareSeekingScenarioSwitcher"
+            == list(sim.modules.keys())[0]


Suggested change

"ImprovedHealthSystemAndCareSeekingScenarioSwitcher"

== list(sim.modules.keys())[0]

"ImprovedHealthSystemAndCareSeekingScenarioSwitcher"

== list(sim.modules.keys())[0]

Don't think this whitespace change is needed so suggesting reverting.

matt-graham · 2024-05-13T12:54:54Z

tests/test_analysis.py

@@ -586,7 +586,7 @@ def test_summarize():
            names=("draw", "run"),
        ),
        index=["TimePoint0", "TimePoint1"],
-        data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000],]),
+        data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000], ]),


Suggested change

-        data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000], ]),
+        data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000],]),

Don't think this whitespace change is needed so suggesting reverting.

matt-graham · 2024-05-13T12:55:10Z

tests/test_analysis.py

@@ -637,7 +637,31 @@ def test_summarize():
        pd.DataFrame(
            columns=pd.Index(["lower", "mean", "upper"], name="stat"),
            index=["TimePoint0", "TimePoint1"],
-            data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5],]),
+            data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5], ]),


Suggested change

-            data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5], ]),
+            data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5],]),

Don't think this whitespace change is needed so suggesting reverting.

mnjowe · 2024-05-14

I think I have to look into my environment configuration. It's adding these spaces automatically. At first I thought they were changes from merging master into my branch, but no. I will look into this. I will also revert the extra whitespace changes made to this file. Thanks.

matt-graham · 2024-05-13T13:56:52Z

tests/test_analysis.py

+def test_auto_register_modules(tmpdir):
+    """ check module dependencies can be registered automatically """
+    start_date = Date(2010, 1, 1)
+    # configure logging
+    log_config = {
+        "filename": "LogFile",
+        "directory": tmpdir,
+    }
+    sim = Simulation(start_date=start_date, seed=0, log_config=log_config, data_folder=resourcefilepath)
+    try:
+        # try executing the code in this block and go to except block if module dependency error exception is fired
+
+        # register modules without their associated dependencies
+        sim.register(demography.Demography(resourcefilepath=resourcefilepath),
+                     copd.Copd(resourcefilepath=resourcefilepath),
+                     auto_register_modules=True)
+
+    except ModuleDependencyError as exception:
+        # if auto register modules argument is false, there should be a module dependency error exception fired
+        assert exception
+        assert exception.__class__ == ModuleDependencyError
+


Adding a test is a good idea, but I'm not sure this is the most appropriate test module for it. I would say tests/test_module_dependencies.py would be the more obvious place. Was the reason for putting this in test_analysis.py that the imagined use case is mainly performing analyses with the model, where the user won't necessarily know or want to worry about the module dependencies?

I think the test here could also do with some changes. Tests should generally check that a piece of code adheres to some expected behaviour. At the moment the only assert statements in the test are within the except block, which should not be hit if the auto_register_modules=True argument is working as intended. I would say it would be better here to check, for example, that after registering the specified modules all required dependencies of the copd.Copd module are present. For example, something like

    required_dependencies = get_all_required_dependencies(sim.modules["Copd"])
    registered_module_names = set(sim.modules.keys())
    assert required_dependencies <= registered_module_names

It would also be good to check that the Simulation.register function raises an exception if auto_register_modules=False and we don't pass in all required dependencies, which it looks like might have been what you were trying to do here? If so, rather than wrapping the call in a try...except block, it would be better to use the pytest.raises context manager to check that a ModuleDependencyError is raised.

I think a separate unit test of the updated topologically_sort_modules function in tlo.dependencies would also be useful which tests the behaviour is as expected when the new data_folder and auto_register_modules arguments are explicitly specified.

mnjowe · 2024-05-13

Thanks @matt-graham. Part of your comment on the test is what was also going on in my mind (I wasn't fully satisfied with it). Initially my plan was to use with pytest.raises(ModuleDependencyError). The only challenge is that this fails when the code does not raise any exception (with the error DID NOT RAISE), and I wasn't able to find a way to capture that (that is, how to assert that no exception was raised).

On where to place this test: I think the test dependencies module will also be fine. I just noted that there is already a test in there (test_missing_dependency_raises_error_on_register, using dummy modules of course) that checks module dependencies, and I felt I should not add another test that to some extent does the same. Do you want me to extend/update this test with real disease modules?

matt-graham · 2024-05-13

Ah okay - for testing that a code snippet doesn't raise an exception, the usual pattern is to just have a test function which calls the relevant functions / methods, since an unhandled exception will automatically be interpreted as a test failure. Your current test with the sim.register call moved outside of the try block (and no corresponding except block) would work for this. While testing that no error is raised is useful, I would also say it is worth testing, when no error happens, whether the registered dependencies are as expected, using something similar to what I suggested above. The test should then fail either if we unexpectedly get a module dependency error or if the module dependencies are not as expected (even though we didn't get an error).

mnjowe · 2024-05-14

So helpful. Thanks @matt-graham
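The two test patterns discussed in this thread can be sketched side by side. This is a self-contained illustration under stated assumptions: the REQUIRED table, the register function and the local ModuleDependencyError class are hypothetical stand-ins, not the tlo API. Only pytest.raises is a real library call.

```python
import pytest


class ModuleDependencyError(Exception):
    """Stand-in for the real exception class."""


# Hypothetical dependency table: module name -> names it requires.
REQUIRED = {
    "Demography": set(),
    "Lifestyle": {"Demography"},
    "Copd": {"Demography", "Lifestyle"},
}


def register(names, auto_register_modules=False):
    """Toy registration: return the set of registered module names."""
    registered = set(names)
    while True:
        missing = {dep for name in registered for dep in REQUIRED[name]} - registered
        if not missing:
            return registered
        if not auto_register_modules:
            raise ModuleDependencyError(f"missing dependencies: {sorted(missing)}")
        # Add the missing modules, then re-check for their own dependencies.
        registered |= missing


def test_missing_dependency_raises():
    # The error case: pytest.raises fails the test if nothing is raised.
    with pytest.raises(ModuleDependencyError):
        register(["Demography", "Copd"], auto_register_modules=False)


def test_auto_register_adds_dependencies():
    # The success case: no try/except, so an unexpected exception
    # fails the test on its own.
    registered = register(["Demography", "Copd"], auto_register_modules=True)
    # The outcome is also checked: every required dependency is present.
    assert REQUIRED["Copd"] <= registered
```

Splitting the two cases into separate test functions means each one fails for exactly one reason, which avoids the DID NOT RAISE problem described above.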

tbhallett · 2024-05-13T14:50:46Z

> I think it's also worth us thinking about whether we want to continue to have both data_folder and resourcefilepath as argument names referring to the same thing. Both are currently used in code, but as the data_folder argument to read_parameters has not been used in practice, I think just sticking to resourcefilepath everywhere for consistency might be better, as that will make it clearer to everyone that the argument to Simulation.register has the same role as the current resourcefilepath argument to module initialisers. @tbhallett any thoughts on this?

Is this just about the name we'll use? On that I don't have a strong feeling... but I suspect 'resourcefilepath' would have instant recognition for everyone, if we moved to it.
Overall, what I was supporting was the idea of actually using the concept set up in the Simulation for data_folder: i.e. the user passes in the path to the resource files once (to the Simulation) rather than to each module individually.

matt-graham · 2024-05-13T15:01:17Z

> Is this just about the name we'll use? On that I don't have a strong feeling... but I suspect 'resourcefilepath' would have instant recognition for everyone, if we moved to it. Overall, what I was supporting was the idea of actually using the concept set up in the Simulation for data_folder: i.e. the user passes in the path to the resource files once (to the Simulation) rather than to each module individually.

Yep, was just checking there wasn't any reason I was missing for having both data_folder and resourcefilepath as argument names referring to the same thing. Agree that user passing in once rather than individually to each module makes a lot more sense, and I think the recognisability / momentum behind resourcefilepath is a strong reason for using this everywhere.
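The idea of passing the resource path once to the simulation, rather than to each module, could look roughly like the sketch below. Everything here is a hypothetical stand-in written for illustration (including the read_all_parameters method name); it does not reflect the actual tlo Simulation or Module classes.

```python
from pathlib import Path
from typing import Dict, Optional


class Module:
    """Stand-in module that receives the resource path when reading parameters."""

    def __init__(self) -> None:
        self.resourcefilepath: Optional[Path] = None

    def read_parameters(self, resourcefilepath: Optional[Path] = None) -> None:
        # A real module would load its resource files from this folder.
        self.resourcefilepath = resourcefilepath


class Demography(Module):
    pass


class Copd(Module):
    pass


class Simulation:
    """Stand-in simulation that owns the single resource path."""

    def __init__(self, resourcefilepath: Optional[Path] = None) -> None:
        self.resourcefilepath = resourcefilepath
        self.modules: Dict[str, Module] = {}

    def register(self, *modules: Module) -> None:
        for module in modules:
            self.modules[type(module).__name__] = module

    def read_all_parameters(self) -> None:
        # The path is supplied once here, not in each module initialiser.
        for module in self.modules.values():
            module.read_parameters(self.resourcefilepath)
```

In this arrangement the user names the folder in one place, and using the single name resourcefilepath throughout keeps the role of the argument obvious.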

mnjowe · 2024-05-28T07:57:38Z

Hi @tbhallett and @matt-graham. Below are the next steps I'm taking, based on the discussion on this PR and a conversation we had on Slack:

1. Revert all the changes that added unnecessary whitespace to the files.
2. Address all comments made by @matt-graham so far on this PR.
3. Open a new issue (get resourcefilepath from simulation), which will also include the links @matt-graham sent on Slack regarding other places this change might affect outside the usual self.read_parameters() method. This issue will be assigned to @jkumwenda and @thewati to work on as a pair.

Thanks

mnjowe · 2024-05-29T09:26:58Z

@matt-graham. Just realised that some commits from other branches have been added. I guess it's a result of caches in my local master branch when I was trying to revert the changes to tests/test_analysis.py. I'm thinking of opening a new PR to address the linked issue. This PR seems to have been messed up, as there are now many files changed.

mnjowe · 2024-05-29T13:57:29Z

@matt-graham and @tbhallett see PR #1379 for a continuation on this. Thanks

auto registering of module dependencies

8c54ece

mnjowe requested review from tbhallett and matt-graham May 13, 2024 10:04

mnjowe assigned mnjowe and thewati May 13, 2024

mnjowe added this to In progress in PR priorities via automation May 13, 2024

mnjowe linked an issue May 13, 2024 that may be closed by this pull request

Module Dependency Errors #1325

Open

mnjowe added framework enhancement New feature or request labels May 13, 2024

mnjowe added 2 commits May 13, 2024 12:23

fix isort error - sort imports

4206e9d

Merge branch 'master' into mnjowe/auto-module-registration

8171394

matt-graham requested changes May 13, 2024

go back to passing empty argument in module's read parameters method.…

3001bf3

… the argument is not being implemented in this PR by the way.

tdm32 and others added 5 commits May 14, 2024 13:47

Globally disable Pylint E0606 possibly-used-before-assignment rule + …

f7cf84c

…fix one case (#1349) * Globally disable Pylint E06060 possibly-used-before-assignment rule * Give more informative error message on invalid arguments

The Equipment Class (#1098)

8b0698d

Update consumables in cancer modules (#1353)

4f561a5

update url (#1354)

a95e916

thewati assigned jkumwenda May 17, 2024

giordano and others added 7 commits May 17, 2024 17:48

[CI] Change label of runners for profiling workflows (#1358)

67c8058

HSI_Event uses separate private member function (`_run_after_hsi_ev…

49f42d0

…ent`) instead of `post_apply_hook` (#1361)

Generate and add equipment availability data (#1329)

54c3eb1

adapt use of do_scaling to behave as user expects. (#1345)

628f54a

Since HRH capabilities are calibrated to 2018, make default scaling_b…

de94ffa

…y_pop_growth scenario only expand capabilities to match pop growth from 2019 onwards (#1365)

Changed module instance, docstring

2299bf6

custom_log not behaving as expected (#1157)

6494277

Co-authored-by: mnjowe <emmanuelmnjowe@gmail.com> Co-authored-by: Asif Tamuri <tamuri@gmail.com>

Initailised get_module_class_map m to avoid repeatedly calling the fu…

a0a50d7

…nction.

Merge remote-tracking branch 'origin/mnjowe/auto-module-registration'…

809f3c2

… into mnjowe/auto-module-registration # Conflicts: # tests/test_analysis.py

mnjowe closed this May 29, 2024
