Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto registering of module dependencies #1338

Closed
wants to merge 18 commits into from

Conversation

mnjowe
Copy link
Collaborator

@mnjowe mnjowe commented May 13, 2024

@tbhallett , @matt-graham , @tamuri . This PR aims at resolving issue #1325.

In a separate issue and PR I think, we can also discuss implementing the newly added resourcefilepath argument(in simulation object) to the read parameters section of disease modules rather than re-creating a resourcefilepath in each disease module?

@mnjowe mnjowe added this to In progress in PR priorities via automation May 13, 2024
@mnjowe mnjowe linked an issue May 13, 2024 that may be closed by this pull request
@mnjowe mnjowe added framework enhancement New feature or request labels May 13, 2024
Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting started on this @mnjowe, this looks like a good first pass! I've made some suggested changes. A lot of these are just minor formatting tweaks and a few cases where I think using different variable names would improve readability.

There are a series of superflous whitespace changes in test_analysis.py that I have suggested reverting - not sure if your editor setting is using a different default indent level - we are generally using 4 spaces? As I think the added test would be better situated in the tests/test_module_dependencies.py test module, it maybe better to just move across the new test function there and then revert the tests/test_analysis.py file to its previous state so it untouched by this PR.

I think it's also worth us thinking about whether we want to continue to have both data_folder and resourcefilepath as argument names referring to the same thing. Both are currently used in code, but as data_folder argument to read_parameters has not be used in practice, I think possibly just sticking to resourcefilepath everywhere for consistency might be better as that will make it clearer to everyone that the argument to Simulation.register has the same role as the current resourcefilepath argument to module initialisers. @tbhallett any thoughts on this?

Comment on lines 80 to 81
get_dependencies: DependencyGetter = get_init_dependencies, data_folder: Path = None, auto_register_modules: bool =
False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_dependencies: DependencyGetter = get_init_dependencies, data_folder: Path = None, auto_register_modules: bool =
False
get_dependencies: DependencyGetter = get_init_dependencies,
data_folder: Optional[Path] = None,
auto_register_modules: bool = False,

For consistency better to put line breaks after each argument and avoid introducing a break between the auto_register_modules argument name and it's default value as it makes it a bit difficult to quickly see these are linked.

Also if an argument is allowed to accept a value of None the type hint should allow for this by using Optional (or equivalently specifying Union[..., None] where ... is the original type hint).

Comment on lines 96 to 97
:param data_folder: resource files folder
:param auto_register_modules: whether to register missing modules or not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param data_folder: resource files folder
:param auto_register_modules: whether to register missing modules or not
:param data_folder: Resource files folder.
:param auto_register_modules: Whether to register missing modules or not. Any missing
modules will be registered with default values for their initialiser arguments.

Argument descriptions in the docstring should ideally be full sentences. Also think it's worth adding a note about values used for module arguments to make this explicit.

Comment on lines 131 to 132
module_class = get_module_class_map(set())[dependency](resourcefilepath=data_folder)
module_instance_map.update({dependency: module_class})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
module_class = get_module_class_map(set())[dependency](resourcefilepath=data_folder)
module_instance_map.update({dependency: module_class})
module_instance = get_module_class_map(set())[dependency](resourcefilepath=data_folder)
module_instance_map[dependency] = module_instance

While get_module_class_map(set())[dependency] evaluates to a module class, the value returned by calling its initialiser method is a module instance so I think module_instance would be a more descriptive name here and more consistent with naming of module_instance_map.

Also to add a new entry to a dictionary, the more idiomatic way is just to use an indexed assignment statement rather than the update method (which is typically used to add multiple values from another dictionary at once).

To avoid repeatedly calling get_module_class_map here, it would also be better to call this once outside of the for dependency in sorted(dependencies): loop and assign to a variable module_class_map and then reuse this in the snippet above as it doesn't depend on the value of dependency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the third point, the module get_module_class_map is called once in the for dependency in sorted(dependencies): and that is done within :

                if dependency not in module_instance_map:
                    if auto_register_modules:

This makes its not redundant. It would have been redundant if it was placed just above if statement. What do you think @matt-graham @mnjowe

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jkumwenda. Even inside that if statement it is still redundant because it will keep getting initialised at every dependecy in the for loop that's not in the module instance map. @matt-graham is right, we need to initialise that outside the for loop and re-use it within that it statement block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thank you for clarification, We will resolve that and do a quick debug before committing.

@@ -368,7 +369,7 @@ def test_get_parameter_functions(seed):

# Check that the parameter identified exists in the simulation
assert (
name in sim.modules[module].parameters
name in sim.modules[module].parameters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name in sim.modules[module].parameters
name in sim.modules[module].parameters

Don't think this whitespace change is needed (4 space visual indent is what we would usually use) so suggesting reverting.

@@ -88,7 +89,7 @@ def initialise_simulation(self, sim: Simulation):

# At INFO level
assert (
len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2
len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2
len(output["tlo.methods.dummy"]["_metadata"]["tlo.methods.dummy"]) == 2

Don't think this whitespace change is needed (4 space visual indent is what we would usually use) so suggesting reverting.

Comment on lines +545 to +551
fullmodel(resourcefilepath=resourcefilepath)
+ [
ImprovedHealthSystemAndCareSeekingScenarioSwitcher(
resourcefilepath=resourcefilepath
),
DummyModule(),
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fullmodel(resourcefilepath=resourcefilepath)
+ [
ImprovedHealthSystemAndCareSeekingScenarioSwitcher(
resourcefilepath=resourcefilepath
),
DummyModule(),
]
fullmodel(resourcefilepath=resourcefilepath)
+ [
ImprovedHealthSystemAndCareSeekingScenarioSwitcher(
resourcefilepath=resourcefilepath
),
DummyModule(),
]

Don't think this whitespace change is needed so suggesting reverting.

Comment on lines +557 to +558
"ImprovedHealthSystemAndCareSeekingScenarioSwitcher"
== list(sim.modules.keys())[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ImprovedHealthSystemAndCareSeekingScenarioSwitcher"
== list(sim.modules.keys())[0]
"ImprovedHealthSystemAndCareSeekingScenarioSwitcher"
== list(sim.modules.keys())[0]

Don't think this whitespace change is needed so suggesting reverting.

@@ -586,7 +586,7 @@ def test_summarize():
names=("draw", "run"),
),
index=["TimePoint0", "TimePoint1"],
data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000],]),
data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000], ]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000], ]),
data=np.array([[0, 20, 1000, 2000], [0, 20, 1000, 2000],]),

Don't think this whitespace change is needed so suggesting reverting.

@@ -637,7 +637,31 @@ def test_summarize():
pd.DataFrame(
columns=pd.Index(["lower", "mean", "upper"], name="stat"),
index=["TimePoint0", "TimePoint1"],
data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5],]),
data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5], ]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5], ]),
data=np.array([[0.5, 10.0, 19.5], [0.5, 10.0, 19.5],]),

Don't think this whitespace change is needed so suggesting reverting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have to look into my environment configurations. Its adding these spaces automatically. at first I thought they're changes from merging master in my branch BUT no. I will look into this. I will also revert the extra space changes made to this file. Thanks

Comment on lines 646 to 667
def test_auto_register_modules(tmpdir):
""" check module dependencies can be registered automatically """
start_date = Date(2010, 1, 1)
# configure logging
log_config = {
"filename": "LogFile",
"directory": tmpdir,
}
sim = Simulation(start_date=start_date, seed=0, log_config=log_config, data_folder=resourcefilepath)
try:
# try executing the code in this block and go to except block if module dependency error exception is fired

# register modules without their associated dependencies
sim.register(demography.Demography(resourcefilepath=resourcefilepath),
copd.Copd(resourcefilepath=resourcefilepath),
auto_register_modules=True)

except ModuleDependencyError as exception:
# if auto register modules argument is false, there should be a module dependency error exception fired
assert exception
assert exception.__class__ == ModuleDependencyError

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a test is a good idea but I'm not sure if this is the most appropriate test module for it to go in. I would say tests/test_module_dependencies.py would probably be the more obvious place? Was the reason for putting this in test_analysis.py as the imagined use case is that this is mainly for performing analyses with the model where the user won't necessarily know or want to worry about the module dependencies?

I think also the test here could do with a bit of changing. Tests should generally check a piece of code adheres to some expected behaviour. At the moment the only assert statements in the test are within the except block which I think should not be hit if the auto_register_modules=True argument is working as intended. I would say it would be better here to for example check if after registering the specified modules that all required dependencies of the copd.Copd module are present. For example something like

    required_dependencies = get_all_required_dependencies(sim.modules["Copd"])
    registered_module_names = set(sim.modules.keys())
    assert required_dependencies <= registed_module_names

It would also be good to check the Simulation.register function raises an exception if auto_register_modules=False and we don't pass in all required dependencies, which it looks like might have been what you were trying to do here? If so rather than wrap in try...except block, it would be better to use the pytest.raises context manager to check a ModuleDependencyError is raised.

I think a separate unit test of the updated topologically_sort_modules function in tlo.dependencies would also be useful which tests the behaviour is as expected when the new data_folder and auto_register_modules arguments are explicitly specified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matt-graham. Part of your comment on the test is what also was also going on in my mind(I wasn't fully satisfied with it). Initially my plan was to use with pytest.raises(ModuleDependencyError). The only challenge is that this fails when the code is not raising any exception(with an error DID NOT RAISE) and I wasn't able to find a way how I can capture that(as in how to assert that no exception was raised)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on where to place this test. I think test dependencies will also be fine. I just noted that there is already a test in there(test_missing_dependency_raises_error_on_register, using dummy modules of course) that checks module dependency and I felt like I should not add another test that to some extent will do the same. Do you want me to extend/update this test with real disease modules?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay - for testing that a code snippet doesn't raise an exception the usual pattern is to just have a test function which calls the relevant function / methods, as if an unhandled exception is encountered this will automatically be interpreted as a test failure. Your current test with the sim.register call moved outside of the try block (and no corresponding except block) would work for this. While testing no error is raised is a useful test, I would also say it would be worth testing if no error happens whether the registered dependencies are as expected using something similar to what I suggested above. The test should then fail either if we get a module dependency error unexpectedly or if the module dependencies are not as expected (but we didn't get an error).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So helpful. Thanks @matt-graham

Ah okay - for testing that a code snippet doesn't raise an exception the usual pattern is to just have a test function which calls the relevant function / methods, as if an unhandled exception is encountered this will automatically be interpreted as a test failure. Your current test with the sim.register call moved outside of the try block (and no corresponding except block) would work for this. While testing no error is raised is a useful test, I would also say it would be worth testing if no error happens whether the registered dependencies are as expected using something similar to what I suggested above. The test should then fail either if we get a module dependency error unexpectedly or if the module dependencies are not as expected (but we didn't get an error).

… the argument is not being implemented in this PR by the way.
@tbhallett
Copy link
Collaborator

I think it's also worth us thinking about whether we want to continue to have both data_folder and resourcefilepath as argument names referring to the same thing. Both are currently used in code, but as data_folder argument to read_parameters has not be used in practice, I think possibly just sticking to resourcefilepath everywhere for consistency might be better as that will make it clearer to everyone that the argument to Simulation.register has the same role as the current resourcefilepath argument to module initialisers. @tbhallett any thoughts on this?

Is this just about the name we'll use? On that I don't have a strong feeling... but I suspect 'resoucefilepath' would have instant recognition for everyone, if we moved it.
Overall, what I was supporting was that I thought it was a good idea to actually use the concept set up in the Simulation for data_folder: i.e. the user passes in the path to resource once (to the Simulation) rather than to each module individually.

@matt-graham
Copy link
Collaborator

Is this just about the name we'll use? On that I don't have a strong feeling... but I suspect 'resoucefilepath' would have instant recognition for everyone, if we moved it. Overall, what I was supporting was that I thought it was a good idea to actually use the concept set up in the Simulation for data_folder: i.e. the user passes in the path to resource once (to the Simulation) rather than to each module individually.

Yep, was just checking there wasn't any reason I was missing for having both data_folder and resourcefilepath as argument names referring to the same thing. Agree that user passing in once rather than individually to each module makes a lot more sense, and I think the recognisability / momentum behind resourcefilepath is a strong reason for using this everywhere.

tdm32 and others added 5 commits May 14, 2024 13:47
* fix failing test

* fix unused import statement

* edit optional dependency in demography.py

* roll back simulation.py

* put kwarg in demography.py

* update test

* roll back incidental change

* factorize calc

* add is_alive

* roll back incidental changes

* make static for clarity

* roll back incidental changes

---------

Co-authored-by: Tim Hallett <39991060+tbhallett@users.noreply.github.com>
…fix one case (#1349)

* Globally disable Pylint E06060 possibly-used-before-assignment rule

* Give more informative error message on invalid arguments
@mnjowe
Copy link
Collaborator Author

mnjowe commented May 28, 2024

Hi @tbhallett and @matt-graham. Below are the next steps I'm taking from all the discussions made on this PR and a conversation we had on slack

  1. Revert all the changes made so as to delete the unnecessary spaces added to the files.
  2. Address all comments made by @matt-graham so far on this PR.
  3. open a new issue(get resourcefilepath from simulation) which will also have the links @matt-graham sent on slack regarding any other places this change might affect outside the usual self.read_parameters() method. This issue will be assigned to a pair programming of @jkumwenda and @thewati

Thanks

… into mnjowe/auto-module-registration

# Conflicts:
#	tests/test_analysis.py
@mnjowe
Copy link
Collaborator Author

mnjowe commented May 29, 2024

@matt-graham. Just realised there have been some commits from other branches added. I guess its a of s a result of caches in my local master branch when I was trying to revert changes to test analyses. I'm thinking of opening a new PR for addressing the linked issue. This PR seems to have been messed up as there are now many files changed.

@mnjowe
Copy link
Collaborator Author

mnjowe commented May 29, 2024

@matt-graham and @tbhallett see PR #1379 for a continuation on this. Thanks

@mnjowe mnjowe closed this May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request framework
Projects
PR priorities
In progress
Development

Successfully merging this pull request may close these issues.

Module Dependency Errors
10 participants