
Add support for saving and loading simulation state to / from files #1227

Open
wants to merge 21 commits into master

Conversation

@matt-graham (Collaborator) commented Dec 11, 2023

Potentially resolves #86, though currently this doesn't deal with exposing this functionality via scenarios.

Adds new methods save_to_pickle and load_from_pickle that respectively save the current simulation state to a pickle file and load the simulation state from a pickle file (the latter being a class method so that a simulation can be loaded directly as Simulation.load_from_pickle). Pickling is handled by dill, as this supports a much wider range of Python objects than the built-in pickle implementation. dill is added to the package dependencies here, but the import is currently wrapped in logic to avoid ImportErrors in environments which do not have dill available (with save_to_pickle and load_from_pickle raising informative exceptions in this case).
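
For context, a minimal sketch of the guarded-import pattern and the two methods described above (the bodies are simplified relative to the actual diff, and names other than Simulation, save_to_pickle and load_from_pickle are illustrative):

```python
from pathlib import Path

try:
    import dill
    DILL_AVAILABLE = True
except ImportError:
    DILL_AVAILABLE = False


class Simulation:
    ...

    def save_to_pickle(self, pickle_path: Path) -> None:
        """Serialise the current simulation state to a pickle file using dill."""
        if not DILL_AVAILABLE:
            raise RuntimeError("Cannot save to pickle: dill is not installed")
        with open(pickle_path, "wb") as pickle_file:
            dill.dump(self, pickle_file)

    @classmethod
    def load_from_pickle(cls, pickle_path: Path) -> "Simulation":
        """Load a previously saved simulation state from a pickle file."""
        if not DILL_AVAILABLE:
            raise RuntimeError("Cannot load from pickle: dill is not installed")
        with open(pickle_path, "rb") as pickle_file:
            return dill.load(pickle_file)
```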

The contents of the current Simulation.simulate method have also been factored out into three separate methods, Simulation.initialise, Simulation.run_simulation_to and Simulation.finalise, that allow initialising, running and finalising the simulation separately, with Simulation.simulate retaining the same behaviour by just calling the three in sequence. This allows simulations to be partially run to an intermediate date before the simulation end date, saved to file, and then reloaded and continued.
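
As a rough illustration of the intended usage (not taken verbatim from the diff; the constructor arguments, keyword names and dates here are assumptions), partially running, saving and resuming a simulation would look something like:

```python
from pathlib import Path

from tlo import Date, Simulation

sim = Simulation(start_date=Date(2010, 1, 1))
# ... register modules and create the initial population as usual (omitted) ...

# Equivalent to sim.simulate(end_date=Date(2012, 1, 1)), but split into stages
sim.initialise(end_date=Date(2012, 1, 1))
sim.run_simulation_to(to_date=Date(2011, 1, 1))  # run part-way only
sim.save_to_pickle(pickle_path=Path("suspended_simulation.pkl"))

# Later, possibly in a separate process: reload and continue to the end date
resumed_sim = Simulation.load_from_pickle(Path("suspended_simulation.pkl"))
resumed_sim.run_simulation_to(to_date=Date(2012, 1, 1))
resumed_sim.finalise()
```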

As there is also global state recorded in tlo.logging, we also need to reconfigure logging when loading a simulation from file. I initially included some logic in load_from_pickle to inject the loaded simulation into the tlo logger and set the output file to the previous logging path (the logging FileHandler loaded by dill is not able to acquire a lock, causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.
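
Concretely, the explicit step on the caller's side might look something like the following sketch (configure_logging and its arguments here stand in for whatever reconfiguration hook is used and are assumptions rather than a definitive API):

```python
from pathlib import Path

from tlo import Simulation

resumed_sim = Simulation.load_from_pickle(Path("suspended_simulation.pkl"))
# Point logging at a fresh output file rather than reusing the FileHandler
# carried over in the pickled state, which cannot reacquire its lock
resumed_sim.configure_logging(filename="resumed_run", directory=Path("./outputs"))
```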

A set of tests that check for consistency of simulations when saving to and loading from file, including using this to resume a partially run simulation, is also added.
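
The shape of these tests is roughly as follows (a simplified sketch rather than the actual test code; make_simulation, the dates and the comparison via population.props are illustrative assumptions):

```python
import pandas as pd
from tlo import Date, Simulation

END_DATE = Date(2012, 1, 1)
MID_DATE = Date(2011, 1, 1)


def test_resumed_simulation_matches_full_run(tmp_path, make_simulation):
    # make_simulation is a hypothetical fixture returning identically seeded,
    # identically configured Simulation instances
    full_sim = make_simulation()
    full_sim.simulate(end_date=END_DATE)

    partial_sim = make_simulation()
    partial_sim.initialise(end_date=END_DATE)
    partial_sim.run_simulation_to(to_date=MID_DATE)
    partial_sim.save_to_pickle(pickle_path=tmp_path / "simulation.pkl")

    resumed_sim = Simulation.load_from_pickle(tmp_path / "simulation.pkl")
    resumed_sim.run_simulation_to(to_date=END_DATE)
    resumed_sim.finalise()

    # The resumed simulation should end up with the same population state
    pd.testing.assert_frame_equal(
        full_sim.population.props, resumed_sim.population.props
    )
```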

Apologies for the unrelated formatting changes in src/tlo/simulation.py; I was using black to autoformat my changes and forgot it would also reformat the rest of the module. I can revert these bits if they make this a pain to review.

Questions

  • Instead of save_to_pickle and load_from_pickle, should we just use the names save and load?
  • Should we explicitly set simulation.output_file to None in Simulation.load_from_pickle to guard against accidental use of the previous log file (and potential deadlock issues)?

@matt-graham (Collaborator, Author)

The failing test in tests/test_malaria.py is due to #1230, which #1231 should fix.

@tamuri (Collaborator) commented Dec 12, 2023

> As there is also global state recorded in tlo.logging, we also need to reconfigure logging when loading a simulation from file. I initially included some logic in load_from_pickle to inject the loaded simulation into the tlo logger and set the output file to the previous logging path (the logging FileHandler loaded by dill is not able to acquire a lock, causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.

Yes, we want new log files when restoring a simulation, because you'd potentially run many simulations from the same saved simulation. And they'd go in as separate Azure Batch runs, so different directories etc.

@tamuri (Collaborator) commented Dec 18, 2023

I'm working on scenarios using this - will give comments in light of that.

@matt-graham (Collaborator, Author)

@tamuri what would be the next steps to work on for this? I think you mentioned there was some issue with non-determinism in logging that your testing with this identified. Did you get anywhere with looking at how to use this with scenarios, and is there something for me to pick up there?

@tamuri (Collaborator) commented May 18, 2024

I've pushed my changes to scenario.py here, as well as three scenario files and a script to check log output matches.

@matt-graham (Collaborator, Author)

I've started to look at the differences that arise between the logs from a 'full' scenario run without any suspending or resuming and the merged logs from a pair of suspended and resumed runs (with otherwise identical scenario settings), using the scenario files and script @tamuri added on the branch tamuri/suspend-restore-scenario.

From what I can see so far, most (possibly all) of the discrepancies arise from bugs that also affect logging without suspending and resuming, specifically that the columns entry logged in the header message for the first log entry is not consistent with later log entries, because of one or more of the following:

  1. Keys in the (dict / dataframe) data logged are in different orders between the first and subsequent calls to the logger
  2. Keys in the (dict / dataframe) data logged on the first and subsequent calls to the logger differ (this seems to typically be when logging a multi-index series generated by a group-by operation and converted to a dict, where not all combinations of the index are present on each log iteration, for example if grouping on age_years one or more specific age_years values may be missing for any given set of data)
  3. The types of the values associated with the data logged on the first and subsequent calls differ (this one is very common)

The ordering issues (1) are easy enough to resolve by always sorting the data dictionary by key before logging both the header and value messages.
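
A minimal sketch of what I mean for (1) (the helper name is made up, and this isn't the exact change that would go into tlo.logging):

```python
def _with_sorted_keys(data: dict) -> dict:
    """Return a copy of the logged data with keys in a deterministic order."""
    return {key: data[key] for key in sorted(data)}

# Applied to the data dict before emitting both the header message (which
# records the column names / types) and each subsequent value message, so the
# column order is identical across all log entries for a given log key.
```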

The non-overlapping dict keys (2) will probably require manual fixing in each case as the current structured logging approach fundamentally relies on entries being alignable with each other.

For the non-constant column types (3) I am not sure what the implication is; often this is, for example, a value initially of int type subsequently being float, or a bool subsequently being NoneType, along with some cases of types swapping between scalar types and lists (mainly in the RTI module for the latter).
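
As an illustrative (made-up, not taken from the actual logs) example of one way the int to float case can arise with pandas, a missing value in a later iteration silently promotes the whole column:

```python
import pandas as pd

first_logged = pd.Series({"0": 5, "1": 3, "2": 4})
print(first_logged.dtype)  # int64 -> header message records an int column

later_logged = pd.Series({"0": 5, "1": None, "2": 4})
print(later_logged.dtype)  # float64 once a value is missing -> inconsistent with the header
```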
