Inference file writing #198
Fixing these things in https://github.com/HopkinsIDD/flepiMoP/tree/breaking-improvements-fixsaving. From Shaun: We need to do a couple of things, some already started:
From Alison, some more notes:
Include a toggle controlled by an input argument to turn this saving on or off.
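A minimal sketch of such a toggle, assuming a Python command-line entry point. The flag name `--save-all-iterations` and the helper `should_write_iteration` are hypothetical, not existing flepiMoP options. Defaulting to `True` would keep the current behavior, and the final iteration is always written so a run still leaves usable output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser; the flag name is illustrative, not flepiMoP's."""
    parser = argparse.ArgumentParser(description="Run inference (sketch)")
    # BooleanOptionalAction (Python 3.9+) generates both
    # --save-all-iterations and --no-save-all-iterations.
    parser.add_argument(
        "--save-all-iterations",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumption: keep today's "write everything" behavior by default
        help="write full output files for every iteration",
    )
    return parser


def should_write_iteration(save_all: bool, iteration: int, n_iterations: int) -> bool:
    """Always keep the final iteration; keep the others only if save_all is set."""
    return save_all or iteration == n_iterations - 1


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-save-all-iterations"])
    print(args.save_all_iterations)
```

With `--no-save-all-iterations`, a 10-iteration run would write full outputs only for iteration 9.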
Blocks are an automatically scheduled resume. I agree we should try to keep that (not easy, and not so useful on Slurm clusters).
Do you mean always reading the first index (00000001 or 0000000) instead of advancing the index?
Then it'll run 400 iterations of emcee without writing anything to disk except an HDF5 dataset of the chain (just the accepted values of the fitted parameters, very compact, updated in a crash-safe way at every iteration). Then it produces 100 samples that it writes fully to disk (seir, hosp, and so on). Since there are 256 slots in this example, it randomly chooses 100 of them. If I had asked for 1000 samples instead, it would have produced 256 samples from the last iteration (all 256 slots), the next 256 samples from the fifth-last iteration (because thin=5, to avoid overly correlated samples), then the tenth-last iteration, and so on. I am also planning to make it compatible with our scheme.

This is very convenient because I can stop and restart the run whenever I want or when it crashes. We never got around to doing that with our classical scheme. It's also intuitive to specify the number of samples and to plot from a single file.

Now for our old inference: I think we should distinguish the file operations needed to communicate between gempyor and R from the ones that save a chain to disk. It's very confusing how conflated these are at the moment.
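The sample-selection rule described above can be sketched as follows. This is an illustration, not the actual implementation: the function name `select_samples` is hypothetical, and "fifth last" is read here as stepping back `thin` iterations from the last one (iterations 399, 394, 389, ... for 400 iterations and thin=5). The chain is indexed by (iteration, walker), matching the first two axes of what emcee's `get_chain()` returns.

```python
import random


def select_samples(n_iterations, n_walkers, n_samples, thin, seed=None):
    """Return (iteration, walker) pairs whose full outputs would be written.

    Hypothetical sketch: fill from the last iteration first, then step back
    `thin` iterations at a time; when only part of an iteration is needed,
    pick that many walkers at random (without replacement).
    """
    rng = random.Random(seed)
    picks = []
    iteration = n_iterations - 1
    while len(picks) < n_samples:
        if iteration < 0:
            raise ValueError("chain too short for the requested samples and thin")
        needed = n_samples - len(picks)
        if needed >= n_walkers:
            walkers = list(range(n_walkers))  # take every walker of this iteration
        else:
            walkers = sorted(rng.sample(range(n_walkers), needed))
        picks.extend((iteration, w) for w in walkers)
        iteration -= thin  # skip ahead to reduce autocorrelation between samples
    return picks


if __name__ == "__main__":
    picks = select_samples(n_iterations=400, n_walkers=256, n_samples=1000, thin=5, seed=1)
    print(sorted({it for it, _ in picks}, reverse=True))
```

For 1000 samples this takes all 256 walkers from iterations 399, 394 and 389, then 232 randomly chosen walkers from iteration 384. Only these pairs would need full seir/hosp outputs; everything else stays in the compact chain file.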
Do we have the machinery to fit initial conditions on the inference side?
Some work on that in #199.
The main issues here regarding inference file saving are fixed by #205. The later discussion about how and where gempyor saves files, and where R looks for them, hasn't been addressed, but it is not the main part of this issue anyway.
Inference writes so many files for each iteration that it uses up too much disk space. This causes problems particularly in batch runs, as the disk fills up.
We should add an option to skip saving all these iterations. I'm not sure whether there's a way to still get all the inference information we want (to see the evolution of parameters or likelihoods) while limiting the number of files, or whether we should just add the option to turn this off and not worry about keeping that information.
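One way to get both, sketched under assumptions: always append a single small row per iteration (parameters and log-likelihood) to one CSV, and write the large per-iteration outputs either once per iteration or into a single file that is overwritten each time. The function `write_iteration`, the file names `chain.csv`/`latest.out`, and the 9-digit index format are all hypothetical, and `full_output` stands in for the real seir/hosp files.

```python
import csv
import os
from pathlib import Path


def write_iteration(outdir, iteration, params, log_likelihood, full_output, save_all):
    """Hypothetical writer: one compact CSV row per iteration, full outputs optional.

    params: dict of fitted parameter name -> value (same keys every call)
    full_output: text standing in for the large per-iteration output files
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    # 1. Always append one small row, so the evolution of parameters and
    #    likelihood stays visible even when full outputs are not kept.
    log_path = outdir / "chain.csv"
    fieldnames = ["iteration", "log_likelihood", *sorted(params)]
    new_file = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow({"iteration": iteration, "log_likelihood": log_likelihood, **params})
        f.flush()
        os.fsync(f.fileno())  # push the row to disk before moving on

    # 2. Full outputs: one file per iteration, or a single file reused every time.
    name = f"{iteration:09d}.out" if save_all else "latest.out"
    tmp = outdir / (name + ".tmp")
    tmp.write_text(full_output)
    # os.replace swaps the file in one step (atomic on POSIX within one
    # filesystem), so a crash mid-write never leaves a truncated output file.
    os.replace(tmp, outdir / name)
```

With `save_all=False`, disk usage no longer grows with the number of iterations except for one CSV row each, which is enough to plot parameter and likelihood traces.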