Keep individual timeseries files #228
Conversation
LGTM. One question, which is outside the scope of this PR but still relevant to the key function at issue. Happy to approve, @nmerket, if this is a priority to get merged!
results_job_json_glob = f'{sim_output_dir}/results_job*.json.gz'
logger.info('Removing temporary files')
fs.rm(ts_in_dir, recursive=True)
logger.info('Removing results_job*.json.gz')
for filename in fs.glob(results_job_json_glob):
@nmerket In the past this function would run even if there was an uncaught error in the metadata results aggregation, leading to the entire analysis having to be rerun. Is it possible still for this code to be reached if that dask cluster job doesn't complete successfully?
It's possible. I didn't do anything in this PR to mitigate that possibility. In general, this should run only after the results.csv aggregation is complete. Could you provide a minimal reproducible example of this issue?
And the scope of this PR is to leave the timeseries files on disk, not the results jsons.
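One way to guard against the scenario raised above is to check that the aggregated results file actually exists and is non-empty before deleting the intermediate job outputs. The sketch below uses the local filesystem for illustration; the function name `safe_cleanup`, its arguments, and the results.csv check are assumptions for this example, not buildstockbatch's actual API (which operates through an `fs` filesystem abstraction, as in the diff above).

```python
import glob
import os


def safe_cleanup(sim_output_dir: str, results_csv: str) -> bool:
    """Remove results_job*.json.gz intermediates only if the aggregated
    results file exists and is non-empty; otherwise keep them so the
    postprocessing step can be rerun without redoing the simulations.

    Returns True if cleanup ran, False if it was skipped.
    """
    # Aggregation incomplete or failed: leave intermediates in place.
    if not (os.path.exists(results_csv) and os.path.getsize(results_csv) > 0):
        return False
    for filename in glob.glob(os.path.join(sim_output_dir, 'results_job*.json.gz')):
        os.remove(filename)
    return True
```

The same guard could equally be expressed by chaining the cleanup task after the aggregation task on the dask cluster, so it is only scheduled once aggregation succeeds.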
Fixes #182.
Pull Request Description
Moves the `eagle.postprocessing.keep_intermediate_files` key to `postprocessing.keep_individual_timeseries` and changes the behavior to keep only the timeseries parquet files. Also removes the deprecated `aggregate_timeseries` key, as that aggregation always happens.
Checklist
Not all may apply