
Keep individual timeseries files #228

Merged: 7 commits from keep_intermediate_files into develop, May 4, 2021
Conversation

@nmerket (Member) commented May 3, 2021:

Fixes #182.

Pull Request Description

Moves the eagle.postprocessing.keep_intermediate_files key to postprocessing.keep_individual_timeseries and changes the behavior to keep only the timeseries parquet files. Also removes the deprecated aggregate_timeseries key, since that aggregation always happens.
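As a hedged illustration of the key migration described above (the key names come from this PR; the surrounding YAML structure and values are assumed for the example):

```yaml
# Before this PR (deprecated):
eagle:
  postprocessing:
    keep_intermediate_files: true   # kept all intermediate files
aggregate_timeseries: true          # removed; aggregation always happens now

# After this PR:
postprocessing:
  keep_individual_timeseries: true  # keeps only the timeseries parquet files
```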

Checklist

Not all items may apply.

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on CircleCI build -> Artifacts)
  • All other unit tests passing
  • Update validation for project config yaml file changes
  • Update existing documentation
  • Run a small batch run to make sure it all works (local is fine, unless an Eagle specific feature)
  • Add to the changelog_dev.rst file and propose migration text in the pull request

@nmerket nmerket requested review from asparke2 and rHorsey May 3, 2021 20:10
@rHorsey (Contributor) left a comment:


LGTM; one question that is outside the scope of this PR but still relevant to the key function at issue. Happy to approve @nmerket if this is a priority to get merged!

results_job_json_glob = f'{sim_output_dir}/results_job*.json.gz'
logger.info('Removing temporary files')
fs.rm(ts_in_dir, recursive=True)
logger.info('Removing results_job*.json.gz')
for filename in fs.glob(results_job_json_glob):
A Contributor commented:

@nmerket In the past this function would run even if there was an uncaught error in the metadata results aggregation, forcing the entire analysis to be rerun. Is it still possible for this code to be reached if that dask cluster job doesn't complete successfully?

@nmerket (Member, Author) replied:
It's possible. I didn't do anything in this PR to mitigate that possibility. In general, this should run only after the results.csv aggregation is complete. Could you provide a minimal reproducible example of this issue?
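A minimal sketch of the kind of guard discussed here: cleanup runs only if the aggregation step succeeds. The names `postprocess` and the `aggregate` callable are hypothetical, not from the buildstockbatch codebase; `fs` stands in for the fsspec-style filesystem object used in the diff above.

```python
import logging

logger = logging.getLogger(__name__)


def postprocess(fs, sim_output_dir, ts_in_dir, aggregate,
                keep_individual_timeseries=False):
    # Run the results.csv aggregation first; if it raises, stop here so the
    # intermediate files survive and the analysis can be resumed.
    try:
        aggregate()
    except Exception:
        logger.exception('Aggregation failed; keeping intermediate files')
        raise

    # Cleanup is reached only after aggregation completed successfully.
    if not keep_individual_timeseries:
        logger.info('Removing temporary timeseries files')
        fs.rm(ts_in_dir, recursive=True)
    logger.info('Removing results_job*.json.gz')
    for filename in fs.glob(f'{sim_output_dir}/results_job*.json.gz'):
        fs.rm(filename)
```

Passing the aggregation step in as a callable makes the ordering explicit and easy to test, but this is only one way to express the guard, not how the PR resolves it.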

@nmerket (Member, Author) added:
And the scope of this PR is to leave the timeseries files on disk, not the results jsons.

@nmerket nmerket merged commit 54b84bf into develop May 4, 2021
@nmerket nmerket deleted the keep_intermediate_files branch May 4, 2021 17:03
Closes issue: Switch for automatic deletion of time series parquet files (#182)