
upload-buildstockcsv-to-s3 #365

Merged
merged 20 commits into develop on Oct 10, 2023

Conversation

@yingli-NREL yingli-NREL commented Apr 19, 2023

Modified upload_results() in postprocessing.py to upload buildstock.csv to S3.

Checklist

Not all may apply

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on Checks -> BuildStockBatch Tests -> Artifacts)
  • Coverage has increased or at least not decreased. Update minimum_coverage in .github/workflows/ci.yml as necessary.
  • All other unit and integration tests passing
  • Update validation for project config yaml file changes
  • Update existing documentation
  • Run a small batch on Eagle to make sure it all works, if you made changes that will affect Eagle
  • Add to the changelog_dev.rst file and propose migration text in the pull request

@yingli-NREL yingli-NREL requested a review from rajeee April 19, 2023 02:55
@yingli-NREL yingli-NREL linked an issue Apr 19, 2023 that may be closed by this pull request
github-actions bot commented Apr 19, 2023

File Coverage
All files 84%
base.py 89%
eagle.py 79%
exc.py 57%
local.py 50%
postprocessing.py 84%
utils.py 96%
sampler/base.py 79%
sampler/downselect.py 33%
sampler/precomputed.py 93%
sampler/residential_quota.py 61%
test/test_docker.py 33%
test/test_validation.py 97%
workflow_generator/base.py 90%
workflow_generator/commercial.py 53%
workflow_generator/residential_hpxml.py 84%

Minimum allowed coverage is 33%

Generated by 🐒 cobertura-action against 621e574


nmerket commented Apr 21, 2023

@yingli-NREL I added the pull request checklist back in. What you need to do is make sure you've done each item in that list (or verified that it isn't applicable). If you're not sure what some of those mean, @rajeee or I can help.

@yingli-NREL yingli-NREL marked this pull request as ready for review May 2, 2023 17:46
@yingli-NREL (Author) commented:

Created a folder named "buildstock_csv" in the S3 results directory and added buildstock.csv to that folder.

@nmerket nmerket left a comment


I apologize for taking so long to properly review this. Have you run a small batch to verify this is working? This could use some unit testing, which is kind of tricky with S3 stuff because you'll probably need to mock things. There are some existing tests I linked to that could be added to. A couple of other notes are below. I know it's a lot, but I've been thorough and somewhat critical, hopefully as a teaching tool. If it would help, we can meet to talk through some of this.

Comment on lines 635 to 642

def upload_buildstock_csv(filepath):
    full_path = buildstock_dir.joinpath(filepath)
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(s3_bucket)
    s3_prefix_output_new = s3_prefix_output + '/' + 'buildstock_csv' + '/'
    s3key = Path(s3_prefix_output_new).joinpath(filepath).as_posix()
    bucket.upload_file(str(full_path), str(s3key))

When I look at the coverage report, this doesn't seem to be getting called. It's not clear to me why.

[screenshot of the coverage report]

It would make sense to add something to the existing test I linked to that checks whether the file got uploaded. That test uses mock so it doesn't actually upload files to S3, but it can verify that the upload call was made successfully.
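
As a rough illustration of what that check could look like, here is a hedged sketch of a unit test that patches boto3 so nothing actually hits S3 and then asserts that the buildstock.csv key was among the uploads. The patch target, fixture arguments, and key layout are assumptions, not the project's actual test code.

from unittest import mock

from buildstockbatch import postprocessing  # assumed import path


def test_upload_results_includes_buildstock_csv(aws_conf, output_dir, results_dir):
    # aws_conf, output_dir, and results_dir are hypothetical fixtures.
    # Patch boto3 inside postprocessing so no real S3 calls are made.
    with mock.patch.object(postprocessing, 'boto3') as mock_boto3:
        postprocessing.upload_results(aws_conf, output_dir, results_dir)
        mock_bucket = mock_boto3.resource.return_value.Bucket.return_value
        uploaded_keys = [c.args[1] for c in mock_bucket.upload_file.call_args_list]
        # The new buildstock_csv/ prefix should show up among the uploaded keys.
        assert any(key.endswith('buildstock_csv/buildstock.csv') for key in uploaded_keys)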


One more thought on this: could we avoid having upload_buildstock_csv here at all? It seems to be mostly a copy of the more generic upload_file, just putting the file somewhere else. Maybe upload_file could take a second, optional argument specifying the S3 location when it's different from the default, and then it could also be used for buildstock.csv.
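
A rough sketch of what that optional argument might look like, assuming (as in the snippet above) that upload_file is a closure inside upload_results that can see parquet_dir, s3_bucket, and s3_prefix_output; none of this has been checked against the actual code:

import boto3
from pathlib import Path


def upload_file(filepath, s3_dst_prefix=None):
    # parquet_dir, s3_bucket, and s3_prefix_output are assumed to come from
    # the enclosing upload_results() scope, as in the existing helpers.
    full_path = parquet_dir.joinpath(filepath)
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(s3_bucket)
    # Default to the normal output prefix; append the override (e.g. 'buildstock_csv') when given.
    prefix = s3_prefix_output if s3_dst_prefix is None else s3_prefix_output + '/' + s3_dst_prefix
    s3key = Path(prefix).joinpath(filepath).as_posix()
    bucket.upload_file(str(full_path), str(s3key))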

@yingli-NREL (Author) replied:

I created a local environment in my HPC account and used that environment to run a small batch. It works. The results are uploaded to Amazon S3 > Buckets > eulp > buildstock_csv_to_s3/ > test/ > test10/.

Comment on lines 611 to 613
buildstock_csv = []
for file in buildstock_dir.glob('buildstock.csv'):
    buildstock_csv.append(file.relative_to(buildstock_dir))

I see why you made this a list so you could use map below, but that's not necessary. You can just pull the filename here. We should also only upload it if it exists. I think ComStock might put this file somewhere different, so it may not always be where you're looking for it.

Suggested change
-buildstock_csv = []
-for file in buildstock_dir.glob('buildstock.csv'):
-    buildstock_csv.append(file.relative_to(buildstock_dir))
+buildstock_csv = buildstock_dir / 'buildstock.csv'

I haven't tested that this works. I might be missing some normalization or something.
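
To also honor the "only upload it if it exists" point above, here is a minimal sketch with a hypothetical helper (buildstock_dir as in the snippet above; not part of the actual code):

from pathlib import Path


def find_buildstock_csv(buildstock_dir):
    # Return the buildstock.csv path if the sampler produced one here,
    # otherwise None so the caller can skip the upload step.
    candidate = Path(buildstock_dir) / 'buildstock.csv'
    return candidate if candidate.exists() else None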

Comment on lines 644 to 645
dask.compute(map(dask.delayed(upload_file), all_files))
dask.compute(map(dask.delayed(upload_buildstock_csv), buildstock_csv))

It's best to gather all the work you want to do into one list of tasks and then send that to dask.compute once. You don't have to call dask.delayed with a map; it was just convenient in this case. See the dask.delayed documentation for more details. This change depends on buildstock_csv being a filename, not a list with just one item in it.

Suggested change
-dask.compute(map(dask.delayed(upload_file), all_files))
-dask.compute(map(dask.delayed(upload_buildstock_csv), buildstock_csv))
+tasks = list(map(dask.delayed(upload_file), all_files))
+tasks.append(dask.delayed(upload_buildstock_csv)(buildstock_csv))
+dask.compute(tasks)
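
For reference, a small self-contained illustration of that pattern with stand-in functions (not the real upload helpers):

import dask


def upload_file(path):
    # Stand-in for the real upload helper; just reports what it would upload.
    return f"uploaded {path}"


def upload_buildstock_csv(path):
    return f"uploaded {path} to buildstock_csv/"


all_files = ['up00.parquet', 'up01.parquet']
tasks = list(map(dask.delayed(upload_file), all_files))
tasks.append(dask.delayed(upload_buildstock_csv)('buildstock.csv'))
results = dask.compute(tasks)  # one compute call runs every task together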

@yingli-NREL yingli-NREL requested a review from nmerket July 21, 2023 17:33
@nmerket nmerket added this to the v2023.10.0 milestone Oct 3, 2023

nmerket commented Oct 6, 2023

The buildstock.csv is in a different place depending on whether you're running buildstock_eagle or buildstock_local. When I run buildstock_local and have the output_directory not in the default location, it fails to upload:

INFO:2023-10-06 17:06:19:buildstockbatch.postprocessing:Uploading the parquet files to s3
2023-10-06 17:06:20,315 - distributed.worker - WARNING - Compute Failed
Key:       upload_buildstock_csv-3a9eafb5-350b-4d0d-ad8d-efae94791c46
Function:  upload_buildstock_csv
args:      ('')
kwargs:    {}
Exception: "FileNotFoundError(2, 'No such file or directory')"

I can take a look at why next week.

@nmerket nmerket left a comment


A few more notes. I'll try to get this working early next week.

@@ -595,6 +595,7 @@ def remove_intermediate_files(fs, results_dir, keep_individual_timeseries=False)
def upload_results(aws_conf, output_dir, results_dir):
    logger.info("Uploading the parquet files to s3")

    buildstock_dir = Path(results_dir).parent.joinpath('housing_characteristics')

I think this is the problem right here. It's assuming the file is in the usual place it lives on Eagle. I think the sampler returns the buildstock.csv location; we should capture that and use that file. It could be a bit tricky because sampling is run well before postprocessing, on a different node.
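
One hypothetical way to handle that, sketched here with assumed names (this is not what the PR actually does): have the sampling step copy buildstock.csv into the results directory so postprocessing can find it without guessing.

import shutil
from pathlib import Path


def stash_buildstock_csv(sampled_csv_path, results_dir):
    # Copy the sampler's buildstock.csv next to the results so that
    # upload_results() can pick it up regardless of output_directory layout.
    dest = Path(results_dir) / 'buildstock.csv'
    shutil.copy(str(sampled_csv_path), str(dest))
    return dest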

Comment on lines 611 to 613
buildstock_csv = ''
for file in buildstock_dir.glob('buildstock.csv'):
    buildstock_csv = file.relative_to(buildstock_dir)

Also, it's a little bizarre to use a for loop and glob to look for a single file.

@nmerket nmerket left a comment


I confirmed it's working for both the local version and on Eagle.

@nmerket nmerket merged commit 6e17414 into develop Oct 10, 2023
6 checks passed
@nmerket nmerket deleted the 348-upload-buildstockcsv-to-s3-during-postprocessing branch October 10, 2023 17:46
Successfully merging this pull request may close these issues.

Upload buildstock.csv to S3 during postprocessing
4 participants