Memory efficient results.csv creation #258
Conversation
Works for a small batch. Still need to verify with a very large batch that previously would have failed.
Looks good from what I can see. I think the dask dataframe is a good choice here. Let me know how it goes with a larger dataset.
Works in a large run with 350K buildings and 16 upgrades. With n_worker=10, postprocessing took ~10 hours.
That's a long time, but it's good that it worked. I suppose that's what matters.
Potentially fixes #253.
Pull Request Description
Instead of loading all the results_jobx.json.gz files into memory at once, use dask to load only the results_job files for one upgrade at a time. This should prevent out-of-memory errors when dealing with large runs with many upgrades.
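The per-upgrade batching idea can be sketched roughly as follows. This is a stdlib illustration of the memory-saving strategy only, not the actual change (the PR itself uses a dask dataframe); the directory layout, file naming, and `write_results_csv` helper here are all hypothetical assumptions.

```python
import csv
import glob
import gzip
import json
import os


def write_results_csv(results_dir, out_csv, num_upgrades):
    """Stream results to a single CSV one upgrade at a time, so only
    one upgrade's results_job*.json.gz files are in memory at once.
    (Hypothetical layout: results_dir/upNN/results_job*.json.gz)"""
    with open(out_csv, "w", newline="") as out:
        writer = None
        for upgrade in range(num_upgrades):
            pattern = os.path.join(
                results_dir, f"up{upgrade:02d}", "results_job*.json.gz"
            )
            # Load only this upgrade's result files.
            rows = []
            for path in sorted(glob.glob(pattern)):
                with gzip.open(path, "rt") as f:
                    rows.extend(json.load(f))
            # Write them out immediately, then let them be garbage-collected
            # before the next upgrade is loaded.
            for row in rows:
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=sorted(row))
                    writer.writeheader()
                writer.writerow(row)
```

The peak memory use is then bounded by the size of a single upgrade's results rather than the whole run, which is what makes the very large multi-upgrade runs feasible.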
Checklist
Not all may apply