Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory_issue_fix #212
memory_issue_fix #212
Changes from 6 commits
f1b5fa8
634773d
f8e9558
6b0665f
749346b
72f50e8
befbcb5
c7e0546
564e089
ec365a6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty broad
except
block. It looks like it just returns an empty dataframe with the appropriate columns if anything goes wrong. Intentionally creating these kinds of silent errors is troubling because it means that there could be major problems with reading some of the parquet files and the user will not know and think everything is fine.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed it could mask errors. But I think a few times there was an issue reading a handful of parquet files that resulted in crashing everything, so I thought it might be a good idea to assemble whatever can be assembled. Do you think it would be okay to have this try-except if we only wrap the read_parquet call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that's why you had this here. I don't know if it functionally matters much to only wrap the read_parquet call rather than do this. My primary concern is that this will silently omit data the user is expecting. That could mess up results. Right now at least the user knows something went wrong because they don't get their results. Could we put some sort of error message in the postprocessing (or other relevant) log file saying, "building_id = x, upgrade = y failed to load parquet"? They still might miss that, but it would be documented somewhere. It would be better than what we have now, which is an obtuse error with very little to go on to find the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nmerket I started working on the solution you proposed of logging the buildings we failed to read the timeseries, but then realized we would have to do it at multiple places. If there is any corrupted parquet it would fail here first:
buildstockbatch/buildstockbatch/postprocessing.py
Line 281 in 1cd0349
And, the corrupted parquet (or not being able to read from disk) is something I have seen only a couple of times when the eagle was having an issue, so it may not be worthwhile to account for that. Even if we need to handle that, maybe not in this PR. So, I have removed all such handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, but usually a variable written in
ALL_CAPS
is understood to be a constant in python.CamelCase
is a class. This looks to be a regular variable which are usuallylower_snake_case
. It's not clear from the variable name what this is. Maybe a more descriptive name would be helpful.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
point taken :). Thanks. I have a propensity to use big N for counts of any sort. I will change the name.