Expand historical coverage pre-2019 #295
Picking this PR back up on 11/22/23 after months of inactivity. So far, I have just merged in the most recent development branch and done a test run of the pipeline with a single year (2018) of data to see if it is working. I wanted to sync this branch with the most recent changes before we start all of our other updates so it doesn't get too far out of sync, but I will probably pick work back up on this after the 2022 data update is complete.
dfb981d to d411aa3
This PR is now part of a larger group of PRs that aim to update the data pipeline to allow for the creation of historical data from 2005-2018. All the PRs created for the expansion of historical coverage will be merged into the […]. This PR allows the pipeline to run without error from 2008 to 2018. Note that the warnings have not been investigated yet and the outputs have not been validated; this PR simply fixes the errors encountered when running 2008-2018. Next steps:
See comments - some changes requested.
In addition to the next steps you listed above, it looks like we will need to figure out how to deal with the download_eia923() function, since it will not work with some of the early data, and some of the functions that use those raw files may need alternative file handling in those earlier years.
Also, can you please test this to make sure that the results for, say, 2022 match the existing outputs? This change should theoretically not affect any of the results.
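A minimal sketch of how such a regression check could be run, assuming the pipeline writes its results as files under one directory per run. The function names and the directory layout are hypothetical and are not taken from this repository:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def list_files(root: Path) -> dict:
    """Map each file's path relative to root (POSIX style) to its full path."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}


def compare_output_dirs(baseline: Path, candidate: Path) -> dict:
    """Compare two output directories file by file (hypothetical helper).

    Returns the relative paths that are missing from the candidate run,
    extra in the candidate run, or present in both with different bytes.
    """
    base_files = list_files(baseline)
    cand_files = list_files(candidate)
    shared = base_files.keys() & cand_files.keys()
    return {
        "missing": sorted(base_files.keys() - cand_files.keys()),
        "extra": sorted(cand_files.keys() - base_files.keys()),
        "changed": sorted(
            name
            for name in shared
            if file_digest(base_files[name]) != file_digest(cand_files[name])
        ),
    }
```

If all three lists come back empty for a 2022 run on this branch versus the existing 2022 outputs, the change did not alter those files. The comparison is byte-for-byte, so it is deliberately strict: harmless differences such as row order or floating-point formatting would also be reported and would need a looser, column-aware comparison.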
e3316d4 to 91f8b4c
Indeed:
91f8b4c to 4dc5bbc
I added one comment with a suggested name change; otherwise this looks good to merge once we confirm that it does not modify the 2022 outputs.
4dc5bbc to 31ce596
…h cems and eia923
31ce596 to 938c2bd
@rouille Looks good to me - ready to merge!
Summary
This PR updates the data pipeline to allow for the creation of historical data from 2013-2018. Because EIA-930 data is not available for a complete year prior to 2019, the data outputs prior to that year will be limited to the following:
Where to look
Most of the updates are in data_pipeline.py, with minor changes in other files to update allowed year ranges and to update certain functions to accept an argument that specifies different behavior depending on whether hourly data is available.
Update details
Document in more depth the changes being made
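As a rough illustration of the pattern described under "Where to look" (a year-range check plus behavior that depends on whether hourly data is available), here is a minimal sketch. All names, the step labels, and the latest allowed year are hypothetical and are not the actual contents of data_pipeline.py; only the 2013 lower bound and the 2019 cutoff for complete-year EIA-930 data come from the summary above:

```python
# From the PR summary: EIA-930 data is not available for a complete year before 2019.
FIRST_FULL_YEAR_OF_EIA930 = 2019

# The lower bound comes from the PR summary; the upper bound is an assumption.
EARLIEST_ALLOWED_YEAR = 2013
LATEST_ALLOWED_YEAR = 2022


def validate_year(year: int) -> None:
    """Raise ValueError if the requested year is outside the allowed range."""
    if not EARLIEST_ALLOWED_YEAR <= year <= LATEST_ALLOWED_YEAR:
        raise ValueError(
            f"year must be between {EARLIEST_ALLOWED_YEAR} and "
            f"{LATEST_ALLOWED_YEAR}, got {year}"
        )


def plan_pipeline_steps(year: int) -> list:
    """Return placeholder step labels for a given data year.

    Steps that need hourly data are only scheduled when a complete year of
    EIA-930 data exists for that year.
    """
    validate_year(year)
    hourly_data_available = year >= FIRST_FULL_YEAR_OF_EIA930
    steps = ["non_hourly_outputs"]  # placeholder label, always produced
    if hourly_data_available:
        steps.append("hourly_outputs")  # placeholder label, needs EIA-930
    return steps
```

Deriving the flag once from the year and passing it down keeps the year cutoff in a single place, rather than repeating a `year >= 2019` comparison inside each function.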
Screenshots
A couple screenshots of the changes/data if relevant.
Testing / Validation
After running the 2018 pipeline, I noticed the following warnings that are not tripped in the more recent data:
Linear ticket
Closes CAR-2968, CAR-1823, CAR-4206
Concerns
Anything you'd like to point out that the reviewers should pay special attention to
Next steps / Not addressed here
The availability of certain input data prior to 2013 may differ, so that will be addressed in a future PR.
Checklist