Ensure complete timeseries data #326

Open
grgmiller opened this issue Dec 28, 2023 · 0 comments
Labels
bug Something isn't working emissions Accuracy/completeness of emission mass data generation data accuracy/completeness of generation data hourly profiles Accuracy of hourly profile imputation

Comments

@grgmiller
Collaborator

We want to ensure that there are always 12 values for all monthly data and 8760 values (8784 in leap years) for all hourly data. Our validation checks are currently flagging timeseries with fewer than the expected number of values, as well as hourly timeseries with more than 8760 values.
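A minimal sketch of the expected-count rule, using the standard library's `calendar.isleap`. The function names are hypothetical and not part of the existing validation code:

```python
import calendar


def expected_hours(year: int) -> int:
    """Hourly values expected in a calendar year: 8784 in leap years, otherwise 8760."""
    return 8784 if calendar.isleap(year) else 8760


def check_count(n_values: int, year: int, frequency: str) -> str:
    """Classify a timeseries length as 'complete', 'short', or 'long'.

    frequency is 'monthly' (12 values expected) or 'hourly'.
    """
    expected = 12 if frequency == "monthly" else expected_hours(year)
    if n_values == expected:
        return "complete"
    return "short" if n_values < expected else "long"


print(check_count(8759, 2022, "hourly"))  # -> short
print(check_count(8784, 2024, "hourly"))  # -> complete
```

Which calendar year the hours belong to (UTC or local prevailing time) still has to be decided separately; this sketch only encodes the counts.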

Here are some examples of various situations that are being flagged and some ideas on how to fix them:

Timeseries has 8758-8759 out of 8760 values.

  • Example: plant 540 in 2022
  • In ISONE, multiple petroleum plants are missing data for April 1 at 4am UTC. These plants switch from using the EIA-930 hourly profile in March to using CEMS data in April. Apparently, all CEMS data is reported in standard time, so for plants that start reporting mid-year (e.g., April 1 at midnight), the first reported value actually represents April 1 at 1am local prevailing time (EDT). The first timestamp we have available is therefore 5am UTC, while April in local prevailing time begins at 4am UTC.
  • Potential fixes: 1) Treat this value as missing, but ensure that the timeseries still contains a row for this timestamp. This missing timestamp is not currently caught by our validation check, because we only check for complete timeseries between the min and max timestamps in each month; since the missing timestamp is the first one, it falls outside that range. (We should fix this validation check to use the complete expected date range rather than the min/max dates.) 2) Try to interpolate this single-hour value.
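Below is a sketch of fix 1 together with the corrected validation check, assuming pandas, hourly data indexed by UTC timestamps, and `America/New_York` as the prevailing time zone for these ISONE plants. All helper names are hypothetical. The expected index is built from the local calendar month instead of the data's min/max timestamps, so the missing 4am UTC hour is detected; `fill_single_gaps` illustrates fix 2 by interpolating only isolated one-hour gaps.

```python
import pandas as pd

LOCAL_TZ = "America/New_York"  # assumption: prevailing time zone of the ISONE plants


def expected_hourly_index(year: int, month: int, tz: str = LOCAL_TZ) -> pd.DatetimeIndex:
    """Every hour of a local-prevailing-time month, expressed in UTC.

    Built from calendar month boundaries rather than the data's min/max timestamps,
    so a missing first or last hour is still detected.
    """
    start = pd.Timestamp(f"{year}-{month:02d}-01", tz=tz)
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = pd.Timestamp(f"{next_year}-{next_month:02d}-01", tz=tz)
    return pd.date_range(start, end, freq="h", inclusive="left").tz_convert("UTC")


def missing_hours(observed_utc: pd.DatetimeIndex, year: int, month: int,
                  tz: str = LOCAL_TZ) -> pd.DatetimeIndex:
    """Expected UTC hours that are absent from observed_utc."""
    return expected_hourly_index(year, month, tz).difference(observed_utc)


def fill_single_gaps(series: pd.Series, full_index: pd.DatetimeIndex) -> pd.Series:
    """Add a row for every expected hour, then interpolate only isolated one-hour gaps.

    Longer gaps and gaps at the very start or end of the series stay NaN.
    """
    s = series.reindex(full_index)
    isna = s.isna()
    isolated = isna & ~isna.shift(1, fill_value=False) & ~isna.shift(-1, fill_value=False)
    return s.where(~isolated, s.interpolate(limit_area="inside"))


# CEMS reports in standard time (EST, UTC-5), so midnight on April 1 arrives as 05:00 UTC,
# while April in prevailing time (EDT, UTC-4) starts at 04:00 UTC.
april = expected_hourly_index(2022, 4)
cems_start = pd.Timestamp("2022-04-01 00:00-05:00").tz_convert("UTC")
print(missing_hours(april[april >= cems_start], 2022, 4))  # -> only 2022-04-01 04:00 UTC
```

In a full-year series the March values (from the 930 profile) precede the gap, so the missing hour is interior and can be interpolated; if only April were loaded, it would remain NaN.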

Timeseries has far fewer than 8760 values

  • Example: plant 10549 in 2022
  • This is likely because only some months of data are reported in CEMS and/or EIA-923.
  • Solution: When loading EIA-923 and CEMS data, we should ensure complete timestamps/report dates for each plant/subplant/unit, filling in missing values where no data is reported. However, this may greatly increase the size of the dataframes, so we need to be careful with this.
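A sketch of completing the monthly report dates with a pandas cross join, assuming hypothetical column names `plant_id_eia` and `report_date` (month-start timestamps). The id columns could be extended to subplant or unit keys:

```python
import pandas as pd


def complete_monthly(df: pd.DataFrame, year: int, id_cols=("plant_id_eia",)) -> pd.DataFrame:
    """Return one row per id x month of `year`, with NaN where nothing was reported."""
    id_cols = list(id_cols)
    months = pd.date_range(f"{year}-01-01", periods=12, freq="MS")
    # Only ids that report at least once in df are completed.
    ids = df[id_cols].drop_duplicates()
    # Cross join every id with all 12 report dates (n_ids * 12 rows).
    full = ids.merge(pd.DataFrame({"report_date": months}), how="cross")
    # Left join the reported values back on; unreported months get NaN.
    return full.merge(df, on=id_cols + ["report_date"], how="left")
```

The completed frame has exactly 12 rows per id, so monthly data grows modestly; applying the same cross join to hourly data yields 8760 (or 8784) rows per id, which is where the size concern above comes from.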

Timeseries has more than 8760 values

  • Example: plant 50240
  • Plant 50240 is in MISO, which spans the Mountain, Central, and Eastern time zones; the plant itself is located in Eastern time. Its data starts at 5am UTC (as expected for EST) but also ends at 5am UTC, when it should end at 4am UTC. It looks like the 930 hourly profile we are using to shape this plant (MISO NG) is in Central time, even though the plant is located in Eastern time. As a result, shaping this plant adds an extra hour to the end of the timeseries (and may leave the first hour of a month missing when the plant switches from CEMS to EIA data).
  • Solution: We may need to use tz-aware profiles when shaping plants and assigning report dates in BAs that have plants in multiple time zones. We may also want to add a validation check that flags when a BA has plants in multiple time zones.
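A sketch of both ideas, assuming pandas, a plant table with hypothetical `ba_code` and `timezone` (IANA name) columns, and hourly timestamps in UTC. The first function is the proposed validation check; the second assigns report dates from each plant's own prevailing time rather than from the profile's time zone:

```python
import pandas as pd


def bas_with_multiple_timezones(plants: pd.DataFrame) -> pd.Series:
    """Number of distinct plant time zones for each BA that has more than one."""
    n_tz = plants.groupby("ba_code")["timezone"].nunique()
    return n_tz[n_tz > 1]


def local_report_date(datetime_utc: pd.Series, tz: str) -> pd.Series:
    """Month (as a naive month-start timestamp) of each UTC hour in local prevailing time."""
    local = datetime_utc.dt.tz_convert(tz)
    # Drop the tz before converting to a monthly period to avoid tz-loss warnings.
    return local.dt.tz_localize(None).dt.to_period("M").dt.to_timestamp()


hours = pd.Series(pd.to_datetime(["2023-01-01 04:00", "2023-01-01 05:00"], utc=True))
print(local_report_date(hours, "America/New_York").tolist())
print(local_report_date(hours, "America/Chicago").tolist())
```

For an Eastern-time plant, the 5am UTC hour on January 1 falls in the next year's first month, whereas under Central time it still falls in December, which is consistent with the extra hour described above.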
@grgmiller grgmiller added bug Something isn't working hourly profiles Accuracy of hourly profile imputation emissions Accuracy/completeness of emission mass data generation data accuracy/completeness of generation data labels Dec 28, 2023
Projects
None yet
Development

No branches or pull requests

1 participant