Investigate lack of monthly year-to-date data in out_eia923__monthly_generation_fuel_by_generator table #3634

Open
zaneselvans opened this issue May 14, 2024 · 0 comments
Labels
data-validation Issues related to checking whether data meets our quality expectations. eia923 Anything having to do with EIA Form 923 time what even is time. fixing and changing the way in which PUDL data deals with time

zaneselvans commented May 14, 2024

In #3625 it seemed odd that no 2023 data showed up in the out_eia923__monthly_generation_fuel_by_generator table, even though the EIA-923 has 11 months of 2023 incremental_ytd records:

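import pandas as pd

# pudl_engine is assumed to be a SQLAlchemy engine connected to the PUDL SQLite database.
gen_eia923_ms = pd.read_sql("out_eia923__monthly_generation", pudl_engine)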
gen_eia923_ys = pd.read_sql("out_eia923__yearly_generation", pudl_engine)
gf_by_gen_eia923_ms = pd.read_sql("out_eia923__monthly_generation_fuel_by_generator", pudl_engine)
gf_by_gen_eia923_ys = pd.read_sql("out_eia923__yearly_generation_fuel_by_generator", pudl_engine)
frc_eia923 = pd.read_sql("out_eia923__monthly_fuel_receipts_costs", pudl_engine)

print(f"gen MS: {gen_eia923_ms.report_date.max()}")
print(f"gen YS: {gen_eia923_ys.report_date.max()}")
print(f"gen fuel by gen MS: {gf_by_gen_eia923_ms.report_date.max()}")
print(f"gen fuel by gen YS: {gf_by_gen_eia923_ys.report_date.max()}")
print(f"frc MS: {frc_eia923.report_date.max()}")

# gen MS: 2024-12-01 00:00:00
# gen YS: 2023-01-01 00:00:00
# gen fuel by gen MS: 2022-12-01 00:00:00
# gen fuel by gen YS: 2022-01-01 00:00:00
# frc MS: 2024-02-01 00:00:00

This seems a little fishy. We use pudl.output.eia923.drop_ytd_for_annual_tables() to avoid "annual" aggregations of data where we don't have a whole year, but here it seems we're also somehow excluding the monthly year-to-date records, which I don't think is intentional. And drop_ytd_for_annual_tables() doesn't even get called when freq=="MS".
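The intended behavior described above can be sketched as follows: partial years are dropped only before an annual ("YS") aggregation, while the monthly ("MS") output keeps every year-to-date month. This is a hypothetical toy, not PUDL's drop_ytd_for_annual_tables(); the helper names drop_incomplete_years() and aggregate() and the single net_generation_mwh column are assumptions for illustration.

```python
import pandas as pd


def drop_incomplete_years(df: pd.DataFrame, date_col: str = "report_date") -> pd.DataFrame:
    """Keep only rows from calendar years with all 12 months present.

    Hypothetical helper for illustration; not the PUDL implementation.
    """
    years = df[date_col].dt.year
    # Count the distinct months reported in each calendar year.
    months_per_year = df[date_col].dt.month.groupby(years).nunique()
    complete_years = months_per_year[months_per_year == 12].index
    return df[years.isin(complete_years)]


def aggregate(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Sum net generation at monthly ("MS") or annual ("YS") frequency."""
    if freq == "YS":
        # Only annual sums need whole years; monthly output keeps YTD rows.
        df = drop_incomplete_years(df)
    return (
        df.groupby(pd.Grouper(key="report_date", freq=freq))["net_generation_mwh"]
        .sum()
        .reset_index()
    )


# Toy data: all of 2022 plus 11 months of 2023 (like the EIA-923 YTD release).
dates = pd.date_range("2022-01-01", "2023-11-01", freq="MS")
monthly = pd.DataFrame({"report_date": dates, "net_generation_mwh": 1.0})

annual = aggregate(monthly, "YS")       # only 2022 survives
monthly_out = aggregate(monthly, "MS")  # still runs through 2023-11-01
print(annual)
print(monthly_out["report_date"].max())
```

Under this sketch the monthly output should still reach November 2023, which is why an MS table that stops at 2022-12 points to truncation happening somewhere other than the YTD filter.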

Investigate why this truncation is happening, and evaluate whether that's the expected / desired behavior.

Possible explanation

The out_eia923__monthly_generation_fuel_by_generator table depends on the fuel & generation allocation process, which depends on the boiler generator association (BGA) table. That table is only available from the annual EIA-860, not the monthly EIA-860M data, so it makes sense that right now we don't have the allocated generation & fuel table for periods covered only by EIA-860M data.
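A toy illustration of that explanation, assuming (hypothetically) that the allocation step effectively joins monthly records to the BGA on report year. This is not PUDL's allocation code; the column names mirror PUDL conventions but the tables are made-up data. An inner join silently drops every month whose year has no BGA rows, and a left join with `indicator=True` is one way to list exactly which records were lost.

```python
import pandas as pd

# Hypothetical BGA: associations exist only through the last annual EIA-860 year.
bga = pd.DataFrame({
    "report_year": [2021, 2022],
    "plant_id_eia": [1, 1],
    "generator_id": ["G1", "G1"],
    "boiler_id": ["B1", "B1"],
})

# Hypothetical monthly generation, including 2023 YTD rows from EIA-923.
gen = pd.DataFrame({
    "report_date": pd.to_datetime(["2022-06-01", "2022-12-01", "2023-01-01", "2023-11-01"]),
    "plant_id_eia": 1,
    "generator_id": "G1",
    "net_generation_mwh": [10.0, 12.0, 9.0, 11.0],
})
gen["report_year"] = gen["report_date"].dt.year.astype("int64")

keys = ["report_year", "plant_id_eia", "generator_id"]

# Inner merge: months with no matching BGA year disappear.
allocated = gen.merge(bga, on=keys, how="inner")
print(allocated["report_date"].max())

# Diagnostic: which generation records found no BGA match?
check = gen.merge(bga, on=keys, how="left", indicator=True)
dropped = check.loc[check["_merge"] == "left_only", "report_date"]
print(dropped.tolist())
```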

If we wanted to hack it to give us an estimate of the most recent allocated data, we could just forward-fill the BGA table up to the most recent year. That would be mostly right, since these associations don't really change unless a plant undergoes a major overhaul, but we're not doing that now.
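The forward-fill hack can be sketched by copying the latest year's associations into each later year. This is a minimal sketch assuming a BGA keyed by a `report_year` column; forward_fill_bga() is a hypothetical helper, not an existing PUDL function, and it inherits the stated assumption that associations rarely change between years.

```python
import pandas as pd


def forward_fill_bga(bga: pd.DataFrame, through_year: int) -> pd.DataFrame:
    """Copy the latest year's associations into every later year up to through_year.

    Hypothetical helper; assumes boiler-generator associations rarely change.
    """
    last_year = int(bga["report_year"].max())
    latest = bga[bga["report_year"] == last_year]
    # One copy of the latest associations per missing year (none if already current).
    filled = [
        latest.assign(report_year=year)
        for year in range(last_year + 1, through_year + 1)
    ]
    return pd.concat([bga, *filled], ignore_index=True)


bga = pd.DataFrame({
    "report_year": [2021, 2022, 2022],
    "plant_id_eia": [1, 1, 2],
    "generator_id": ["G1", "G1", "G2"],
    "boiler_id": ["B1", "B1", "B2"],
})
filled = forward_fill_bga(bga, through_year=2024)
print(filled)
```

Any plant that was built, retired, or reconfigured after the last EIA-860 year would get stale or missing associations under this approach, so results for the filled years would be estimates.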
