Releases: singularity-energy/open-grid-emissions

v0.4.0

05 Apr 04:10
83bdce8

This minor release improves existing validation checks, adds new ones, enforces static sub-plant IDs across years, and allows users to select Global Warming Potential values by the name of the IPCC assessment report in which they were published.

Update sub-plant crosswalk table

As discovered in #351, the subplant_id assigned to each (plant_id_eia, generator_id) does not remain static across each year of OGE data. This is an issue if trying to use subplant_id as a primary key to compare data across multiple years.

This PR updates the process of creating sub-plant IDs to try to enforce static sub-plant IDs. The changes in this PR enforce static sub-plant IDs within a single data release version of OGE, although the sub-plant IDs may still change from version to version. (#353)

Validation Checks

  • For all warnings about plant-level data, adds the balancing area that the flagged plant belongs to, which helps identify BAs where data quality is affected. (#348)
  • The validation check that detects mismatches between input and allocated EIA-923 data is now done at the plant and energy source level. (#350)
  • Functions for detecting anomalies in timeseries data have been added to the code base, and we now identify where gross generation, fuel consumption, and CO2 emission timeseries in the reported CEMS data may be anomalous based on a global extreme filter. (#349)

New feature

The function for calculating CO2-equivalent values now allows for the user to specify which IPCC Assessment Report to use for calculating GWP-adjusted CO2-equivalent values. (#352)
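
As a rough illustration of selecting GWP values by assessment report name, the sketch below computes a CO2-equivalent mass from CO2, CH4, and N2O. The function name, argument names, and lookup table are hypothetical and are not the OGE API; the 100-year GWP values shown are those published in IPCC AR4 and AR5 (without climate-carbon feedback).

```python
# Hypothetical lookup of 100-year GWPs keyed by IPCC assessment report name.
# AR4: CH4 = 25, N2O = 298. AR5 (without climate-carbon feedback): CH4 = 28, N2O = 265.
GWP_100_YEAR = {
    "AR4": {"co2": 1.0, "ch4": 25.0, "n2o": 298.0},
    "AR5": {"co2": 1.0, "ch4": 28.0, "n2o": 265.0},
}


def co2e_mass(co2, ch4, n2o, report="AR5"):
    """Return CO2-equivalent mass in the same mass units as the inputs."""
    try:
        gwp = GWP_100_YEAR[report]
    except KeyError:
        raise ValueError(f"Unknown assessment report: {report!r}") from None
    return co2 * gwp["co2"] + ch4 * gwp["ch4"] + n2o * gwp["n2o"]
```

Keying the table by report name keeps the choice of GWP source explicit at the call site, for example `co2e_mass(1000, 2, 1, report="AR4")`.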

v0.3.3

27 Feb 17:32
7663993

This patch release addresses two issues that were preventing some users from being able to run the pipeline and use the OGE package:

  • Updates the instructions for using conda to manage the oge code environment and updates the environment.yml file that specifies the conda environment. This had fallen out of date with the pipfile environment files in recent releases. (#345)
  • Fixes an issue where the use of backslashes instead of forward slashes in oge.filepaths was causing errors when attempting to load OGE files from the s3 bucket. (#346)
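
To illustrate the kind of path problem described in the second item, the sketch below contrasts Windows-style joining with POSIX-style joining for an s3 location. The bucket name and the helper `s3_key` are hypothetical; this is not the code in oge.filepaths.

```python
import ntpath  # the implementation behind os.path on Windows
import posixpath  # always joins with forward slashes


def s3_key(*parts):
    """Join path components with forward slashes regardless of the host OS."""
    return posixpath.join(*parts)


# Joining with Windows rules inserts backslashes, which s3 does not treat as separators.
windows_style = ntpath.join("s3://example-bucket/data", "outputs", "file.csv")

# Joining with POSIX rules yields a usable s3 location on every platform.
portable = s3_key("s3://example-bucket/data", "outputs", "file.csv")
```

Here `portable` is `s3://example-bucket/data/outputs/file.csv`, while `windows_style` contains backslashes between the joined components.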

This release does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.

v0.3.2

15 Feb 00:05
f265b3d

This patch release of OGE fixes an issue where the python version specified in pyproject.toml was incompatible with the version of python used in the rest of the package, preventing OGE from being installed in other projects (#344)

This release does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.

v0.3.1

10 Feb 17:43
d21db26

This patch release of OGE makes several updates to OGE's code infrastructure, dependencies, documentation, and file downloads, but does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.

Accessing OGE outputs and results through the cloud (#338)

  • In v0.3.0 we packaged OGE, allowing other projects to import OGE code directly. However, in order to load and use any of the downloads, outputs, or results files, it would still be necessary to run the data pipeline locally to make those files available.
  • This release allows these files to now be read directly from an AWS s3 bucket, eliminating the need for the pipeline to be run locally when importing OGE into another project.
  • Instructions for how to set the s3 bucket as the default data store are now included in the readme
  • We also fixed a bug where a log file was being created whenever an OGE function was called from another project. Now, a log file should only be created whenever the main data pipeline is run (#340)

Updates eGRID downloads to include eGRID2022 (#337)

  • Although eGRID is not used as an input to the OGE data pipeline, these files are downloaded and included in the data store, as the eGRID data can be loaded and explored via several functions in OGE.
  • This release includes the newly-published eGRID2022 file in the set of downloaded files
  • This release also standardizes the downloaded eGRID file names to use consistent capitalization across years.

More transparent conversion factors and constants (#339)

  • In past versions of OGE, some of the standard conversion factors and assumed values were spread across multiple files.
  • This release moves all of these factors and assumed values (if not already included in any of the reference_tables) to a centralized location in constants.py so that they can be easily reviewed.
  • Moving these factors also helped avoid the potential for circular imports between the modules.

Miscellaneous

  • Updates several package dependencies in the pipfile to address security updates (#341)
  • Updates small errors in README file

v0.3.0

29 Dec 19:56
939dd3b

Updates PUDL dependency (#318)

  • Updates pudl dependency from v2022.11.30 to v2023.12.01, which includes a number of updates to the database structure and naming conventions (see pudl release notes)
  • Changes source of PUDL database download to AWS rather than Zenodo, providing faster access to PUDL data releases
  • PUDL’s CEMS database now includes data from AK, HI, and PR, which should improve hourly emissions data coverage for plants in AK and HI
  • A cleaned and standardized version of the EPA-EIA power sector data crosswalk is now included in the pudl database, meaning we no longer have to manually load and standardize this data
  • Emissions control equipment data from EIA-860 is now included in the pudl database, meaning we no longer need to manually load and standardize this data
  • Leading zeros removed from boiler_ids, which should improve mapping between boiler tables
  • The EIA-923 generation and fuel allocation process is now fully integrated into PUDL
  • Fixes an issue where certain plants in NY state were being assigned the wrong BA code.

Adds 2022 data (#322)

  • Integrates Final release input data from the EIA and EPA for 2022
  • Adds 2022 OGE outputs

Manual reference table update (#322)

  • Most reference tables did not require updating
  • NOX and SO2 emissions factors: added new factors for boiler configurations that had not previously been included in the table.
  • Balancing Areas: Added retirement dates for the CFE (July 2018), GLHB (September 2022), GRIF (November 2023) balancing areas
  • Added new EPA-EIA plant and unit crosswalks based on 2022 data
  • Added several new mappings between utilities and balancing areas

Infrastructure Updates

  • Updates Python dependency from 3.10 to 3.11
  • Refactors and packages OGE codebase so that functions, reference tables, and data from OGE can be imported into other projects. This package will go live on PyPi soon. (#323)
  • Re-organizes location of data files. The data/manual files have been renamed to reference_tables and moved to src/oge, while all downloads, output files, and result files will now be saved in the user’s home directory in a folder called open_grid_emissions_data (#324)
  • Adds support for pipenv environment management in addition to conda (#313)
  • Changes PUDL and gridemissions dependencies to forks within the singularity-energy organization, rather than forked versions that lived in individual authors’ github accounts.
  • Moves documentation from separately-maintained repo into the OGE repo (#303)
  • Changes code formatting from black to ruff and adds formatting checks that must pass before merging code (#317)

Other bug/data quality fixes

  • Ensures that the EPA-EIA power sector data crosswalk is as complete as possible by combining the pudl-standardized PSDC, plant code mappings from eGRID, and our own manual crosswalking.
  • Add handling for negative fuel consumption reported in EIA-923
  • Stop dropping missing and zero values to help ensure complete timeseries
  • Previously, we had dropped data from CEMS that reflected units that only reported steam generation but no electricity generation. Based on an updated understanding of this data, we no longer drop this data from OGE.
  • Fixes bug in EIA-923 generation and fuel allocation process that was resulting in certain reported fuel consumption data being dropped for plants that retire mid-year
  • Updates manual timestamp corrections to EIA-930 data for CAISO data from 2022 onward (#300) and for TEPC data from 2021 onward (#322)

Adds new data validation checks

  • Flags when different plant primary fuel identification methods result in different primary fuel assignments: Exports the primary_fuel_table with all intermediate columns to outputs to help with validation. Adds a new validation check to flag when the plant primary fuel assigned by the pipeline does not match the capacity-based primary fuel assignment. (#296)
  • Flags when subplants only contain a single combined cycle component: Combined cycle generators contain a steam part (CA) and turbine part (CT) that are linked together. Thus, a subplant group that contains one part of a combined cycle plant should, in theory, always contain the other part as well. This PR adds a test that checks that both parts exist in a subplant if one exists. Besides CT and CA prime movers, there are also CS prime movers, which represent a "single shaft" combined cycle unit where the steam and turbine parts share a single generator. These prime movers are allowed to be by themselves in a subplant, as are CC prime movers, which represent a "total unit." This PR adds a prime_mover_code column to the subplant crosswalk table to help validate this. (#297)
  • Checks for complete monthly data within a single year: Checks that 12 monthly “report_date”s exist for each plant/subplant, and also checks that the number of missing monthly datapoints matches the number of missing datapoints in the input data from CEMS and EIA-923.
  • Checks for complete hourly timestamps within a single year or single month: If the period is a 'year', checks that the length of the timeseries is 8760 (for a non-leap year) or 8784 (for a leap year). If the period is a 'month', checks that the length of the timeseries is equal to the length of the complete date_range between the earliest and latest timestamp in a month. (#299)
  • Exports a new output table that identifies whether input data (and non-zero input data) exists for each plant in EIA-923 and/or CEMS.
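
The yearly version of the hourly completeness check above can be sketched as follows. This is a minimal illustration, not the OGE implementation: the function names are hypothetical, and it assumes naive (timezone-free) timestamps with no daylight saving shifts.

```python
import calendar
from datetime import datetime, timedelta


def expected_hours_in_year(year):
    """Return 8784 for a leap year and 8760 otherwise."""
    return 8784 if calendar.isleap(year) else 8760


def missing_hours(timestamps, year):
    """Return the sorted hourly timestamps in `year` that are absent from `timestamps`."""
    start = datetime(year, 1, 1)
    expected = {
        start + timedelta(hours=h) for h in range(expected_hours_in_year(year))
    }
    return sorted(expected - set(timestamps))
```

Returning the missing timestamps, rather than only comparing lengths, makes it easier to see which hours a timeseries lacks.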

v0.2.2

02 Mar 17:32
bb0c032

This release primarily fixes a bug that affected the quality of the CO2 emissions data for multiple regions in the Southeastern U.S., namely AEC, SOCO, and TVA. This bug resulted in substantial (>1%) errors in the emission totals and rates for these regions. This bug also affected the CO2 data for a handful of individual plants in MISO, PJM, ERCO, CPLE, SWPP, DUK, and NYIS.

This release includes multiple improvements:

  • Fixes a bug that was assigning CO2 data from CEMS to the wrong rows when attempting to fill missing CO2 data (#280)
  • Updates the handling of command-line arguments when running the pipeline (#288)
  • Whenever net generation in a period is zero, the calculated generated emission rate was previously missing due to dividing by zero. In this release, we now apply a zero emission rate to these periods. For all other periods where emissions or generation data is actually missing, the generated emission rate will still be missing (#290)
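
The zero-generation rule in the last item can be sketched as below. The function name and signature are hypothetical; the sketch only mirrors the behavior described here (zero rate when generation is zero, missing rate when an input is missing).

```python
import math


def generated_emission_rate(emissions, net_generation_mwh):
    """Return emissions per MWh, following the rule described above.

    Missing inputs (None or NaN) yield NaN; zero net generation yields 0.0.
    """
    if emissions is None or net_generation_mwh is None:
        return math.nan
    if math.isnan(emissions) or math.isnan(net_generation_mwh):
        return math.nan
    if net_generation_mwh == 0:
        return 0.0  # avoid dividing by zero; report a zero rate instead
    return emissions / net_generation_mwh
```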

Validation improvements:

  • Raises a warning if the allocated net generation or fuel consumption output by the EIA-923 generator allocation process differs from the input data by more than 0.1% (#278)
  • Adds logging to the data pipeline, instead of using print statements. This also fixes a bug that was preventing logging messages from pudl from showing when running the pipeline. This allows us to save an output of all warnings to help validate the results. (#285)
  • Expands the coverage of multiple existing validation checks to make them more comprehensive (#287)
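
A minimal sketch of a 0.1% tolerance check that reports through the logging module rather than print statements is shown below. The function name, the treatment of a zero input total, and the message text are assumptions for illustration only.

```python
import logging

logger = logging.getLogger(__name__)


def allocation_mismatch(input_total, allocated_total, tolerance=0.001):
    """Return True and log a warning if the totals differ by more than `tolerance`.

    `tolerance` is a relative difference (0.001 corresponds to 0.1%).
    """
    if input_total == 0:
        # Assumption: with a zero input total, any non-zero allocation is a mismatch.
        mismatch = allocated_total != 0
    else:
        mismatch = abs(allocated_total - input_total) / abs(input_total) > tolerance
    if mismatch:
        logger.warning(
            "Allocated total %s differs from input total %s by more than %.1f%%",
            allocated_total,
            input_total,
            tolerance * 100,
        )
    return mismatch
```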

v0.2.1

02 Feb 20:26
1477ac5

This release primarily addresses an issue identified in #271, in which our data pipeline was dropping a substantial amount of data due to mismatches in the reported energy source codes used in EIA-860 and EIA-923, and our failure to validate the outputs of the allocation process more carefully. While fixing this issue, we also came across several other issues that were causing anomalous emission factor outputs.

Summary of changes

  • Previously, some generation and fuel data reported in EIA Form 923 was being dropped from OGE due to inconsistent energy source codes being used for certain plants between the EIA-923 input data and EIA-860 input data. This was resulting in incorrect emissions and generation totals for certain plants, as well as incorrect primary fuel categories being assigned to these plants.
  • This release also includes several updates to the method for identifying the primary fuel type of each plant, which fixes a bug that was causing certain nuclear plants to be identified as a non-nuclear fuel type due to missing fuel consumption data in EIA-923.
  • This release also includes updates to our methodology for converting gross generation data reported in CEMS to net generation. These updates include more stringent standards for which conversion factors are used for each plant, and more robust backstop conversion factors. This update will result in more net generation being reported for certain plants, and more realistic plant-level emission intensity values.
  • We have also added the newly-released eGRID2021 dataset to the list of downloaded files so that 2021 OGE values can be easily compared to 2021 eGRID values using our validation notebooks included in the repository.

Detailed changes

Fixes the generation and fuel allocation process

  • Adds a new function add_missing_energy_source_codes_to_gens() that adds energy_source_codes that appear in the gf table but not gens to gens.
  • In some cases, non-zero fuel consumption and net generation is reported in the EIA-923 generation and fuel table that is associated with an energy_source_code that is not associated with that plant-prime mover in the gens table, which would cause these data to get dropped when these two tables are merged. To fix this, for each plant-pm, this function identifies such esc, and adds them to the gens_at_freq table as new energy_source_code columns.
  • The sub-function identify_missing_gf_escs_in_gens() identifies when there are fuels reported in the gf table for that plant-pm that are not listed in the gens table for that plant pm.
  • Adds the MISSING_SENTINEL value to the net_generation_mwh_g_tbl column. For some reason, this column had been commented out, which was leading to NaNs appearing in the data when dividing by this column when the value was zero. I un-commented this line.
  • In allocate_net_gen_by_gen_esc(), we no longer allow frac_from_g_tbl to be greater than 100%. This was previously happening when the mwh reported in the g table were greater than the mwh reported in the gf table. However, values greater than 100% were causing frac_missing_from_g_tbl to become negative, which was resulting in nonsensical allocations. We implement the same cap on frac_from_bf_tbl in allocate_fuel_by_gen_esc().
  • In allocate_fuel_by_gen_esc(), when calculating frac_cap, the code had been dividing capacity_mw by capacity_mw_unit_fuel. However, this was resulting in some nonsensical allocations because fuel is being allocated by PM-fuel, not by unit. We changed this to divide by capacity_mw_pm_fuel instead. This is consistent with how frac_cap is calculated in the allocate_net_gen_by_gen_esc() function
  • Rename adjust_energy_source_codes() to adjust_msw_energy_source_codes() to more precisely describe what the function does
  • Adds new entries to the manual emissions factor tables for NOx and SO2 to represent fuel-boiler combinations that had previously been getting dropped from the data due to this bug.
  • Changes the pudl version that we use in our environment from catalyst-cooperative/pudl@main to grgmiller/pudl@oge_release. This will give us more control over performing fixes like this in the future.
  • Adds a new validation check to the EIA-923 data cleaning process to verify that for each plant, the total allocated fuel and generation matches the total fuel and generation reported in the input generation and fuel table (basically that the allocation process is not dropping or inflating the data).
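
The cap on frac_from_g_tbl described above can be illustrated with the sketch below. It assumes that the fraction is the ratio of g-table to gf-table generation and that frac_missing_from_g_tbl is its complement; the function name and the handling of non-positive gf totals are hypothetical.

```python
def generation_fractions(mwh_g_tbl, mwh_gf_tbl):
    """Return (frac_from_g_tbl, frac_missing_from_g_tbl) with the fraction capped at 1.0."""
    if mwh_gf_tbl <= 0:
        # Assumption: this sketch only covers positive gf-table totals.
        raise ValueError("mwh_gf_tbl must be positive")
    frac_from_g_tbl = min(mwh_g_tbl / mwh_gf_tbl, 1.0)
    # With the cap in place, the missing fraction can no longer go negative.
    return frac_from_g_tbl, 1.0 - frac_from_g_tbl
```

For example, 120 MWh in the g table against 100 MWh in the gf table gives fractions of 1.0 and 0.0 rather than 1.2 and -0.2.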

Updates plant primary fuel identification

  • When assigning the plant primary fuel based on the most consumed fuel, we were previously assigning this based on the fuel with the highest fuel_consumed_mmbtu. However, we should be using fuel_consumed_for_electricity_mmbtu since we want to assign the primary fuel used for electricity generation.
  • Sometimes nuclear generators report 0 fuel consumption in EIA-923. Since we were assigning a plant's primary fuel first based on fuel consumption, this meant that sometimes if a nuclear plant had a backup fossil generator, the plant was being assigned the fuel code of that backup generator. To fix this, we now assign the primary fuel of any plant that contains a nuclear unit based on the nameplate capacity of the unit.
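
The two rules above can be sketched as follows. This is a simplified illustration, not the OGE function: the input is a hypothetical list of per-generator dictionaries, totals are summed by energy source code, and "NUC" is the EIA energy source code for nuclear.

```python
from collections import defaultdict


def plant_primary_fuel(generators):
    """Pick a plant's primary fuel from per-generator records.

    Plants with any nuclear generator are ranked by nameplate capacity;
    all other plants are ranked by fuel consumed for electricity.
    """
    has_nuclear = any(g["energy_source_code"] == "NUC" for g in generators)
    metric = "capacity_mw" if has_nuclear else "fuel_consumed_for_electricity_mmbtu"
    totals = defaultdict(float)
    for g in generators:
        totals[g["energy_source_code"]] += g[metric]
    return max(totals, key=totals.get)
```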

Updates to gross to net generation conversions

  • Previously, when converting CEMS gross generation to net generation, we had filtered out any ratios that were greater than 1.5 or less than 0.2. However, these values were somewhat arbitrary and turned out to be too wide a range. For example, when there was a large discrepancy between CEMS gross generation and EIA-923 net generation, we were scaling the CEMS generation to match, even though the fuel consumption and emissions reported in CEMS also disagreed and were not being scaled. This was leading to instances where a plant was using CEMS CO2 totals but EIA-923 net generation totals, resulting in the plant having abnormally high emission rates. To be consistent, if we are going to use CEMS data at all, we want to make sure that the net generation values are reasonable given the reported net generation. After analyzing three years (2019-2021) of annual gross to net ratios, both at the plant and subplant levels, it appears that the interquartile range of GTN ratios is generally between 0.75 and 1.00, with an upper bound around 1.25. Thus, we are now using 0.75 as the lower bound for filtering out ratios, and 1.25 as the upper bound.
  • Previously, the backstop gross to net generation approach if all other conversion factors were not available was to assume that gross generation equaled net generation (i.e. a GTN ratio of 1). However, as identified in #177, the EIA has default gross to net conversion factors for each prime mover that they use. This PR introduces these PM-specific gross to net conversion factors as the default backstop option now. As noted in the issue, there are still improvements that need to be made before #177 can be closed, but this is a step in the right direction.
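
A minimal sketch of the ratio filter and backstop described above is shown below. The function name is hypothetical, the bounds are treated as inclusive by assumption, and the backstop value is supplied by the caller because the EIA default factors are not reproduced here.

```python
GTN_LOWER_BOUND = 0.75
GTN_UPPER_BOUND = 1.25


def select_gtn_ratio(candidate_ratio, backstop_ratio):
    """Return the candidate gross-to-net ratio if it is plausible, else the backstop.

    `candidate_ratio` may be None when no plant-specific ratio is available.
    `backstop_ratio` stands in for a prime-mover-specific default factor.
    """
    if (
        candidate_ratio is not None
        and GTN_LOWER_BOUND <= candidate_ratio <= GTN_UPPER_BOUND
    ):
        return candidate_ratio
    return backstop_ratio
```

Net generation would then be estimated as gross generation multiplied by the selected ratio.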

Flags potentially anomalous generated emission factors

  • When outputting annual plant-level data, we add a new validation check that calculates a generated CO2 rate and flags any rates that appear to be anomalous, so that we can manually inspect these plants to see if there are any unexpected results. For this test, we define anomalous values two ways. On the high end, the check flags the plant if the CO2 rate is higher than 15,000 lb/MWh. On the low end, the check flags any plant that has a rate lower than 10 lb/MWh but higher than 0 lb/MWh.
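
The two thresholds above translate into a simple predicate. The function name is hypothetical; the thresholds are the ones stated in this release note.

```python
def is_anomalous_co2_rate(rate_lb_per_mwh):
    """Flag rates above 15,000 lb/MWh, or above 0 but below 10 lb/MWh."""
    too_high = rate_lb_per_mwh > 15_000
    suspiciously_low = 0 < rate_lb_per_mwh < 10
    return too_high or suspiciously_low
```

A rate of exactly zero is not flagged, since non-emitting plants are expected to have a zero CO2 rate.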

Add eGRID2021 to the list of downloads for validation

  • This release adds eGRID 2021 data to our list of downloads, and updates the list of non-grid connected plants based on additions to the list in eGRID 2021.

Fixes a bug that was leading to incorrect balancing authority assignments

  • Fixes an issue where a plant with no reported ba_code was getting filled with the incorrect code based on the ba_name.
  • We have also identified a new known issue that is not fixed in this release: In comparing our 2021 data to the eGRID 2021 data, we found that there are some plants that EIA-860 identifies as being in ISNE that are getting assigned to NYIS by pudl, and thus are categorized in a different BA than they are in eGRID. All of these plants seem to be physically located in the state of New York, but are listed with an ISNE BA code. Also, all of these plants are pretty small. See: catalyst-cooperative/pudl#2255. We will work to address this with the pudl team.

v0.2.0

30 Dec 22:51
5e24c69

Release Notes

2021 Data Release

  • This release includes new data for 2021, and updates the year 2019 and 2020 data.

Hourly data for all individual plants

  • The plant data results now include hourly data for every individual plant in the U.S., including those plants for which we impute the hourly generation profile. Previously, we had aggregated the imputed data to the fleet level. Details #246

Updates OGE pipeline to work with PUDL v2022.11.30

  • Updates the pipeline to work with v2022.11.30 of PUDL, which introduced many breaking changes. See #258
  • The new PUDL release now performs many of the CEMS data cleaning steps that we previously performed in the OGE pipeline, so these data cleaning steps have been removed

Fixes a bug that was overestimating NOx and SO2 emissions for some plants

  • Some NOx and SO2 control data is missing control ID numbers in EIA-923, which was causing this data to get dropped, which meant that OGE was treating emissions from these generators as uncontrolled. This update fixes that issue. See #255

Patches bugs with consumed emissions calculation

  • Updates environment to fix bug that was leading to random missing values in consumed calculation on some operating systems.
  • When calculating consumed emissions, we calculate demand in a BA by subtracting interchange from generation. For certain BAs, however, this approach results in negative emission factors being calculated, so we directly use reported demand from EIA-930 instead. We updated the list of BAs for which we apply this approach, which is now differentiated by year so that we are only performing this patch where strictly necessary.
  • Makes the approach for identifying imputed values in the EIA-930 data cleaned by gridemissions consistent across the hourly shaping step and the consumed emissions step of the data pipeline.
  • Updates the manual cleaning of EIA-930 data to remove OVEC corrections, and add a timestamp offset for SC prior to 12-31-2020

Validation

  • Add a validation check for missing values in all results files

Output files

  • Export a cleaned version of the unit-level CEMS dataset to outputs (outputs/cems_cleaned.csv). Previously we only exported a version after aggregation to subplants and gross-to-net-generation conversion. This original file was renamed outputs/cems_subplant.csv.
  • Add an option to export metric files or not when outputting data

Balancing authority updates

  • Anchorage Municipal Light and Power retired on October 30, 2020
  • Electric Energy, Inc (EEI) changed to a generation-only BA
  • Map Pacific Gas and Electric utility to CISO

Emissions Calculation Updates

  • When imputing missing emissions data in CEMS, we now calculate fuel-weighted emission factors for each subplant-month, which are used for imputing missing emissions values. This is based on the total consumption of each type of fuel that is reported to be consumed in each subplant-month in EIA-923. The process for imputing missing emissions is now:
  1. If a unit has non-missing emissions data for other hours in the same month, calculate a unit-month specific EF from the CEMS-reported fuel consumption and emissions
  2. For all remaining missing values, use the subplant and month-specific weighted average emission factor from subplant_emission_factors calculated from the EIA-923 data
  3. For any remaining missing values, calculate emissions based on the subplant primary fuel and fuel consumption
  4. For any remaining missing values, calculate emissions based on the fuel type assigned in the power sector data crosswalk.
  • Previously, when assigning a fuel type to each emissions_unit_id_epa, we prioritized using the fuel type reported in the power sector data crosswalk. We now identify the primary fuel type for each subplant using EIA-923 fuel consumption data, and use this to assign a fuel type to each CEMS unit. The fuel type reported in the power sector data crosswalk is now used as a last resort.
  • Add a NOx emission factor for CS prime movers with OG fuel consumption
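
The four-step imputation order above amounts to a fallback chain: use the first available emission factor, in priority order. The sketch below illustrates that pattern only; the key names, function name, and units are hypothetical and do not reflect the OGE code.

```python
import math

# Hypothetical labels for the four sources, in the priority order listed above.
IMPUTATION_ORDER = (
    "unit_month_ef",
    "subplant_month_ef",
    "subplant_primary_fuel_ef",
    "crosswalk_fuel_ef",
)


def impute_emissions(fuel_consumed_mmbtu, candidate_efs):
    """Return (emissions, source) using the first non-missing emission factor.

    `candidate_efs` maps source labels to emission factors per mmBtu (or None/NaN).
    Returns (NaN, None) if no factor is available.
    """
    for source in IMPUTATION_ORDER:
        ef = candidate_efs.get(source)
        if ef is not None and not math.isnan(ef):
            return fuel_consumed_mmbtu * ef, source
    return math.nan, None
```

Returning the source label alongside the value makes it possible to report which method was used for each imputed record.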

Subplant identification

  • Manually assign all units at plant 1391 to a single subplant

EPA-EIA Crosswalk

  • Update to use v0.3 of the power sector data crosswalk
  • The crosswalk is now integrated as a table into PUDL, so we will use the pudl cleaned version in the pipeline
  • Add manual unit to generator crosswalks for plants 60925, 60910, 63259
  • Identified plant 59073 (Cove Point LNG Terminal) as a non-grid connected plant

Notebooks

  • Add notebook to explore the reported fuel heat content for each fuel (notebooks/manual_data/export_fuel_heat_content.ipynb)
  • Update notebook used to identify uncorrected time lags in raw EIA-930 data

Known issues

  • The gross to net conversion for two plants (plant 55799 in 2019, and plant 57865 in 2021) is likely incorrect due to data inconsistencies between reported gross generation in CEMS and reported net generation in EIA-923 for these plants.

What's Changed

Full Changelog: v0.1.2...v0.2.0

v0.1.2

26 Oct 23:00
bce51f7

Release Notes

Fixes issue with assignment of subplant IDs
Fixes an issue that was causing some generators/units to be assigned missing or incorrect subplant IDs. This issue caused several downstream issues including inaccurate hourly shapes being assigned to certain subplants, or inaccurate conversion of gross to net generation in the CEMS data. This patch ensures that every generator_id and unitid is associated with a non-missing subplant ID, and that these subplant assignments account for boiler-generator associations from EIA-860 (full details).

  • Impacted data: Plant-level data, power sector data, and carbon accounting data

Fixes anomalous spikes in emission rate data
Several grid regions were exhibiting anomalous dips in their regional emission intensity values due to an issue with the methodology used to shape data from plants with partial CEMS data. Specifically, the generation from certain non-emitting plants (e.g. nuclear, solar, etc) that had a fossil-fuel backup generator onsite were being assigned the intermittent hourly profile of the backup generator if that generator reported to CEMS. This resulted in data quality issues in both the generated and consumed emission rates for some regions. This patch fixes that issue by excluding all non-emitting generators and plants with subplants of mixed fuel types from using the partial CEMS methodology. (full details).

  • Impacted data: carbon accounting data, power sector data, certain plant data, with the largest impacts in PJM and plant EIA ID = 2410 (PSEG Salem Generation Station)

Other updates

  • Fixes an issue that was resulting in an infeasible conda environment by updating to a stable branch of the Public Utility Data Liberation Project.
  • Improves the speed of running the part of the data pipeline that identifies subplant IDs.
  • Updates the plant_metadata.csv file to help users more easily identify the methodologies used for each plant.
  • Adds adjusted R-squared values to the gross to net generation regression outputs available in the data/outputs/gross_to_net_conversions.csv file.
  • Renames the data/outputs/subplant_crosswalk.csv file to subplant_crosswalk_[YEAR].csv to clarify that subplant IDs are only valid for a specific year.

What's Changed

Full Changelog: v0.1.1...v0.1.2

v0.1.1

08 Sep 15:31
901b8ee

Summary of updates

This release (v0.1.1) patches several bugs that were affecting the quality of the data published in the initial release. A big thank you to user @ewezerek for bringing one of these bugs to our attention.

Consumed emission factors
Due to a mathematical error in the previous version of the code, the consumed emissions factors incorrectly accounted for the flow of electricity between regions, which was causing some consumed emissions factors to be reported as negative. We have fixed this error and also implemented validation checks to ensure that the published emissions factors are not negative. While fixing this bug, we also noticed a data quality issue with the EIA-930 input data for AZPS interchange with SRP prior to 6/1/2020, which has been manually corrected.
Impacted data: all carbon accounting data, certain regional power sector data

Sulfur Dioxide (SO2) emissions
Due to misunderstanding several of the assumptions for applying SO2 emissions factors, the SO2 emissions data derived from EIA-923 fuel consumption data were under-reported by several orders of magnitude. These calculations have been corrected, and the updated SO2 emissions totals have been validated as consistent with the SO2 emissions totals published in other data sources.
Impacted data: All SO2 emissions data for plants that don’t report SO2 emissions to CEMS (impacts all regional power sector and emissions accounting data)

Data Quality Metrics
This release adds NOx and SO2 data to the published data quality metrics, and adds information about the quality of the published CEMS data. Details on these updates can be found here.

What's Changed

Full Changelog: v0.1.0...v0.1.1