Integrate GridPath RA Toolkit hourly renewable generation profiles #3467
Additional Questions for @elainekhart & @anamileva
How are the BA-level renewable generation curves derived from the project (plant/generator) level curves? Are they just the capacity-weighted sums of the project-level capacity factors for all projects associated with a given BA?
In aggregating the project-level wind and solar data into BA level data, how do you deal with changes in the associations between plants and BAs? These could come from changes in the BA boundaries over time, or maybe for other reasons. Is it the case that the same projects can end up in different BAs depending on what year of data you're looking at?
If it's not a simple transformation from the project-level curves to the BA level curves, then for now we should probably just use the BA level curves. Which of those curves would we need? The
Do the BA codes associated with these production curves correspond to the reported BA codes associated with the individual plants/generators which we would find in EIA860, or do they refer to the simplified / aggregated BAs that you created to deduplicate some data and consolidate many tiny BAs into a smaller number of big BAs?
Is there an explicit mapping stored somewhere that defines these aggregations by BA code or EIA IDs?
What are the
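To make the first question concrete, here is a minimal sketch of what a capacity-weighted aggregation from project-level capacity factors to a BA-level capacity factor could look like. It only illustrates the hypothesis being asked about, not how the GridPath RA Toolkit actually derives its BA-level curves. The function name, the project IDs, the capacities, and the BA codes are all made up for illustration.

```python
from collections import defaultdict


def ba_capacity_factors(project_cf, capacity_mw, project_ba):
    """Capacity-weighted mean capacity factor per BA for each hour.

    project_cf:  {project_id: [hourly capacity factors]}
    capacity_mw: {project_id: nameplate capacity in MW}
    project_ba:  {project_id: BA code}
    All series are assumed to have the same length and hourly alignment.
    """
    gen_mw = {}                    # BA code -> hourly generation in MW
    total_mw = defaultdict(float)  # BA code -> total capacity in MW
    for pid, cfs in project_cf.items():
        ba = project_ba[pid]
        cap = capacity_mw[pid]
        total_mw[ba] += cap
        hourly = gen_mw.setdefault(ba, [0.0] * len(cfs))
        for i, cf in enumerate(cfs):
            hourly[i] += cf * cap
    # Divide summed generation by total capacity to get a BA-level capacity factor.
    return {ba: [mw / total_mw[ba] for mw in hourly] for ba, hourly in gen_mw.items()}


# Hypothetical inputs: two projects in one BA, one project in another.
project_cf = {1: [0.2, 0.4], 2: [0.6, 0.0], 3: [0.5, 0.5]}
capacity_mw = {1: 100.0, 2: 300.0, 3: 50.0}
project_ba = {1: "BA_A", 2: "BA_A", 3: "BA_B"}

ba_cf = ba_capacity_factors(project_cf, capacity_mw, project_ba)
```

This sketch assumes a fixed project-to-BA mapping. If the mapping changes from year to year, as the second question raises, `project_ba` would need to be keyed by year as well.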
Tasks
gridpathratk archive on Zenodo to make ETL simpler. #3523
Design Considerations
Wind Profiles
Files in HourlyWind_byProject.zip are all of the form 57282_capfactor.csv. Is the leading integer the plant_id_eia?
Solar Profiles
Files in HourlySolar_byProject.zip have names like 10437_SUN2.csv, where the leading integer is the EIA facility ID, and the part after the underscore is the generator ID. They contain hourly capacity factors, with UTC timestamps that are always at 30 min after the hour.
(plant_id_eia, generator_id, timestamp_utc) with a single data column of capacity_factor
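As a rough sketch of how the solar file names could be turned into the plant_id_eia and generator_id columns, assuming every file follows the 10437_SUN2.csv pattern described above. The helper name `parse_solar_filename` is hypothetical, and splitting on the first underscore only is a defensive assumption in case a generator ID itself contains an underscore.

```python
from pathlib import Path


def parse_solar_filename(path):
    """Split a name like '10437_SUN2.csv' into (plant_id_eia, generator_id)."""
    stem = Path(path).stem  # drop any directory and the '.csv' suffix
    # Split only on the first underscore, in case a generator ID contains one.
    plant_id, _, generator_id = stem.partition("_")
    return int(plant_id), generator_id


plant_id_eia, generator_id = parse_solar_filename("10437_SUN2.csv")
```

Each row read from that file could then be tagged with these two values plus its timestamp_utc, leaving capacity_factor as the only data column.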
Overall
(plant_id_eia, generator_id, timestamp_utc) + a single capacity_factor data column. Given that there will be 300M+ rows, adding extra columns seems unwise, but a set of generator IDs could be selected on the basis of various attributes from the EIA860 data, and then used to query the wind or solar time series data. Would that be convenient? If so, is there a good reason not to store all of the wind and solar hourly generation profiles in the same Parquet file?
Questions
Notes from README
Appendices refer to the GridPath RA Toolkit report
Hourly Wind Profiles
HourlyWind_byProject.zip: contains hourly simulated wind capacity factor data by project between 2007 and 2014, based on wind speed data from NREL's Wind Toolkit and empirically-derived power curves. Each file corresponds to a project from EIA Form 860: [Plant ID]_capfactor.csv. Note that the hour ending or "HE" time stamp column is missing, but the 24 hours of data corresponding to each day represents HE 1 through HE 24 of that day in Pacific Standard Time. For more information about how this data was developed and used in the study, see Appendix A.4.
Hourly Solar Profiles
HourlySolar_byProject.zip: contains hourly simulated solar capacity factor data by project between 1998 and 2019, based on data from the NSRDB and NREL's SAM model. Each file corresponds to a project from EIA Form 860: [Plant ID]_[Generator ID].csv. Timestamps are in UTC. For more information about how this data was developed and used in the study, see Appendix A.5.
Weather Data
DailyWeatherData_cleaned.csv: daily weather data from 16 locations in the West between 1948 and 2021. For more information, see Appendix E of the report.
Hydro Data
MonthlyHydro_byPlant.csv: monthly hydro energy by plant from EIA Form 923/906 between 2001 and 2020, listed by EIA Plant ID and EIA Plant Name. For more information about how this data was used in the study, see Appendix A.3.
Hourly Load Profiles
HourlyLoad_FERC714_cleaned.zip: contains hourly load data between 2006 and 2020 from FERC Form 714, which was used to develop the load shapes in the Western RA Case Study. Each file corresponds to a FERC respondent. In each file, the columns are: year, month, day, hour ending (Pacific Standard Time), load (MW). This data has been cleaned for use in this study, including making manual adjustments for missing or bad data. For more information about how this data was used in the study, see Appendix A.1.
Thermal Generators
HourlyThermal_byGenerator.zip: contains hourly estimated thermal temperature derates by generator between 1998 and 2019, based on temperature data from the NSRDB and project-specific piece-wise linear derate functions. Each file corresponds to a project from EIA Form 860: [Plant ID]_[GeneratorID].csv. Timestamps are contained in timestamps.csv and are listed in hour ending, Pacific Standard Time. For more information about how this data was developed and used in the study, see Appendix A.2.
Three Levels
There are 3 different versions of the wind and solar generation profiles available in the archived data
Eventually I think we would like to be able to run this aggregation and data repair process within PUDL so that it could be adapted to different purposes. However, at the moment for the MVP we just need the final output. We can backfill the other steps later with better understanding.
One complication is that there are a small number of wind & solar projects which are "hybrid" -- they include energy storage as well as renewable generation. They have their own separate production curves, but may not be straightforwardly combinable with the pure renewable generation. Need to ask @anamileva & @elainekhart how to treat this data in relation to the other profiles.