In trying to integrate NREL ATB, I ran into an oddity that made it difficult to integrate the data from 2019 and 2020. Because of this we are not integrating these years of ATB. All of this exploration was based on this semi-cleaned asset: _core_nrelatb__transform_start.pkl.zip
There is a column called core_metric_case which is always either "Market" or "R&D". Then there is another column called core_metric_key, which is a composite (semi-)primary key column containing codes representing info stored in other columns in the data. The first character of the core_metric_key is always either R or M. We've called this first letter of the core_metric_key the mystery_code. We and other collaborators thought this corresponded to the core_metric_case. It does, but only ~75% of the time.
I'll note that the core_metric_key seems to have changed structure over time - especially in 2023. Also, the mystery_code never deviated from the core_metric_case in 2023.
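A minimal sketch of how the agreement rate could be checked, overall and per year. The M -> "Market" / R -> "R&D" mapping is the assumption being tested, `report_year` is a hypothetical column name, and the toy rows and key strings are made up for illustration; they are not ATB data.

```python
import pandas as pd

# Assumed mapping from the first letter of core_metric_key to core_metric_case.
CODE_TO_CASE = {"M": "Market", "R": "R&D"}


def add_mystery_code(df: pd.DataFrame) -> pd.DataFrame:
    """Add the mystery_code and a flag for whether it agrees with core_metric_case."""
    out = df.copy()
    out["mystery_code"] = out["core_metric_key"].str[0]
    out["code_matches_case"] = (
        out["mystery_code"].map(CODE_TO_CASE) == out["core_metric_case"]
    )
    return out


# Toy rows only; key strings and the report_year column name are hypothetical.
toy = pd.DataFrame(
    {
        "report_year": [2019, 2019, 2020, 2023],
        "core_metric_case": ["Market", "R&D", "Market", "R&D"],
        "core_metric_key": ["MXX01", "MXX02", "MXX03", "RXX04"],
    }
)

checked = add_mystery_code(toy)
overall_rate = checked["code_matches_case"].mean()
rate_by_year = checked.groupby("report_year")["code_matches_case"].mean()
```

Run against the real asset, `overall_rate` would be the number to compare with the ~75% figure, and `rate_by_year` would show whether the mismatches are confined to particular years.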
This would be fine if the values in the value column did not vary based on the mystery_code (we could drop fully duplicate records without this mystery_code or the core_metric_key). But the data does seem to have truly different values. Of the three data tables which are derived from the info in the value column, two tables have real variability in value based on the mystery_code. The records that are variable by mystery_code make up 12% of the core_nrelatb__yearly_projections_by_scenario table and 5% of the core_nrelatb__yearly_rates_projections table.
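One way to find the records whose value varies by mystery_code is sketched below: group on every identifying column except the mystery_code and core_metric_key, then flag groups holding more than one distinct value. The `id_cols` names (`report_year`, `technology`) and the toy rows are hypothetical stand-ins, not the real primary keys or data of either table.

```python
import pandas as pd


def flag_value_varies(df: pd.DataFrame, id_cols: list[str]) -> pd.Series:
    """Flag rows whose group (defined by id_cols) has more than one distinct value.

    If id_cols covers every identifying column except the mystery_code, any
    flagged variation can only come from the mystery_code.
    """
    n_values = df.groupby(id_cols)["value"].transform("nunique")
    return n_values > 1


# Toy rows only; column names other than mystery_code and value are hypothetical.
toy = pd.DataFrame(
    {
        "report_year": [2019, 2019, 2019, 2019],
        "technology": ["Wind", "Wind", "Solar", "Solar"],
        "mystery_code": ["M", "R", "M", "R"],
        "value": [10.0, 12.0, 5.0, 5.0],
    }
)

varies = flag_value_varies(toy, id_cols=["report_year", "technology"])
share_variable = varies.mean()

# Rows that are pure duplicates apart from the mystery_code could be dropped safely.
deduped = toy.loc[~varies].drop(columns="mystery_code").drop_duplicates()
```

On the real tables, `share_variable` would be the figure to compare with the 12% and 5% reported above, and the flagged rows are the ones that block a simple de-duplication.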
Just to note here: when I was trying to integrate 2019 and 2020, there was data in two columns that indicates info about the revision of the data. Those columns were completely null for the more recent years, so I took this out after removing 2019 and 2020 from the integration. But if we ever go back and tackle this weird thing and integrate 2019 and 2020, we could add back in this very small normalized table:
@asset
def core_nrelatb__yearly_revisions(
    _core_nrelatb__transform_start: pd.DataFrame,
) -> pd.DataFrame:
    """Transform small table including which revision the data pertains to and when it was updated."""
    return transform_normalize(_core_nrelatb__transform_start, Normalizer().revisions)