
Nightly Build Failure 2024-04-27 #3593

Open
zaneselvans opened this issue Apr 27, 2024 · 2 comments
Labels
  • bug: Things that are just plain broken.
  • gridlab: Work related to open modeling input data integration funded/coordinated by GridLab
  • nightly-builds: Anything having to do with nightly builds or continuous deployment.
  • nrelatb: NREL's Annual Technology Baseline data


@zaneselvans
Member

Overview

  • The `etl_full.yml` settings for NREL ATB included the years 2019 and 2020, which aren't working yet (see #3576, NREL ATB data non-unique in 2019 (mostly) and 2020), so Pydantic's validation of the settings correctly failed. I've removed those years from the settings file on `main` so we can attempt another build tonight.
  • However, this failure doesn't happen locally when I run the full ETL with the old settings, which is weird.
  • While investigating, I was confused by the NREL ATB extraction, which doesn't seem to use the settings or the datastore at all. Maybe it only works because it relies on defaults that aren't informed by the ETL settings?
  • The `raw_nrelatb__data` asset claims to require the `datastore` and `dataset_settings` resources, but doesn't actually use them.
  • The NREL ATB `Extractor` claims to require a `Datastore` as input, but doesn't receive one.
  • Looking at the other tabular extractors, they don't seem to use any resources either (even though they obviously must), so maybe there's a bunch of magic happening in the background? Can we document what's going on?
  • There's also some stale documentation in the extraction system, with references to Excel and CSV files in places where they don't apply.
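A minimal sketch of the kind of check the first bullet describes, assuming the settings validation compares each requested year against the dataset's working partitions. It uses a plain dataclass with `__post_init__` as a stand-in for the Pydantic model; `NrelAtbSettings` and `WORKING_YEARS` are hypothetical names, and the years in `WORKING_YEARS` are placeholders, not PUDL's actual working partitions.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the dataset's working partitions.
# The specific years are placeholders for illustration only.
WORKING_YEARS = frozenset({2021, 2022, 2023})


@dataclass(frozen=True)
class NrelAtbSettings:
    """Hypothetical settings object that rejects non-working years on creation."""

    years: tuple[int, ...] = field(
        default_factory=lambda: tuple(sorted(WORKING_YEARS))
    )

    def __post_init__(self) -> None:
        # Validate eagerly, the way a Pydantic validator would, so bad
        # settings fail when they are loaded rather than mid-extraction.
        bad = sorted(set(self.years) - WORKING_YEARS)
        if bad:
            raise ValueError(f"Years not in working partitions: {bad}")


# Settings that include not-yet-working years should fail validation.
try:
    NrelAtbSettings(years=(2019, 2020, 2021))
except ValueError as err:
    print(err)  # Years not in working partitions: [2019, 2020]
```

Validating in the constructor means a bad `etl_full.yml` should be rejected as soon as the settings object is built, before any extraction runs.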
```python
from dagster import Output, asset

from pudl.extract.extractor import GenericMetadata, raw_df_factory
from pudl.extract.parquet import ParquetExtractor


class Extractor(ParquetExtractor):
    """Extractor for NREL ATB."""

    def __init__(self, *args, **kwargs):
        """Initialize the module.

        Args:
            ds (:class:`datastore.Datastore`): Initialized datastore.
        """
        self.METADATA = GenericMetadata("nrelatb")
        super().__init__(*args, **kwargs)


raw_nrelatb__all_dfs = raw_df_factory(Extractor, name="nrelatb")


@asset(
    required_resource_keys={"datastore", "dataset_settings"},
)
def raw_nrelatb__data(raw_nrelatb__all_dfs):
    """Extract raw NREL ATB data from annual parquet files to one dataframe.

    Returns:
        An extracted NREL ATB dataframe.
    """
    return Output(value=raw_nrelatb__all_dfs["data"])
```

zaneselvans added the bug, nightly-builds, gridlab, and nrelatb labels on Apr 27, 2024
cmgosnell self-assigned this on Apr 29, 2024
@cmgosnell
Member

cmgosnell commented Apr 29, 2024

Thanks for catching the non-working partitions in the full settings! I'm also confused about why the validation didn't fail for me locally. After changing the working partitions in `sources`, I was able to re-run the full extraction and only get the working years. That's weird for sure.

A lot of the magic happens in `extract.extractor.raw_df_factory`, which runs `extract.extractor.partition_extractor_factory`, which in turn uses the `datastore` and `dataset_settings` resources. I was mirroring the EIA 176 extraction, whose asset also requires those two resources but doesn't pass them around; instead, they're accessed within `raw_df_factory`.
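A simplified sketch of the pattern described above, assuming the factory builds the extraction step itself and pulls the datastore and settings off the run context. Everything here is hypothetical: `make_raw_df_asset`, `FakeExtractor`, and the `SimpleNamespace` context stand in for `raw_df_factory`, the real extractor, and Dagster's execution context, and none of it reproduces PUDL's actual signatures.

```python
from types import SimpleNamespace
from typing import Callable


class FakeExtractor:
    """Hypothetical extractor that needs a datastore passed to its constructor."""

    def __init__(self, ds: dict[int, list[str]]):
        self.ds = ds

    def extract(self, years: list[int]) -> dict[str, list[str]]:
        # Pull only the requested partitions out of the (fake) datastore.
        rows = [row for year in years for row in self.ds[year]]
        return {"data": rows}


def make_raw_df_asset(extractor_cls: type, name: str) -> Callable:
    """Return an asset-like function that reads resources from its context."""

    def raw_dfs(context) -> dict[str, list[str]]:
        # The resources are consumed here, inside the generated function.
        ds = context.resources.datastore
        years = getattr(context.resources.dataset_settings, name).years
        return extractor_cls(ds).extract(years)

    raw_dfs.__name__ = f"raw_{name}__all_dfs"
    return raw_dfs


context = SimpleNamespace(
    resources=SimpleNamespace(
        datastore={2021: ["a"], 2022: ["b", "c"]},
        dataset_settings=SimpleNamespace(nrelatb=SimpleNamespace(years=[2022])),
    )
)
raw_nrelatb__all_dfs = make_raw_df_asset(FakeExtractor, name="nrelatb")
print(raw_nrelatb__all_dfs(context)["data"])  # ['b', 'c']
```

The point the sketch illustrates: the function that actually consumes the datastore and settings is the one the factory generates, so a downstream asset that lists those resources but only indexes into `raw_nrelatb__all_dfs` never touches them.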

I agree in general that the extractor setup needs some documentation cleanup, and maybe a higher-level explanation somewhere.

@jdangerx
Member

jdangerx commented May 6, 2024

The tangible outcome here is:

  • Replicate being able to run ATB with bogus settings, then figure out why the bogus settings aren't breaking the ATB run.

It should have failed on import, but that wasn't happening.

jdangerx self-assigned this on May 6, 2024
Projects
Status: Backlog
Development

No branches or pull requests

3 participants