Issues setting up PUDL dev environment #3612

Open
denimalpaca opened this issue May 6, 2024 · 6 comments
Labels
bug Things that are just plain broken.

Comments

@denimalpaca

Describe the issues

Here's a list of issues I had setting up the development environment from this guide, and why I think I had them:

  • Using miniforge to install mamba did not result in a conda environment with python=3.12, which is needed for certain PUDL packages

In Running the ETL Pipeline doc:

  • Running the alembic command errored when it needed to run a PUDL file; there is no indication in the doc that PUDL has to be installed to the local python env as a package (how I fixed the issue) or where/how to run the command if it is not installed.

When I ran the ferc_to_sqlite_fast DAG, I got the following task failure/error:

dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "ferc1_xbrl":

  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/execute_plan.py", line 282, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/execute_step.py", line 523, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/execute_step.py", line 202, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/execute_step.py", line 100, in _process_asset_results_to_events
    for user_event in user_event_sequence:
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/compute.py", line 208, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context):
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/compute.py", line 177, in _yield_compute_results
    for event in iterate_with_context(
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_utils/__init__.py", line 463, in iterate_with_context
    with context_fn():
  File "/Users/denimalpaca/miniforge3/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
RuntimeError: Found existing DB at /Users/denimalpaca/pudl_output/ferc1_xbrl.sqlite and clobber was set to False. Aborting.

  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_utils/__init__.py", line 465, in iterate_with_context
    next_output = next(iterator)
                  ^^^^^^^^^^^^^^
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/compute_generator.py", line 141, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
             ^^^^^^^^^^^^^^^^^^
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/dagster/_core/execution/plan/compute_generator.py", line 129, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/denimalpaca/miniforge3/lib/python3.12/site-packages/pudl/extract/xbrl.py", line 82, in inner_op
    raise RuntimeError(

This was the only error in that run, and re-running the task didn't fix it.

Running the integration tests in the PUDL repo locally resulted in:
============================================== 23 passed, 4 skipped, 7 xfailed, 1 xpassed, 71 warnings, 62 errors in 2984.83s (0:49:44) ===============================================

Using the command: pytest test/integration/

A short snippet of the errors:

ERROR test/integration/glue_test.py::test_for_fk_validation_and_unmapped_ids[missing_plants_in_plants_ferc1] - AssertionError
ERROR test/integration/glue_test.py::test_for_fk_validation_and_unmapped_ids[missing_plants_in_plants_eia] - AssertionError
ERROR test/integration/glue_test.py::test_for_unmapped_ids_minus_one[check_for_unmmapped_plants_in_plants_ferc1] - AssertionError
ERROR test/integration/glue_test.py::test_for_unmapped_ids_minus_one[validate_utility_id_ferc1_in_utilities_ferc1_xbrl] - AssertionError
ERROR test/integration/glue_test.py::test_unmapped_utils_eia - AssertionError
ERROR test/integration/output_test.py::test_nuclear_fraction[gf_eia923-0.2-0.02] - AssertionError
ERROR test/integration/output_test.py::test_nuclear_fraction[mcoe_generators-0.2-0.02] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[pu_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[fuel_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[plants_steam_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[fbp_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[plants_all_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[plants_hydro_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[plants_pumped_storage_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[plants_small_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[purchased_power_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_ferc1_outputs[plant_in_service_ferc1] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-bga_eia860-1.0-kwargs0] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-gens_eia860-1.0-kwargs1] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-own_eia860-1.0-kwargs2] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-plants_eia860-1.0-kwargs3] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-boil_eia860-1.0-kwargs4] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-pu_eia860-1.0-kwargs5] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-utils_eia860-1.0-kwargs6] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-bf_eia923-12.0-kwargs7] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-frc_eia923-12.0-kwargs8] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-gen_eia923-12.0-kwargs9] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-gen_fuel_by_generator_energy_source_eia923-12.0-kwargs10] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-gen_fuel_by_generator_eia923-12.0-kwargs11] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-gf_eia923-12.0-kwargs12] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-hr_by_unit-12.0-kwargs13] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-hr_by_gen-12.0-kwargs14] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-fuel_cost-12.0-kwargs15] - AssertionError
ERROR test/integration/output_test.py::test_eia_outputs[gens_eia860-capacity_factor-12.0-kwargs16] - AssertionError
ERROR test/integration/output_test.py::test_annual_eia_outputs[plant_parts_eia] - AssertionError
ERROR test/integration/output_test.py::test_annual_eia_outputs[ferc1_eia] - AssertionError
ERROR test/integration/output_test.py::test_annual_eia_outputs[gen_fuel_by_generator_energy_source_eia923] - AssertionError
ERROR test/integration/output_test.py::test_annual_eia_outputs[gen_fuel_by_generator_eia923] - AssertionError
ERROR test/integration/output_test.py::test_annual_eia_outputs[gen_fuel_by_generator_energy_source_owner_eia923] - AssertionError
ERROR test/integration/output_test.py::test_null_rows[mcoe_generators-0.9] - AssertionError
ERROR test/integration/output_test.py::test_outputs_by_table_suffix[eia861] - AssertionError
ERROR test/integration/output_test.py::test_outputs_by_table_suffix[ferc714] - AssertionError
ERROR test/integration/output_test.py::test_ferc714_outputs[out_ferc714__summarized_demand] - AssertionError
ERROR test/integration/output_test.py::test_ferc714_outputs[out_ferc714__respondents_with_fips] - AssertionError
ERROR test/integration/output_test.py::test_service_territory_outputs[out_eia861__yearly_balancing_authority_service_territory] - AssertionError
ERROR test/integration/output_test.py::test_service_territory_outputs[out_eia861__yearly_utility_service_territory] - AssertionError

Expected these all to pass (I think? I don't actually know if they were supposed to). Seems like it might just be an issue with how I materialized the data? Not really sure. I only ran the fast ETL, so maybe I need the full one?

Expected behavior

A clear and concise description of what you expected to happen, or what you expected the data to look like.

Software Environment?

  • Operating System: macOS 14.4
  • Python version and distribution: Conda/Mamba Python 3.12
  • How did you install PUDL?
    • If you installed using git clone what branch are you using: forked from main
@denimalpaca denimalpaca added the bug Things that are just plain broken. label May 6, 2024
@zaneselvans
Member

Using miniforge to install mamba did not result in a conda environment with python=3.12, which is needed for certain PUDL packages

Are you referring to the base environment that's created when you install conda or mamba, or to the pudl-dev environment? It's fine if the base environment isn't Python 3.12 (mine is currently 3.10.13). mamba should manage the Python version within the other environments it creates, and ought to have installed 3.12 in pudl-dev. Can you share the command(s) you used to create the pudl-dev environment? From the errors you're noting, it sounds like maybe you didn't use make install-pudl.

  • How did you run the ferc_to_sqlite_fast DAG?
  • Was there an existing ferc1_xbrl.sqlite in your pudl_output directory? If so, do you know how it was created?

Do you get the same test failures if you run make pytest-integration?
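The questions above come down to three checks: which conda environment is active, which Python version it provides, and whether PUDL is importable as a package. A minimal sketch of those checks follows. The function name `describe_environment` is hypothetical, the 3.12 minimum is taken from the issue text, and `CONDA_DEFAULT_ENV` is the variable conda sets when an environment is activated.

```python
import importlib.util
import os
import sys


def describe_environment(environ=os.environ, version_info=sys.version_info,
                         package="pudl", min_version=(3, 12)):
    """Summarize the active conda env, Python version, and package availability.

    Hypothetical helper for illustration only; it is not part of PUDL.
    """
    return {
        # conda/mamba set CONDA_DEFAULT_ENV when an environment is activated.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "<none active>"),
        "python": f"{version_info[0]}.{version_info[1]}",
        # The issue reports that Python 3.12 is needed; treat that as the minimum.
        "python_ok": tuple(version_info[:2]) >= tuple(min_version),
        # True only if the top-level package can be found by this interpreter.
        "package_importable": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `conda_env` is not pudl-dev, or `package_importable` is false, commands that import PUDL code (such as the alembic step mentioned in the issue) are likely to fail.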

@zaneselvans
Member

zaneselvans commented May 6, 2024

I'm also getting the error about existing databases and clobber being False when I try to run the ferc_to_sqlite DAG from within the Dagster UI.

@bendnorman @jdangerx It seems odd and new that it would not be possible to run the FERC to SQLite DAGs if the databases already exist. The last time I ran those was mid-March. Has something changed since then? Is there a way to set clobber=True from within the Dagster UI?

@denimalpaca
Author

denimalpaca commented May 6, 2024

make install-pudl

I definitely ran make install-pudl; I checked my shell logs and it was there. I went through the doc line by line. I realized my mamba init only added the conda initialization to my .bash_profile, not to my zsh config. So I fixed up my .zshrc, and now the pudl-dev env actually works and has the correct Python version. It could be helpful to add a few lines to the doc showing what the shell config should look like afterward and how to fix this for zsh. As someone who hadn't used conda at all before, I got pretty lost.

I ran the ferc_to_sqlite_fast DAG via the dagster UI at http://127.0.0.1:3000/. I just found the DAG and did a "run now". I did this just after creating the input and output directories, so there wasn't anything in them. I can try deleting everything in the output directory and re-running.
EDIT: Got this task to complete OK. I had run it twice before I posted this: the first error came during the general DAG run, where there was an SSL timeout. When I re-ran it, the sqlite file must already have been created, because that's when I got the error about clobber not being set to true. When I deleted the ferc1_xbrl.sqlite file in my outputs directory and re-ran, it was successful.

Will try the make pytest-integration command now.
EDIT: This command produced 85 passed and 6 xfailed; I'm not sure what xfailed means.

@zaneselvans
Member

If you ran make install-pudl then I'm confused as to why you wouldn't have gotten a good Python 3.12 environment out of it. Maybe it's related to the shell init / conda setup issue? The provided shell commands for appending the conda stuff to your shell initialization files are too cryptic. We should explain that more.

We talked about the clobber thing a little internally this morning, and I think the simplest solution is to just have it always clobber. It only takes ~10min to regenerate all of the FERC DBs locally and we don't tend to run it very often, and the other solutions (manually deleting the files or futzing with the run configuration through the Dagster UI) both seem brittle / flaky.
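To make the trade-off concrete, here is a minimal sketch of the clobber behavior described in this thread. It is not PUDL's actual implementation: `prepare_output_db` is a hypothetical helper, and only the error message is taken from the traceback in the issue.

```python
from pathlib import Path


def prepare_output_db(db_path: Path, clobber: bool = False) -> Path:
    """Return a path that is safe to write a fresh SQLite DB to.

    Hypothetical helper, for illustration only. If a DB already exists,
    either refuse (clobber=False) or delete it (clobber=True).
    """
    if db_path.exists():
        if not clobber:
            # Mirrors the message reported in the issue's traceback.
            raise RuntimeError(
                f"Found existing DB at {db_path} and clobber was set to False. "
                "Aborting."
            )
        # "Always clobber" amounts to taking this branch unconditionally.
        db_path.unlink()
    return db_path
```

With `clobber=False`, a leftover file from an interrupted run (such as the SSL timeout described above) blocks every later run until it is deleted by hand. With `clobber=True`, a re-run simply starts from a clean file.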

@denimalpaca
Author

After doing make install-pudl, the command output told me to run mamba activate pudl-dev. I was having trouble with that latter command because of the shell setup, so I was only in the base env. Once I was able to activate pudl-dev, I did get the correct environment.

@zaneselvans
Member

Ahh, okay okay. So the shell setup stuff really was the disconnect.
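Since the root cause was a shell startup file that lacked the conda initialization block, a quick way to spot it is to scan the usual startup files for that block. The sketch below is a rough check under stated assumptions: `shells_with_init` is a hypothetical helper, the conda marker line is the one `conda init` writes, and the mamba marker is assumed here and may differ between mamba versions.

```python
from pathlib import Path

# Opening lines of the managed init block. The conda marker is written by
# `conda init`; the mamba marker is an assumption and may vary by version.
INIT_MARKERS = (
    "# >>> conda initialize >>>",
    "# >>> mamba initialize >>>",
)


def shells_with_init(home: Path,
                     rc_files=(".bash_profile", ".bashrc", ".zshrc")) -> dict[str, bool]:
    """Report which shell startup files under `home` contain an init block."""
    status = {}
    for name in rc_files:
        rc = home / name
        # A missing file is treated the same as a file without the block.
        text = rc.read_text(errors="replace") if rc.is_file() else ""
        status[name] = any(marker in text for marker in INIT_MARKERS)
    return status


if __name__ == "__main__":
    for name, found in shells_with_init(Path.home()).items():
        print(f"{name}: {'init block found' if found else 'no init block'}")
```

If `.zshrc` shows no init block while `.bash_profile` does, that matches the situation described in this thread. With a standard conda install, running `conda init zsh` and opening a new terminal is the usual fix.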
