KeyError: 'Time' when using pipestat via pypiper #207

Open
nsheff opened this issue Feb 15, 2024 · 6 comments

Comments

@nsheff
Member

nsheff commented Feb 15, 2024

When I'm trying to switch from a normal pypiper pipeline to one that configures pipestat, I'm getting this error:

Traceback (most recent call last):
  File "/home/nsheff/code/seqcolapi/analysis/pipeline/add_to_seqcol_server.py", line 92, in <module>
    pm.stop_pipeline()
  File "/home/nsheff/.local/lib/python3.11/site-packages/pypiper/manager.py", line 2106, in stop_pipeline
    self.report_result("Time", elapsed_time_this_run, nolog=True)
  File "/home/nsheff/.local/lib/python3.11/site-packages/pypiper/manager.py", line 1616, in report_result
    reported_result = self.pipestat.report(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 99, in inner
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 571, in report
    schema=self.result_schemas[r],
           ~~~~~~~~~~~~~~~~~~~^^^
KeyError: 'Time'

I can't trace this, because I'm not reporting anything related to Time, so it must be coming from pypiper or pipestat somehow.

@nsheff
Member Author

nsheff commented Feb 15, 2024

One hint is this message:

These results exist for 'DEFAULT_SAMPLE_NAME': Time
These results exist for 'DEFAULT_SAMPLE_NAME': Success

It looks like there might be a bug somewhere with a constant that is getting stored as a string instead.

@nsheff
Member Author

nsheff commented Feb 15, 2024

I think pipestat_sample_name is not being passed through to pipestat

@nsheff
Member Author

nsheff commented Feb 15, 2024

Actually, I think it's pipestat_results_file that's not working correctly...

@nsheff
Member Author

nsheff commented Feb 15, 2024

I figured it out.

Pypiper automatically adds results for Time and Success. If those aren't in your output schema, it fails. So you have to add this to the output schema:

  Time:
    type: "string"
    description: "Elapsed time for the pipeline run as reported by pypiper"
  Success:
    type: "string"
    description: "Timestamp for when the pipeline completed"

I think this is suboptimal: I'm not the one reporting those results, pypiper reports them automatically. Maybe pypiper should be the one adding them to the output schema, since it's the one reporting them.

@nsheff
Member Author

nsheff commented Feb 15, 2024

I made a more informative error message in pipestat to address this here: pepkit/pipestat@0d511b5
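
The commit itself is not reproduced here. As a rough illustration of the idea, the sketch below checks the result name against the schema before the lookup and raises an error that names the missing result and lists the declared ones. `SchemaLookupError`, `lookup_result_schema`, and the `num_reads` example schema are hypothetical stand-ins, not pipestat's actual code or API.

```python
class SchemaLookupError(KeyError):
    """Hypothetical error type for this sketch; not part of pipestat."""

    def __str__(self):
        # KeyError normally prints repr() of its argument; show the plain message.
        return str(self.args[0])


def lookup_result_schema(result_schemas, result_name):
    """Return the schema entry for result_name, or fail with a descriptive message."""
    if result_name not in result_schemas:
        declared = ", ".join(sorted(result_schemas)) or "(none)"
        raise SchemaLookupError(
            f"Result '{result_name}' is not defined in the output schema. "
            f"Declared results: {declared}. "
            "If a wrapper such as pypiper reports it automatically, "
            "add it to the schema."
        )
    return result_schemas[result_name]


# Hypothetical schema that declares one result but not 'Time'.
schemas = {"num_reads": {"type": "integer", "description": "Number of reads"}}

try:
    lookup_result_schema(schemas, "Time")
except SchemaLookupError as err:
    message = str(err)
    print(message)
```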

This at least solves the immediate issue, but going forward:

  • pypiper should add anything it uses into the schema on its own
  • so pipestat probably needs to make it easier to merge, update, and combine schemas. Right now you can only give it a file path; there's no way to set or update the schema programmatically. So, first, the pipestat schema loading system needs to be more flexible, in order to allow pypiper to update the schema and add its parameters.
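
One way to picture the flexibility described in the list above: treat each schema as a mapping from result name to definition, and let the pipeline manager layer its own entries on top of whatever the user loaded from file. The sketch below is a plain-dictionary illustration under that assumption. `merge_result_schemas`, `PYPIPER_RESULTS`, and `num_reads` are hypothetical names, and nothing here is an existing pipestat or pypiper API. The two definitions are copied from the schema fragment posted earlier in this thread.

```python
import copy

# Definitions for the results pypiper reports automatically, taken from the
# schema fragment posted earlier in this thread.
PYPIPER_RESULTS = {
    "Time": {
        "type": "string",
        "description": "Elapsed time for the pipeline run as reported by pypiper",
    },
    "Success": {
        "type": "string",
        "description": "Timestamp for when the pipeline completed",
    },
}


def merge_result_schemas(user_schema, extra_results):
    """Return a new mapping with extra_results added to user_schema.

    Entries already declared by the user win, so a pipeline author can still
    override an automatic definition. Neither input is modified.
    """
    merged = copy.deepcopy(user_schema)
    for name, definition in extra_results.items():
        merged.setdefault(name, copy.deepcopy(definition))
    return merged


# Hypothetical user schema that only declares its own result.
user_schema = {"num_reads": {"type": "integer", "description": "Number of reads"}}
merged = merge_result_schemas(user_schema, PYPIPER_RESULTS)
print(sorted(merged))
```

Letting user-declared entries take precedence is a choice made for this sketch; raising an error on a conflicting definition would be an equally reasonable policy.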

@donaldcampbelljr
Member

I also confirmed this by passing an output schema to the PipelineManager in the test_pipeline_manager.py test (I was initially surprised our tests didn't catch this):

        self.pp = pypiper.PipelineManager(
            "sample_pipeline",
            outfolder=self.OUTFOLDER,
            multi=True,
            pipestat_schema="/home/drc/GITHUB/pypiper/pypiper/tests/Data/sample_output_schema.yaml",
        )

It will indeed fail with a KeyError:
tests/pipeline_manager/test_pipeline_manager.py::PipelineManagerTests::test_me - KeyError: 'Time'
