Losing jobs on restart while postprocessing #354

Open
natefoo opened this issue Mar 21, 2024 · 0 comments

natefoo (Member) commented Mar 21, 2024

I can't investigate this fully at the moment, but I suspect this is possible because, after a job has left the cluster, Pulsar:

  1. creates $staging_dir/$job_id/final_status with contents "complete",
  2. removes $persistence_dir/${manager}-active-jobs/$job_id,
  3. performs postprocessing (writing outputs back to Galaxy), and
  4. creates $staging_dir/$job_id/postprocessed.
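A minimal Python sketch of the ordering above, assuming the directory layout named in the list. The function names (`finish_job`, `recoverable_jobs`), the manager name `example`, the empty marker files, and the simulated crash are all hypothetical; this is not Pulsar's actual code. It also assumes recovery only consults the active-jobs directory. Under those assumptions, a crash during step 3 leaves `final_status` on disk but no active-jobs entry for a restart to find.

```python
import tempfile
from pathlib import Path


class SimulatedCrash(Exception):
    """Stands in for the Pulsar process dying mid-postprocessing (hypothetical)."""


def finish_job(staging_dir: Path, persistence_dir: Path, manager: str,
               job_id: str, postprocess) -> None:
    """Hypothetical model of the four steps, in the order the issue describes."""
    job_staging = staging_dir / job_id
    job_staging.mkdir(parents=True, exist_ok=True)
    # 1. Record that the job finished on the cluster.
    (job_staging / "final_status").write_text("complete")
    # 2. Drop the job from the persisted active-jobs directory (before step 3!).
    (persistence_dir / f"{manager}-active-jobs" / job_id).unlink()
    # 3. Write outputs back to Galaxy; a crash here is the problem case.
    postprocess(job_id)
    # 4. Mark postprocessing as done.
    (job_staging / "postprocessed").touch()


def recoverable_jobs(persistence_dir: Path, manager: str) -> list[str]:
    """Jobs a restart would resume, assuming recovery only reads active-jobs."""
    active = persistence_dir / f"{manager}-active-jobs"
    return sorted(p.name for p in active.iterdir())


def crash(job_id: str) -> None:
    raise SimulatedCrash(job_id)


with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp, "staging")
    persistence = Path(tmp, "persistence")
    # One active job, tracked by an (assumed) empty file named after its id.
    (persistence / "example-active-jobs").mkdir(parents=True)
    (persistence / "example-active-jobs" / "42").touch()

    try:
        finish_job(staging, persistence, "example", "42", crash)
    except SimulatedCrash:
        pass  # the "restart" happens here

    final_status = (staging / "42" / "final_status").read_text()
    postprocessed = (staging / "42" / "postprocessed").exists()
    recovered = recoverable_jobs(persistence, "example")
    print(final_status, postprocessed, recovered)  # complete False []
```

The job is marked complete, was never postprocessed, and is absent from the list a restart would scan.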

Because $persistence_dir/${manager}-active-jobs/$job_id is removed before postprocessing completes, Pulsar would presumably not attempt to retry postprocessing after a restart.
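To make the gap concrete: the staging directory still records such a job, since it has `final_status` but no `postprocessed` marker. The sketch below is a hypothetical check over that layout, not Pulsar's recovery code. `unfinished_postprocessing` is an illustrative name, and the assumption is that each job has its own subdirectory under the staging directory.

```python
import tempfile
from pathlib import Path


def unfinished_postprocessing(staging_dir: Path) -> list[str]:
    """Job ids with a final_status file but no postprocessed marker (hypothetical)."""
    orphans = []
    for job_dir in sorted(staging_dir.iterdir()):
        if not job_dir.is_dir():
            continue
        finished = (job_dir / "final_status").exists()
        done = (job_dir / "postprocessed").exists()
        if finished and not done:
            orphans.append(job_dir.name)
    return orphans


with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    # Job 1: fully postprocessed before the restart.
    (staging / "1").mkdir()
    (staging / "1" / "final_status").write_text("complete")
    (staging / "1" / "postprocessed").touch()
    # Job 2: restart hit between steps 2 and 4.
    (staging / "2").mkdir()
    (staging / "2" / "final_status").write_text("complete")
    # Job 3: still running on the cluster, so no final_status yet.
    (staging / "3").mkdir()
    orphans = unfinished_postprocessing(staging)
    print(orphans)  # ['2']
```

Only job 2 is reported. A restart that relies solely on the active-jobs directory would never see it.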

EDIT: this definitely happens.
