Resistance run fails on Kive if the input files have already been purged #921

Open
CBeelen opened this issue Jan 25, 2023 · 0 comments

CBeelen commented Jan 25, 2023

If either the main or the MIDI sample finishes quickly and the other takes a long time, the output files of the faster run may already have been purged by the time the slower run finishes. The resistance run only starts once both runs have finished, so it then fails to find the faster run's output files, which have already been cleaned up.

This happened for sample 90542B-RELOAD-HCV_S83: the de novo MIDI sample finished in a reasonable amount of time, but the main sample took about a month to assemble. When it was finally finished, the MIDI run's results had already been purged, and Kive failed to find its amino.csv, with this message:
ValueError: Dataset has no dataset_file or external_path.

When a subsequent run needs a previous run's results as inputs, we should first check that those results still exist. This applies to the resistance and proviral runs, which need input files from the main and de novo pipelines. If the results have already been purged, the easiest solution would be to re-start the run that produced them.

We should do this check-and-restart for all samples in a retry loop with a sensible limit on the number of retries. Otherwise, we could get caught in an endless loop of re-running the main and MIDI samples.
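As a rough illustration of the bounded retry idea, here is a minimal Python sketch. Everything in it is hypothetical: `UpstreamRun`, `prepare_inputs`, the `outputs_available` and `restart` callbacks, and the limit of 3 are placeholders, not existing MiCall or Kive code. The callbacks stand in for whatever actually checks Kive for purged datasets and re-launches a run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_RESTARTS = 3  # assumed limit; pick whatever is sensible for the cluster


@dataclass
class UpstreamRun:
    """Hypothetical stand-in for a finished main or MIDI run."""
    name: str
    restarts: int = 0


def prepare_inputs(runs: Iterable[UpstreamRun],
                   outputs_available: Callable[[UpstreamRun], bool],
                   restart: Callable[[UpstreamRun], None],
                   max_restarts: int = MAX_RESTARTS) -> bool:
    """Check that every upstream run still has its outputs.

    Returns True when all outputs are present, so the downstream
    (resistance or proviral) run can start. Any run whose outputs were
    purged is re-started and False is returned, so the caller should
    poll again later. Raises RuntimeError once a run has already been
    re-started max_restarts times, to avoid an endless re-run loop.
    """
    ready = True
    for run in runs:
        if outputs_available(run):
            continue
        if run.restarts >= max_restarts:
            raise RuntimeError(
                f"{run.name}: outputs purged again after "
                f"{run.restarts} restarts, giving up.")
        run.restarts += 1
        restart(run)
        ready = False
    return ready
```

The restart count is kept per upstream run, so a sample whose MIDI results keep getting purged is reported as an error instead of cycling forever, while other samples are unaffected.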

Other possible solutions could be to download the input files from raw_data and to check that their checksum is what we expect, or to mark the output files that are still required for subsequent runs with an expiry date or a keep-alive to prevent them from being cleaned up.
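For the checksum alternative, a minimal sketch of the verification step might look like the following. It assumes the expected value is an MD5 hex digest; the issue does not say which checksum is recorded, so the algorithm is an assumption, and `file_md5` and `verify_input` are hypothetical helper names.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_input(path, expected_md5):
    """Return True if the file at path matches the expected checksum.

    expected_md5 is assumed to be the hex digest recorded when the
    original dataset was created.
    """
    return file_md5(path) == expected_md5.strip().lower()
```

A file re-downloaded from raw_data would only be used as a run input if `verify_input` returns True for it.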
