
Introduce computation stages - validation, download, prepare, ... #135

Open
unode opened this issue Mar 4, 2020 · 1 comment

Comments

@unode
Member

unode commented Mar 4, 2020

Currently, NGLess executes scripts in two stages. The first stage verifies that the script and output files are consistent (equivalent to --validate-only). The second stage, in which the computation happens, runs only if the first stage finishes successfully.

However, the current implementation performs downloads, indexing and computation during the same (second) stage.
When using the parallel module, this can lead to jobs waiting on each other for significant amounts of time. This happens during indexing and during the initialization of internal and external modules, as well as during downloads, where connectivity problems or slow network speeds lead to failures or delays.

For example, when mapping to hg19, the reference files are only downloaded and indexed when the map() step is reached for the first time.

This limitation often leads to workflows that follow a "run one sample first and, if it finishes, run all the others" approach.

With staged execution, an NGLess workflow could look like:

# (run once) Ensure ngless is correctly installed
ngless --check-install

# (run once) Check that the script is valid and inputs/outputs are as expected
ngless --validate-only script.ngl

# (run once/multiple) Download and index all dependencies (references, resources from internal modules, initialization of external modules, indexing, etc.)
ngless --ensure-dependencies script.ngl

# (run once/multiple) Interpret the script, possibly in parallel
ngless script.ngl 

An advantage of --ensure-dependencies is that resources could be downloaded, indexed, etc. in parallel, whereas this currently happens sequentially.
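To illustrate the kind of parallel preparation that --ensure-dependencies could enable, here is a minimal bash sketch. It does not call NGLess: `prepare_resource` is a hypothetical stand-in for downloading and indexing one dependency, and the resource names other than hg19 are placeholders. Each preparation runs as a background job, and the script proceeds only once every job has exited successfully.

```bash
#!/usr/bin/env bash
# Sketch only: prepare several dependencies concurrently, then continue
# only if all of them succeeded. No NGLess commands are involved.
set -u

cache_dir=$(mktemp -d)

# Hypothetical stand-in for "download + index one dependency".
# It just sleeps and writes a marker file so the sketch runs anywhere.
prepare_resource() {
    local name=$1
    sleep 1                              # simulates download/index time
    touch "$cache_dir/$name.ready"       # marks the resource as prepared
}

resources=(hg19 resource_b resource_c)   # placeholder dependency names

pids=()
for r in "${resources[@]}"; do
    prepare_resource "$r" &              # start each preparation in the background
    pids+=("$!")
done

status=0
for pid in "${pids[@]}"; do
    wait "$pid" || status=1              # collect each job's exit status
done

if [ "$status" -ne 0 ]; then
    echo "dependency preparation failed" >&2
    exit 1
fi

echo "all ${#resources[@]} resources ready in $cache_dir"
```

The temporary directory is left in place so the marker files can be inspected. Waiting on each PID individually, rather than calling a bare `wait`, is what lets the script detect a failed job: a bare `wait` with no arguments returns 0 regardless of the children's exit statuses.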

Additionally, running script.ngl would behave predictably for the user, regardless of whether it is the first time the command is executed.

This issue is also in line with #71, which proposes a setup phase for external modules. Such a phase would also run during --ensure-dependencies.

@luispedro
Member

There is a hacky way to do this: run the script with --subsample first. It is not a great solution in terms of UX, but it works for now.
