Improve interpreter and cache #1131

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Improve interpreter and cache #1131

wants to merge 4 commits into from

remram44 (Member)

Most of this is only rough notes on paper right now; I'll try to add details here.

The idea is to move away from the current caching and execution model (instantiate every module in a persistent pipeline according to signatures, and recurse inside the Modules themselves) to an external model that drives the execution as needed (this would allow #1060).

Modules would become very dumb execution functions with metadata for the interpreter (we could make them functions instead of classes; I'm also thinking about versioning the package API for future-proofing).
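To give an idea of what "function plus metadata" could look like, here is a minimal sketch; the decorator, registry, and field names are all hypothetical, not the actual VisTrails API:

```python
# Hypothetical sketch: a module is a plain function, registered with the
# metadata the interpreter needs, including an api_version field so the
# package API can evolve (future-proofing).

MODULES = {}

def module(name, inputs=(), outputs=(), cacheable=True, api_version=1):
    """Register an execution function together with interpreter metadata."""
    def decorator(func):
        MODULES[name] = {
            'func': func,
            'inputs': tuple(inputs),
            'outputs': tuple(outputs),
            'cacheable': cacheable,
            'api_version': api_version,
        }
        return func
    return decorator

@module('StringConcat', inputs=('left', 'right'), outputs=('value',))
def string_concat(left, right):
    # The whole module is just this function; no Module subclass needed.
    return {'value': left + right}
```

The interpreter only ever looks at the registry entry; the function itself knows nothing about caching or scheduling.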

The multiple levels of interpreters used for groups and subworkflows would disappear; modules would simply be able to add more modules to the workflow during execution (which would also unify with looping). This should address #765. Streaming needs to be built into this, removing the need for passing generators around.
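A toy sketch of that single-interpreter model (all names are mine, for illustration only): the interpreter runs one queue, and a control-flow module, instead of looping internally, schedules one new module per iteration:

```python
from collections import deque

def run(initial_modules):
    """Single interpreter loop: a module is a callable taking a submit()
    callback it can use to add more modules to the workflow."""
    queue = deque(initial_modules)
    results = []
    while queue:
        mod = queue.popleft()
        results.append(mod(queue.append))  # module may submit more modules
    return results

def make_map(func, items):
    """A 'map' control-flow module: rather than a nested interpreter, it
    expands into one body module per element, unifying looping with
    groups/subworkflows."""
    def mod(submit):
        for item in items:
            def body(submit, item=item):
                return func(item)
            submit(body)
        return ('scheduled', len(items))
    return mod
```

Groups and subworkflows would expand the same way: the parent module submits its children into the one flat queue instead of spawning a nested interpreter.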

The cache would be used during the execution itself, which would allow more possibilities than simply "cacheable" or "not cacheable". A module would output a cache signature along with its output, which would allow a module to be re-checked while its downstream results stay cached, or different modules to hit the same cache key (e.g. the DownloadFile module could just use a SHA-1 of the file as the cache key). Caching to disk could also be added here in time (#640).
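A rough sketch of the content-based key idea (helper names are hypothetical): the module runs and returns its output together with a signature, and the cache is keyed on that signature, so identical content hits the same entry even across re-checks:

```python
import hashlib

cache = {}

def download_file(data):
    """Stand-in for an actual network fetch; `data` plays the role of the
    fetched bytes. The cache signature is the SHA-1 of the content."""
    sig = hashlib.sha1(data).hexdigest()
    return data, sig

def run_cached(module, *args):
    # The module itself runs (it can be re-checked every time), but the
    # result is stored under its content signature, so downstream work
    # keyed on that signature stays cached.
    output, sig = module(*args)
    if sig in cache:
        return cache[sig]
    cache[sig] = output
    return output
```

Two different modules producing the same bytes would compute the same SHA-1 and therefore share the cache entry.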

From the VT workflow:

- mark depth, insert depth-bumping points (x -> [x])
- (split workflow for execution on multiple targets)

for module in sinks:
    exec(module)

exec(module):
    for upstream_module in upstream(module):
        exec(upstream_module)

    sig = compute_sig(module_id + upstream sigs)
    check cache for sig; return the cached result on a hit
    execute (might submit more modules for execution, e.g. control flow or group)
    record provenance
    add result to cache under sig
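The pseudocode above can be made concrete as a small runnable sketch, under assumed data structures of my own (a workflow as a dict of module_id -> (function, upstream ids)); this is not the real engine, just the shape of it:

```python
import hashlib

def run_workflow(workflow, sinks):
    """Drive execution from the sinks, recursing into upstream modules,
    with a signature-keyed cache and a provenance log."""
    cache = {}       # signature -> result
    provenance = []  # (module_id, signature) execution log

    def exec_module(module_id):
        func, upstream = workflow[module_id]
        inputs = [exec_module(up) for up in upstream]   # recurse upstream

        # Signature from the module id plus its upstream results
        # (a stand-in for proper upstream signatures).
        sig = hashlib.sha1(repr((module_id, inputs)).encode()).hexdigest()
        if sig in cache:                                # check cache
            return cache[sig]
        result = func(*inputs)                          # execute
        provenance.append((module_id, sig))             # record provenance
        cache[sig] = result                             # add result to cache
        return result

    return [exec_module(s) for s in sinks], provenance

# Example workflow: two constants feeding an addition.
wf = {
    'a': (lambda: 2, ()),
    'b': (lambda: 3, ()),
    'add': (lambda x, y: x + y, ('a', 'b')),
}
results, provenance = run_workflow(wf, ['add'])
```

"execute" here is a plain call; in the real design it could also submit more modules (control flow, groups), as in the looping sketch above the interpreter would own that queue.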

TODO: more details, more code

remram44 (Member, Author)

(execution engine is currently here: https://github.com/remram44/workflow-prototype)

@remram44 remram44 added this to the version 3.0 milestone Jan 6, 2016
This exposes the internal pipeline Module objects to the execution side,
which is very limiting and wrong.