Improve interpreter and cache #1131

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Improve interpreter and cache #1131

wants to merge 4 commits into from

remram44 (Member)

Most of this is only rough notes on paper right now; I'll try to add details here.

The idea is to move away from the current caching and execution model (instantiate every module in a persistent pipeline according to signatures, and recurse inside the Modules themselves) to an external model that drives the execution as needed (this would allow #1060).

Modules would become very dumb execution functions with metadata for the interpreter (we could make them functions instead of classes; I'm also thinking about versioning the package API for future-proofing).
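To give an idea of what "function plus metadata" could look like, here is a minimal sketch; the decorator, registry, and field names are all hypothetical, not the actual VisTrails API:

```python
# Hypothetical sketch: a module is a plain function, registered with the
# metadata the interpreter needs, including an api_version field so the
# package API can evolve (future-proofing).

MODULES = {}

def module(name, inputs=(), outputs=(), cacheable=True, api_version=1):
    """Register an execution function together with interpreter metadata."""
    def decorator(func):
        MODULES[name] = {
            'func': func,
            'inputs': tuple(inputs),
            'outputs': tuple(outputs),
            'cacheable': cacheable,
            'api_version': api_version,
        }
        return func
    return decorator

@module('StringConcat', inputs=('left', 'right'), outputs=('value',))
def string_concat(left, right):
    # The whole module is just this function; no Module subclass needed.
    return {'value': left + right}
```

The interpreter only ever looks at the registry entry; the function itself knows nothing about caching or scheduling.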

The multiple levels of interpreters used for groups and subworkflows would disappear; modules would simply be able to add more modules to the workflow during execution (which would also unify with looping). This should address #765. Streaming needs to be built into this, removing the need for passing generators around.
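A toy sketch of that single-interpreter model (all names are mine, for illustration only): the interpreter runs one queue, and a control-flow module, instead of looping internally, schedules one new module per iteration:

```python
from collections import deque

def run(initial_modules):
    """Single interpreter loop: a module is a callable taking a submit()
    callback it can use to add more modules to the workflow."""
    queue = deque(initial_modules)
    results = []
    while queue:
        mod = queue.popleft()
        results.append(mod(queue.append))  # module may submit more modules
    return results

def make_map(func, items):
    """A 'map' control-flow module: rather than a nested interpreter, it
    expands into one body module per element, unifying looping with
    groups/subworkflows."""
    def mod(submit):
        for item in items:
            def body(submit, item=item):
                return func(item)
            submit(body)
        return ('scheduled', len(items))
    return mod
```

Groups and subworkflows would expand the same way: the parent module submits its children into the one flat queue instead of spawning a nested interpreter.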

The cache would be used during the execution itself, which would allow more possibilities than simply "cacheable" or "not cacheable". A module would output a cache signature along with its output, which would allow a module to be re-checked while its downstream results stay cached, or different modules to hit the same cache key (e.g. the DownloadFile module could just use a SHA-1 of the file as the cache key). Caching to disk could also be added here in time (#640).
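A rough sketch of the content-based key idea (helper names are hypothetical): the module runs and returns its output together with a signature, and the cache is keyed on that signature, so identical content hits the same entry even across re-checks:

```python
import hashlib

cache = {}

def download_file(data):
    """Stand-in for an actual network fetch; `data` plays the role of the
    fetched bytes. The cache signature is the SHA-1 of the content."""
    sig = hashlib.sha1(data).hexdigest()
    return data, sig

def run_cached(module, *args):
    # The module itself runs (it can be re-checked every time), but the
    # result is stored under its content signature, so downstream work
    # keyed on that signature stays cached.
    output, sig = module(*args)
    if sig in cache:
        return cache[sig]
    cache[sig] = output
    return output
```

Two different modules producing the same bytes would compute the same SHA-1 and therefore share the cache entry.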

From the VT workflow:

- mark depth, insert depth-bumping points (x -> [x])
- (split workflow for execution on multiple targets)

for module in sinks:
    exec(module)

exec(module):
    for upstream_module in upstream(module):
        exec(upstream_module)

    sig = compute_sig(module_id + upstream sigs)
    check cache for sig; return the cached result on a hit
    execute (might submit more modules for execution, e.g. control flow or group)
    record provenance
    add result to cache under sig
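The pseudocode above can be made concrete as a small runnable sketch, under assumed data structures of my own (a workflow as a dict of module_id -> (function, upstream ids)); this is not the real engine, just the shape of it:

```python
import hashlib

def run_workflow(workflow, sinks):
    """Drive execution from the sinks, recursing into upstream modules,
    with a signature-keyed cache and a provenance log."""
    cache = {}       # signature -> result
    provenance = []  # (module_id, signature) execution log

    def exec_module(module_id):
        func, upstream = workflow[module_id]
        inputs = [exec_module(up) for up in upstream]   # recurse upstream

        # Signature from the module id plus its upstream results
        # (a stand-in for proper upstream signatures).
        sig = hashlib.sha1(repr((module_id, inputs)).encode()).hexdigest()
        if sig in cache:                                # check cache
            return cache[sig]
        result = func(*inputs)                          # execute
        provenance.append((module_id, sig))             # record provenance
        cache[sig] = result                             # add result to cache
        return result

    return [exec_module(s) for s in sinks], provenance

# Example workflow: two constants feeding an addition.
wf = {
    'a': (lambda: 2, ()),
    'b': (lambda: 3, ()),
    'add': (lambda x, y: x + y, ('a', 'b')),
}
results, provenance = run_workflow(wf, ['add'])
```

"execute" here is a plain call; in the real design it could also submit more modules (control flow, groups), as in the looping sketch above the interpreter would own that queue.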

TODO: more details, more code

remram44 (Member, Author)

(execution engine is currently here: https://github.com/remram44/workflow-prototype)

@remram44 remram44 added this to the version 3.0 milestone Jan 6, 2016
This exposes the internal pipeline Module objects to the execution side,
which is very limiting and wrong.