Idea: serial transformation representation #18

Open
vreuter opened this issue Sep 2, 2017 · 1 comment

Comments

@vreuter
Member

vreuter commented Sep 2, 2017

Maybe sort of pie-in-the-sky, but I think it'd be cool to be able to provide a function that takes a list of instructions and interprets it as a sequential series of transformations from one cache to another, like Spark does with RDDs. The function could work backward from the end of the list, determining the filepath for each cache to be created or loaded and checking whether it exists. `recreate` would be used as-is but would only apply to the "leaf" cache; list elements would just be instructions, not entire bundles of `simpleCache` arguments. The function would backtrack until it hit an existing cache, then sequentially execute the instructions from there, generating the intermediate cache(s).

I've found myself wanting to reuse caches between scripts, which forces a tradeoff: either duplicate the code used to create the cache, or invoke `loadCaches` and lose the create-if-needed benefit of `simpleCache`.
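
To make the backtracking idea concrete, here is a minimal base-R sketch. It is only an illustration under stated assumptions, not a proposed implementation: `runCacheChain` is a hypothetical name that does not exist in simpleCache, each instruction is modeled as a function of the previous step's value, and caches are written as `.rds` files via `saveRDS` purely to keep the example self-contained (simpleCache's own on-disk format is not used here).

```r
# Hypothetical helper, NOT part of simpleCache: run a chain of cached steps.
# `steps` is a named list of functions. Each name is a cache name, and each
# function receives the previous step's value (NULL for the first step).
runCacheChain <- function(steps, cacheDir, recreate = FALSE) {
  stopifnot(length(steps) > 0,
            !is.null(names(steps)),
            all(nzchar(names(steps))))
  # Assumption for this sketch: one .rds file per step, named after the step.
  paths <- file.path(cacheDir, paste0(names(steps), ".rds"))
  n <- length(steps)

  # Work backward from the leaf to find the latest reusable cache.
  # `recreate` applies only to the leaf, so the leaf is skipped when it is TRUE.
  start <- 0L
  value <- NULL
  candidates <- if (recreate) seq_len(n - 1L) else seq_len(n)
  for (i in rev(candidates)) {
    if (file.exists(paths[i])) {
      value <- readRDS(paths[i])
      start <- i
      break
    }
  }

  # Execute the remaining instructions in order, caching each intermediate.
  if (start < n) {
    for (i in seq(start + 1L, n)) {
      value <- steps[[i]](value)
      saveRDS(value, paths[i])
    }
  }
  value
}

# Usage: three chained steps in a scratch directory.
demoDir <- file.path(tempdir(), "chainDemo")
dir.create(demoDir, showWarnings = FALSE)
demoSteps <- list(
  raw     = function(prev) 1:10,
  squared = function(prev) prev^2,
  total   = function(prev) sum(prev)
)
result <- runCacheChain(demoSteps, demoDir)
print(result)
```

A second call with the same `demoDir` would load the leaf cache directly, and deleting only the later files would restart execution from the last cache still on disk. Another script could reuse the same chain without duplicating any loading logic, though it would still need access to the step definitions.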

@nsheff
Member

nsheff commented Sep 2, 2017

It's not exactly the same thing, but did you see the `buildDir` option?

#' You should pass a bracketed R code snippet like `{ rnorm(500) }` as the
#' instruction, and simpleCache will create the object. Alternatively, if the
#' code to create the cache is large, you can put an R script called object.R in
#' the RBUILD.DIR (the name of the file *must* match the name of the object it
#' creates *exactly*). If you don't provide an instruction, the function sources
#' RBUILD.DIR/object.R and caches the result as the object. This source file
#' *must* create an object with the same name of the object. If you already have

In practice, though, what I do in this situation is order the scripts and provide the instruction only in the main script; later scripts just use `loadCaches`. I think this is simpler than using `buildDir`, which I don't use anymore.

So for some things I have cache-generating scripts and then cache-using scripts, and that seems to work OK.

I do think the order of transformations on a cache is interesting, but it seems like a different issue from avoiding duplicated cache-creation code.
