Before I Sleep: Benefits of a function-based diet (The {drake} post) #10

Open
utterances-bot opened this issue Dec 23, 2020 · 3 comments

Comments

@utterances-bot

Before I Sleep: Benefits of a function-based diet (The {drake} post)

https://milesmcbain.com/posts/the-drake-post/

Copy link

wonderful! thanks for sharing.

@dewoller

dewoller commented Mar 18, 2021

I can't believe that Benefits of a function-based diet was published less than a year ago. It completely took over my drake practice, for the better. I'm a fan boy!

I am about to evangelise this practice to my department, and I note that targets has 'superseded' drake. I can certainly see the advantages of the targets package, but I am interested in your take. I note that William Landau has taken on board much of your thinking, to his advantage.

My questions are: have you moved to /targets/ for your new projects? How do the conflicted and dotenv packages fit into your workflow? What is your advice for a new drake/targets shop, and for the evangelisation task in general?

@MilesMcBain
Owner

Thanks @dewoller, I am glad the approach has proven fruitful!

I use {targets} now for all my new projects. I've also ported some of our legacy projects over, since it turns out this is quite painless to do.

Thanks to {tarchetypes}, a {targets} workflow almost identical to what I had with {dflow} is possible. See {tflow} for my latest project template. I am still using {conflicted} and {dotenv} as before.

There are 3 big advantages I have seen with {targets}:

  1. The debugging workflow with saved workspaces is quite nice.
  2. The way the dynamic branching stuff works feels much more straightforward, and it is now possible to debug thanks to 1.
  3. The annoying 'repacking large object' issue that sometimes arose when caching large objects and caused major plan slowness is gone. {targets} uses a different serialisation format.
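
To make points 1 and 2 concrete, here is a minimal sketch of a _targets.R file that combines saved workspaces with dynamic branching. The target names, the input values, and fit_one() are all made up for illustration, and the failure is deliberate; this is not taken from {tflow} or any real project.

```r
# _targets.R: hypothetical pipeline; `sizes`, `fits` and fit_one() are illustrative
library(targets)

# Ask {targets} to save a workspace for any target (or branch) that errors
tar_option_set(workspace_on_error = TRUE)

# Illustrative function that fails on purpose for one input
fit_one <- function(n) {
  if (n > 2) stop("illustrative failure for n = ", n)
  n^2
}

list(
  tar_target(sizes, c(1, 2, 3)),
  # Dynamic branching: one branch of `fits` per element of `sizes`
  tar_target(fits, fit_one(sizes), pattern = map(sizes))
)
```

After tar_make() stops on the failing branch, tar_workspaces() lists the saved workspace names, and passing one of those names to tar_workspace() should load that branch's inputs into the interactive session so fit_one() can be stepped through by hand.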

On the flip side, one thing I have found slightly annoying is the way input file dependencies are supposed to work. You're expected to declare them all up front in the plan, whereas with {drake} you could call file_in() in nested functions. That was a handy way of creating stubs or placeholder functions that used temporary data, to be properly plumbed in later.
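
For readers who have not seen the up-front style, a rough sketch of it looks like the following. The file path and target names are hypothetical; the point is only that the file gets its own target in the plan and downstream targets depend on that target rather than on a path buried inside a function.

```r
# _targets.R: hypothetical path and target names
library(targets)

list(
  # format = "file" tells {targets} to track the file's contents;
  # the command must return the path(s) as a character vector
  tar_target(raw_file, "data/raw.csv", format = "file"),
  # Downstream targets take the file target as an input
  tar_target(raw_data, read.csv(raw_file))
)
```

If data/raw.csv changes on disk, raw_file is invalidated on the next tar_make() and raw_data is rebuilt along with it.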

The evangelism task wasn't too hard for me because I had a very keen 'first follower' who had been in the team a long time and had some clout.

As I said in the post, we were having problems reproducing work 5 minutes and 5 meters away: the dreaded 'works on my machine' syndrome. I was able to show that some of this was caused by people accumulating stale state in their R sessions, which was leading to weird stuff. Explicit package dependencies, plus a workflow dependency graph that is run in a fresh session every time, offered a robust solution to this issue.

Your team might not have that issue, but I can offer some general advice: When introducing {targets} or {drake}, go with a really basic set of features. Simple static dependency graphs offer nice productivity and reproducibility gains. Bring in parallelism, dynamic branching, custom triggers, and other advanced features later, once people are sold on the workflow and have some comfort with the tooling.
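
As a rough idea of what "really basic" can mean, here is a sketch of a static two-target {targets} pipeline using a built-in dataset. The target names and summarise_mpg() are made up for illustration.

```r
# _targets.R: a deliberately basic static pipeline; names are illustrative
library(targets)

# A plain function holding the analysis step
summarise_mpg <- function(car_data) {
  aggregate(mpg ~ cyl, data = car_data, FUN = mean)
}

list(
  tar_target(car_data, mtcars),
  # Depends on `car_data`, so it reruns only when the data or the function changes
  tar_target(mpg_by_cyl, summarise_mpg(car_data))
)
```

Running tar_make() builds both targets in a fresh R session, tar_read(mpg_by_cyl) fetches the result, and tar_visnetwork() draws the dependency graph. That is about all a newcomer needs to learn on day one.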

Good luck! 👍
