Rich Calaway edited this page Dec 13, 2018 · 28 revisions

checkpoint

Part of the Reproducible R Toolkit (RRT).

Examples

Creating a checkpoint project

Create a folder for your project, or go to a folder that already contains one. Let's say the folder is called ~/myproject. With that folder as your working directory, run:

checkpoint(snapshotDate = "2015-01-01")

Refreshing the project

Every time you run the checkpoint() function in a project, the entire project gets scanned again for any R scripts and required packages.

To ensure all the package dependencies are installed, simply run checkpoint() again.

checkpoint(snapshotDate = "2015-01-01") # Same snapshot date as when the project was created

Using checkpoint

Initialize a checkpoint project by adding this code to the top of your script.

library(checkpoint)
checkpoint(snapshotDate = "2015-01-01") # Use desired snapshot date

When you run this code, the checkpoint() function scans through your entire project and looks for library() and require() statements.

For example, if you have this script in your project, checkpoint() will recognize that ggplot2 should be installed:

library(ggplot2)
ggplot(mtcars, aes(mpg, cyl)) +
  geom_point()

How does checkpoint work?

Where does checkpoint store the checkpointed packages?

checkpoint stores packages in your home folder, i.e. ~/.checkpoint.

Every time you run checkpoint(), your files are scanned and all packages used are identified. These packages are then installed inside the checkpoint home folder. For example, if you have a checkpoint snapshot date of 2015-01-01, the function creates a new folder ~/.checkpoint/2015-01-01.
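The mapping from snapshot date to folder can be sketched in base R. This is only an illustration of the naming described above, not checkpoint's own code: snapshot_dir() is a hypothetical helper, and the real package may create further subfolders beneath the dated folder.

```r
# Hypothetical helper: builds the top-level snapshot folder for a date.
snapshot_dir <- function(snapshot_date,
                         root = file.path("~", ".checkpoint")) {
  # Validate the date so a typo does not silently produce an odd folder name.
  parsed <- as.Date(snapshot_date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("snapshot_date must look like 'YYYY-MM-DD'")
  }
  file.path(root, format(parsed, "%Y-%m-%d"))
}

snapshot_dir("2015-01-01")
# "~/.checkpoint/2015-01-01"
```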

How does checkpoint know which packages to install?

Every time you run checkpoint() your project gets scanned for all R files, i.e. files with extensions .R and .Rnw. Specifically, we parse these script files for occurrences of library(...) and require(...) calls.
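To make the scanning step concrete, here is a minimal sketch that finds library() and require() calls using R's own parser. It assumes nothing about checkpoint's internals: scan_packages() is a hypothetical helper, and it only handles the simple case where the package name is the first, unnamed argument.

```r
# Hypothetical helper: list packages named in library()/require() calls.
scan_packages <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  if (is.null(tokens)) {
    return(character(0))
  }
  # Keep only terminal tokens, in source order.
  tokens <- tokens[tokens$terminal, ]
  tokens <- tokens[order(tokens$line1, tokens$col1), ]

  calls <- which(tokens$token == "SYMBOL_FUNCTION_CALL" &
                   tokens$text %in% c("library", "require"))
  # The package name is the token right after the opening parenthesis.
  args <- calls + 2
  args <- args[args <= nrow(tokens)]
  arg_tokens <- tokens[args, ]
  arg_tokens <- arg_tokens[arg_tokens$token %in% c("SYMBOL", "STR_CONST"), ]
  # Strip quotes so library("ggplot2") and library(ggplot2) agree.
  unique(gsub("^[\"']|[\"']$", "", arg_tokens$text))
}

script <- c(
  "library(ggplot2)",
  "require(\"MASS\")",
  "ggplot(mtcars, aes(mpg, cyl)) +",
  "  geom_point()"
)
scan_packages(script)
```

Working on parser tokens rather than raw text means that a library() call mentioned inside a comment is not mistaken for a real dependency.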

In addition, if you have knitr installed, we also scan R Markdown files, i.e. files with extensions .Rmd, .Rpres and .Rhtml. To scan these files, we first tangle them (extract the R code from the document), then scan for library() and require() calls.
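Tangling can be tried by hand with knitr::purl(), which writes the code chunks of a document to a plain R file. The sketch below is an illustration under that assumption, not checkpoint's own implementation; tangle_rmd() is a hypothetical helper, and it returns NULL when knitr is not installed.

```r
# Hypothetical helper: tangle an R Markdown file into plain R code.
tangle_rmd <- function(path) {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    return(NULL)  # without knitr, these files cannot be tangled
  }
  out <- tempfile(fileext = ".R")
  # purl() extracts the code chunks; documentation = 0 drops the prose.
  knitr::purl(path, output = out, quiet = TRUE, documentation = 0)
  readLines(out)
}

# Build a tiny R Markdown file to try it on.
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(
  c("# Report", "", paste0(fence, "{r}"), "library(ggplot2)", fence),
  rmd
)
code <- tangle_rmd(rmd)
```

The resulting lines are ordinary R code, so they can be searched for library() and require() calls in the same way as a .R script.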

How does checkpoint differ from packrat?

Both checkpoint and packrat install the packages required for a project to a local archive, as they existed at a specified point in time. This allows specific package versions to be maintained over time and across different users.

However, the packages differ fundamentally in how they go about their business.

checkpoint installs packages from MRAN snapshots, which are daily snapshots of CRAN. Because Revolution Analytics built the server-side MRAN solution, the workload on the user is very low. Simply include checkpoint(...) at the top of your script, and the checkpoint function automatically downloads all required packages. To share your script, simply publish or email your work.

  • Checkpoint is simple
  • Reproducibility from one script
  • Simple for recipients to reproduce results
  • Only allows use of CRAN package versions that have been tested together
  • Relies on availability of MRAN
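Conceptually, installing from a snapshot means pointing R's repos option at a dated repository URL. The sketch below shows that idea in base R. The URL pattern and the snapshot_url() helper are assumptions for illustration and are not taken from this page; as the last bullet notes, the approach only works while the snapshot server is reachable.

```r
# Assumed URL pattern for a dated snapshot; treat it as illustrative only.
snapshot_url <- function(snapshot_date,
                         base = "https://mran.microsoft.com/snapshot") {
  paste(base, snapshot_date, sep = "/")
}

repo <- snapshot_url("2015-01-01")

# Point install.packages() at the snapshot, remembering the old setting.
old <- options(repos = c(CRAN = repo))
current <- getOption("repos")[["CRAN"]]

# Restore the previous repositories when done.
options(old)
```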

In contrast, packrat requires you to manage and publish all your packages. Thus sharing a packrat project requires you to copy and upload all the required packages to a public location, e.g. github.

  • Packrat is flexible and powerful
  • Supports non-CRAN packages (e.g. github)
  • Allows mix-and-matching package versions
  • Requires shipping all package source
  • Requires recipients to build packages from source

How does MRAN work?

The Reproducible R Toolkit (RRT) consists of the checkpoint package and the server-side checkpoint-server, which manages the CRAN snapshots.

MRAN is the implementation of checkpoint-server.

See https://mran.microsoft.com/documents/rro/reproducibility#reproducibility for more information.

How to use checkpoint

How to refer to a specific snapshot?

Use the following code at the top of your script:

library(checkpoint)
checkpoint("2015-01-01")

How do projects share packages across checkpoints?

Since the identified packages are stored in ~/.checkpoint, different projects can share packages from the same date.

Can I use multiple snapshots in a script?

Yes, but note that this will download multiple complete sets of packages into your ~/.checkpoint home folder, one set for each snapshot date.

Do I have to install packages?

No, the checkpoint() function does this automatically. This is a big benefit when reading and sharing scripts.

Maintenance

How do I remove .checkpoint folders?

You can manually delete any snapshot folder in your ~/.checkpoint home folder. This won’t harm any existing projects — the required package versions will simply be redownloaded next time the script is run.
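Manual deletion can also be scripted. The following base R sketch removes one dated folder; remove_snapshot() is a hypothetical helper, not a checkpoint function, and the demonstration runs against a throwaway folder rather than the real ~/.checkpoint.

```r
# Hypothetical helper: delete one snapshot folder under a checkpoint root.
remove_snapshot <- function(snapshot_date,
                            root = file.path("~", ".checkpoint")) {
  target <- file.path(root, snapshot_date)
  if (!dir.exists(target)) {
    message("No snapshot folder at ", target)
    return(invisible(FALSE))
  }
  # unlink() returns 0 on success; recursive = TRUE removes the contents too.
  invisible(unlink(target, recursive = TRUE) == 0)
}

# Try it against a throwaway root rather than the real ~/.checkpoint.
demo_root <- file.path(tempdir(), "checkpoint-demo")
dir.create(file.path(demo_root, "2015-01-01"), recursive = TRUE)
dir.create(file.path(demo_root, "2014-09-17"), recursive = TRUE)

remove_snapshot("2015-01-01", root = demo_root)
list.files(demo_root)
# "2014-09-17"
```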

Other frequently asked questions

How do I share my script?

Simply put your script in a shared location, e.g. a GitHub repository or gist, or email it to your collaborators.

What about github packages?

checkpoint() only supports MRAN snapshots. This has the advantage of ensuring your packages have been tested to work together (by the daily CRAN build process).

If you wish to add specific packages manually, take a look at packrat, which lets you manage packages yourself.

How do I use a specific version of a package?

To ensure you have a specific version of a package, you have to find a snapshot date that contains this version of the package. Alternatively, use packrat instead of checkpoint.