December 16, 2014

What is Packrat?

Packrat is an R package that helps you manage your project's R package dependencies in a way that is:

  • Reproducible

    Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

  • Isolated

    Installing a new or updated package for one project won't break your other projects, and vice versa. That's because packrat gives each project its own private package library.

  • Portable

    Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.

Reproducibility

Packrat helps you save and preserve the environment in which an R project or analysis is run.

Ever finished an analysis, come back a year later, and asked:

  • Who wrote this unreadable mess of code? (oops, it was me)
  • Why is lattice throwing an error when I make figures now? Apparently its API changed? I swear it didn't do that before…
  • Okay, I re-ran the analysis, but now nlme is complaining about model convergence: I swear this didn't happen before…
  • What happened to that R code we got from our collaborators? I swear it's in my e-mail somewhere…

Reproducibility

Packrat helps you solve this problem by:

  • Ensuring package sources are stored alongside your project, and
  • Tracking package versions so that, if you need to roll back a package for some reason, you can do so.

Packrat lets you build a sandbox in which your R programs can run, and helps you ensure that sandbox can persist over time.
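
To make "tracking package versions" concrete, here is a minimal base R sketch that records the installed version of a few packages. It only illustrates the idea and is not how Packrat does it internally; the helper name record_versions is an assumption made for this example.

```r
# Look up the installed version of each named package.
# `record_versions` is a hypothetical helper, not part of Packrat.
record_versions <- function(pkgs) {
  vapply(pkgs, function(p) as.character(utils::packageVersion(p)), character(1))
}

# Base packages are used here only because they are always installed.
versions <- record_versions(c("stats", "utils"))
print(versions)
```

Packrat's snapshot goes further than this sketch: as described above, it also stores the package sources alongside the project.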

Isolation

By default, when you install an R package, it gets installed into a global library – all projects on your system have access to that same library …

… but only one version of a particular package can be installed in that library at a time.

This is problematic if you have older projects that depend on, say, an older version of dplyr, but you want to use features of a newer version of dplyr in other projects.

We need some mechanism for isolating projects, and their dependencies, from each other – and Packrat helps us do this.

Isolation

Packrat gives each project its own private packrat library.

A library is just a folder of R packages. When you load a package with e.g. library(dplyr), R will attempt to load dplyr from your user or system library.

  • Useful analogy: R packages are 'books'; they live in a 'library' and hence you check them out with the library() function.

If you use Packrat to manage an R project, it will tell R to use and load packages from the project's private packrat library instead.

This also means that calls to install.packages(), devtools::install_github(), biocLite() and others will install into the private library by default!
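
The idea that a library is just a folder on R's search path can be sketched in base R. The example below creates a throwaway folder and puts it first on the library path; it is an illustration under stated assumptions, not Packrat's actual implementation, and the name private_lib is made up for this example.

```r
# Remember the current library search path so it can be restored afterwards.
old_paths <- .libPaths()

# A library is just a folder; create an empty one to act as a private library.
# (`private_lib` is a hypothetical name used only for this illustration.)
private_lib <- file.path(tempdir(), "private-lib")
dir.create(private_lib, showWarnings = FALSE)

# Put the private folder first, so R searches it first when loading packages.
.libPaths(c(private_lib, old_paths))
first_lib <- .libPaths()[1]
print(first_lib)

# Restore the original search path.
.libPaths(old_paths)
```

While the private folder is first on the path, install.packages() would install into it by default, since its default target is the first entry of .libPaths().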

Portability

Because Packrat collects and tracks all project dependencies, it ensures that the environment used for a particular analysis is portable across different systems.

If you are working with collaborators, you can ensure that everyone working on a Packrat project is using the same version of any R packages your project depends on.

Using Packrat

Main Packrat Verbs

There are two key constructs to understand to use Packrat effectively:

  • The snapshot of your project's R dependencies, and
  • The private library powering your project.

Packrat has two main verbs for interacting with these two components:

packrat::snapshot() records the package versions used by a project and downloads their source code for storage with the project,

packrat::restore() applies the previous snapshot to a directory (building packages from source as necessary).

Demo

These concepts are much more easily explored interactively – so let's switch gears and start using Packrat!

Packrat Fundamentals

packrat::init()

Create a packrat project within a directory, giving the project its own private package library.

packrat::snapshot()

Finds the packages in use in the project and stores a list of those packages, their current versions, and their source code.

packrat::restore()

Restore the directory to the last snapshotted state (building packages from source as necessary).

Initializing a Project

packrat::init()
Adding these packages to packrat:
            _          
    packrat   0.4.2-1

Fetching sources for packrat (0.4.2-1) ... OK (GitHub)
Snapshot written to '~/projects/reshape/packrat/packrat.lock'
Installing packrat (0.4.2-1) ... OK (built source)
Initialization complete!

Snapshotting Installed Packages

packrat::snapshot()
Adding these packages to packrat:
             _       
    plyr       1.8.1 
    Rcpp       0.11.2
    reshape2   1.4   
    stringr    0.6.2 

Fetching sources for plyr (1.8.1) ... OK (CRAN current)
Fetching sources for Rcpp (0.11.2) ... OK (CRAN current)
Fetching sources for reshape2 (1.4) ... OK (CRAN current)
Fetching sources for stringr (0.6.2) ... OK (CRAN current)
Snapshot written to '~/projects/reshape/packrat/packrat.lock'

Restoring the State of the Library

packrat::restore()
Installing Rcpp (0.11.2) ... OK (downloaded binary)
Installing stringr (0.6.2) ... OK (downloaded binary)
Installing plyr (1.8.1) ... OK (downloaded binary)
Installing reshape2 (1.4) ... OK (downloaded binary)

Updating a Package from GitHub

packrat::install_github("RcppCore/Rcpp")
packrat::snapshot()
Upgrading these packages already present in packrat:
             from         to
    Rcpp   0.11.2   0.11.2.1

Snapshot written to '~/projects/reshape/packrat/packrat.lock'
packrat::restore()
Installing Rcpp (0.11.2.1) ... OK (built source)

Bundling and Unbundling

packrat::bundle()
The packrat project has been bundled at:
- "~/projects/reshape/packrat/bundles/reshape-2014-06-24.tar.gz"
packrat::unbundle("reshape-2014-06-24.tar.gz", where = "~/Desktop")
- Untarring 'reshape-2014-06-24.tar.gz' in directory '~/Desktop'...
- Restoring project library...
Installing packrat (0.4.2-1) ... OK (built source)
Installing Rcpp (0.11.2.1) ... OK (built source)
Installing stringr (0.6.2) ... OK (downloaded binary)
Installing plyr (1.8.1) ... OK (downloaded binary)
Installing reshape2 (1.4) ... OK (downloaded binary)
Done! The project has been unbundled and restored at:
- "~/Desktop/reshape"

Packrat Project Options

packrat::opts$auto.snapshot()

Perform automatic snapshots when your library is updated.

packrat::opts$external.packages()

Specify packages to load from the user library. This breaks the project's isolation, but can be useful in many circumstances – e.g. for large Bioconductor annotation packages.

packrat::opts$local.repos()

Paths to local 'repositories'; i.e., folders containing R package sources.

packrat::opts$use.cache()

Cache installed packages for use across projects? This allows multiple projects on one machine to 'share' packages of the same version, without breaking isolation – and greatly speeds up packrat::restore().
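
As a sketch of how these options are read and set: each accessor returns the current value when called with no arguments and sets the option when given a value. The example assumes packrat is installed and the working directory is an initialized packrat project; the package name and the local folder path are illustrative assumptions.

```r
# Assumes packrat is installed and the working directory is a packrat project.

# Called with no arguments, an option accessor returns the current value.
packrat::opts$auto.snapshot()

# Called with a value, it sets the option for this project.
packrat::opts$auto.snapshot(FALSE)

# Illustrative values only: load devtools from the user library,
# and look for package sources in a hypothetical local folder.
packrat::opts$external.packages(c("devtools"))
packrat::opts$local.repos("~/projects/R")

# Several options can also be set at once.
packrat::set_opts(use.cache = TRUE, auto.snapshot = FALSE)
```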

Packrat and Version Control

Packrat Objectives

  • Isolated, portable, and reproducible environment for R projects

  • Capture all source code required to reproduce configurations

  • Flexible and easy-to-use solution to the problem of reproducibility:
    • "One button" snapshot/restore
    • Simple and convenient archiving (bundle/unbundle)
    • Optional integration with version control
  • Foundation for even higher fidelity reproducibility
    • e.g. Creating VMs or Docker containers for projects

Thanks for Joining!