While renv can help capture the state of your R library at some point in time, there are still other aspects of the system that can influence the runtime behavior of your R application. In particular, the same R code can produce different results depending on:
- The operating system in use,
- The compiler flags used when R and packages are built,
- The LAPACK / BLAS system(s) in use,
- The versions of system libraries installed and in use,
- And so on.

Docker is a tool that can help solve this problem through the use of containers. Very roughly speaking, one can think of a container as a small, self-contained system within which different applications can be run. Using Docker, one can declaratively state how a container should be built (what operating system it should use, and what system software should be installed within it), and use that system to run applications. (For more details, please see https://environments.rstudio.com/docker.)
Using Docker and renv together, one can then ensure that both the underlying system, alongside the required R packages, are fixed and constant for a particular application.
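Concretely, the build-time workflow described in the rest of this vignette can be sketched as a short Dockerfile. (This is only a sketch: the rocker/r-ver base image is one common choice, not a requirement, and the paths assume renv.lock sits at the project root.)

```dockerfile
# minimal sketch; the base image name is an example
FROM rocker/r-ver:4.5.0

# install renv, copy the lockfile, and restore the package library at build time
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
WORKDIR /project
COPY renv.lock renv.lock
RUN R -e "renv::restore()"
```

The sections below walk through each of these steps, and the trade-offs involved, in more detail.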
The main challenges in using Docker with renv are:

- Ensuring that the renv cache is visible to Docker containers, and
- Ensuring that required R package dependencies are available at runtime.
This vignette assumes you are already familiar with Docker; if not, the Docker documentation provides a thorough introduction. To learn more about using Docker to manage R environments, visit environments.rstudio.com.
We’ll discuss two strategies for using renv with Docker:
- Using renv to install packages when the Docker image is generated;
- Using renv to install packages when Docker containers are run.
We’ll also explore the pros and cons of each strategy.
Creating Docker images with renv
With Docker, Dockerfiles are used to define new images. Dockerfiles can be used to declaratively specify how a Docker image should be created. A Docker image captures the state of a machine at some point in time – e.g., a Linux operating system after downloading and installing R 4.5. Docker containers can be created using that image as a base, allowing different independent applications to run using the same pre-defined machine state.
First, you’ll need to get renv installed on your Docker image. For example, you could install the latest release of renv from CRAN:
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
Alternatively, if you need to use the development version of renv, you could use:
RUN R -e "install.packages('renv', repos = 'https://rstudio.r-universe.dev')"
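If you want image builds to remain reproducible even as new renv releases reach CRAN, you may prefer to pin renv to a fixed version. One way to do this (the version number below is just an example; pick the version recorded in your lockfile) is via remotes::install_version():

```dockerfile
# pin renv to a fixed version for reproducible image builds
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_version('renv', version = '1.0.7', repos = 'https://cloud.r-project.org')"
```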
Next, we’ll copy renv.lock into the container:

WORKDIR /project
COPY renv.lock renv.lock

Now, we can use renv::restore() to install those packages. At this stage, you’ll need to decide which of R’s library paths you’d like to use for package installation. (See ?.libPaths for more information.) There are a couple of options available:
Use the default library paths
This method is appropriate if you’d like these packages to be visible to all R processes launched using this image, and can be done via:
RUN R -e "renv::restore()"
Note that this method may fail if R’s default library paths are not on a writable volume in the Docker image. If this is the case, consider one of the alternatives below.
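Alternatively, one possible workaround (a sketch; the path chosen here is arbitrary) is to point R’s site library at a writable location before calling renv::restore():

```dockerfile
# make R's default site library a writable directory
ENV R_LIBS_SITE=/opt/R/site-library
RUN mkdir -p /opt/R/site-library && chmod -R a+w /opt/R/site-library
```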
Use the default project library path
If you want to use renv’s default project-local library path, you’ll need to initialize the project within the Docker container as an renv project. This can be done with:
RUN R -s -e "renv::init(bare = TRUE)"
RUN R -s -e "renv::restore()"
Or, alternatively, if you already have the project auto-loader and settings available – e.g., because you’re creating a Docker image from an existing renv project – you could use:
RUN mkdir -p renv
COPY .Rprofile .Rprofile
COPY renv/activate.R renv/activate.R
COPY renv/settings.json renv/settings.json
RUN R -s -e "renv::restore()"
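For reference, the .Rprofile generated by renv is typically just a one-line script that sources the activation script, so copying these two files is enough for the auto-loader to run:

```r
# .Rprofile, as generated by renv::init()
source("renv/activate.R")
```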
Note that in this mode, the installed packages would only be visible to R sessions launched using /project as the working directory. This will be the default behavior as long as WORKDIR is not changed, but it’s important to keep this in mind.
Use a custom library path
If you’d like to fully customize the library path used, the simplest approach is likely to use the RENV_PATHS_LIBRARY environment variable. This mimics the above approach, but customizes the library paths used by renv. For example:
ENV RENV_PATHS_LIBRARY=renv/library
RUN R -s -e "renv::init(bare = TRUE)"
RUN R -s -e "renv::restore()"
Alternatively, you could manage the library paths yourself via .libPaths() – see ?.libPaths in R for more information.
Speeding up package installations
The previously-described approaches are useful if you have multiple applications with identical package requirements. In this case, a single image containing this identical package library could serve as the parent image for several containerized applications.
However, renv::restore() is slow – it needs to download and install packages, which can take some time. Thus, some care is required to efficiently make use of the renv cache for projects that require:

- Building an image multiple times (e.g., to debug the production application as source code is updated), or
- Calling renv::restore() each time the container is run.

The former process can be sped up using multi-stage builds, the latter by dynamically provisioning R libraries, as described below.
Multi-stage builds
For projects that require repeatedly building an image, multi-stage builds can be used to speed up the build process. With multi-stage builds, multiple FROM statements are used in the Dockerfile and files can be copied across build stages.
This approach can be leveraged to generate more efficient builds by dedicating a first stage build to package synchronization and a second stage build to copying files and executing code that may need to be updated often across builds (e.g., code that needs to be debugged in the container).
To implement a two-stage build, the following code could be used as part of a Dockerfile.
FROM <parent-image> AS base
# initialize the project, assuming renv infrastructure is available
WORKDIR /project
RUN mkdir -p renv
COPY renv.lock renv.lock
COPY .Rprofile .Rprofile
COPY renv/activate.R renv/activate.R
COPY renv/settings.json renv/settings.json
# change default location of cache to project folder
RUN mkdir renv/.cache
ENV RENV_PATHS_CACHE=renv/.cache
# restore
RUN R -s -e "renv::restore()"
The above code uses FROM <parent-image> AS <name> to name the first stage of the build base. Here, <parent-image> should be replaced with an appropriate image name.

Subsequently, the code uses the project-library approach described above to copy the auto-loader into the project directory in the image. It additionally creates the renv/.cache directory that is to be used as the renv cache.
The second stage of the build is defined by adding the following code to the same Dockerfile, below the previous code chunk.
FROM <parent-image>
WORKDIR /project
COPY --from=base /project .
# add commands that need to be debugged below
Here, <parent-image> could be the same as the parent image of base, but does not have to be (see the Docker documentation for more details).

The key line is the COPY command, which specifies that the contents of the /project directory from the base image are copied into the /project directory of this image.

Any commands that will change frequently across builds could be included below the COPY command. If only the code associated with the second stage is updated, then renv::restore() will not be called again at build time. Instead, the layers associated with the base image will be loaded from Docker’s cache, thereby saving significant time in the build process.
In fact, renv::restore() will only be called when the base image needs to be rebuilt (e.g., when changes are made to renv.lock). Docker’s cache system is generally good at understanding the dependencies of images. However, if you find that the base image is not updating as expected, you can force a clean build by including the --no-cache option in the call to docker build.
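Putting both stages together, a complete Dockerfile might be sketched as follows. (The base image, the app.R file, and the final CMD are illustrative placeholders; adapt them to your project.)

```dockerfile
FROM rocker/r-ver:4.5.0 AS base

# first stage: restore packages from the lockfile
WORKDIR /project
RUN mkdir -p renv
COPY renv.lock renv.lock
COPY .Rprofile .Rprofile
COPY renv/activate.R renv/activate.R
COPY renv/settings.json renv/settings.json
RUN mkdir renv/.cache
ENV RENV_PATHS_CACHE=renv/.cache
RUN R -s -e "renv::restore()"

FROM rocker/r-ver:4.5.0

# second stage: copy the provisioned project, then add frequently-changing code
WORKDIR /project
COPY --from=base /project .
COPY app.R app.R
CMD ["R", "-s", "-e", "shiny::runApp('/project')"]
```

With this layout, editing app.R invalidates only the final layers, while renv::restore() re-runs only when the files copied into the base stage (such as renv.lock) change.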
Dynamically provisioning R libraries with renv
On occasion, you will have multiple applications built from a single base image, but each application will have its own independent R package requirements. In this case, rather than including the package dependencies in the image itself, it would be preferable for each container to provision its own library at runtime, based on that application’s renv.lock lockfile.

In effect, this is as simple as ensuring that renv::restore() happens at container runtime, rather than at image build time. However, on its own, renv::restore() is slow – it needs to download and install packages, which could take prohibitively long if an application needs to be run repeatedly.
The renv package cache can be used to help ameliorate this issue. When the cache is enabled, whenever renv attempts to install or restore an R package, it first checks to see whether that package is already available within the renv cache. If it is, that instance of the package is linked into the project library. Otherwise, the package is first installed into the renv cache, and then that newly-installed copy is linked for use in the project.
In effect, if the renv cache is available, you should only need to pay the cost of package installation once – after that, the newly-installed package will be available for re-use across different projects. At the same time, each project’s library will remain independent and isolated from one another, so installing a package within one container won’t affect another container.
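renv implements this sharing by linking packages from the cache into each project library (using symlinks, junction points, or copies depending on platform and configuration). The space-saving idea can be illustrated with plain hard links – a toy sketch, not renv’s actual implementation, with arbitrary directory names:

```shell
# toy illustration: linked files share one inode, so the "package" is stored once
mkdir -p cache project1 project2
echo "package contents" > cache/pkg
ln cache/pkg project1/pkg
ln cache/pkg project2/pkg
# all three paths report the same inode number
stat -c '%i' cache/pkg project1/pkg project2/pkg
```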
However, by default, each Docker container will have its own independent filesystem. Ideally, we’d like for all containers launched from a particular image to have access to the same renv cache. To accomplish this, we’ll have to tell each container to use an renv cache located on a shared mount.
In sum, if we’d like to allow for runtime provisioning of R package dependencies, we will need to ensure the renv cache is located on a shared volume, which is visible to any containers launched. We will accomplish this by:
- Setting the RENV_PATHS_CACHE environment variable, to tell the instance of renv running in each container where the global cache lives;
- Telling Docker to mount some location on the host filesystem (RENV_PATHS_CACHE_HOST) at a container-specific location (RENV_PATHS_CACHE_CONTAINER).
For example, if you had a container running a Shiny application:
# the location of the renv cache on the host machine
RENV_PATHS_CACHE_HOST=/opt/local/renv/cache
# where the cache should be mounted in the container
RENV_PATHS_CACHE_CONTAINER=/renv/cache
# run the container with the host cache mounted in the container
docker run --rm \
  -e "RENV_PATHS_CACHE=${RENV_PATHS_CACHE_CONTAINER}" \
  -v "${RENV_PATHS_CACHE_HOST}:${RENV_PATHS_CACHE_CONTAINER}" \
  -p 14618:14618 \
  <image> \
  R -s -e 'renv::restore(); shiny::runApp(host = "0.0.0.0", port = 14618)'
Note that the invocation above assumes that the project has already been initialized, either via calling renv::init() or by copying the requisite renv project infrastructure. With this, any calls to renv APIs within the created Docker container will have access to the mounted cache. The first time you run a container, renv will likely need to populate the cache, and so some time will be spent downloading and installing the required packages. Subsequent runs will be much faster, as renv will be able to reuse the global package cache.
The primary downside with this approach compared to the image-based approach is that it requires you to modify how containers are created, and requires a bit of extra orchestration in how containers are launched. However, once the renv cache is active, newly-created containers will launch very quickly, and a single image can then be used as a base for a myriad of different containers and applications, each with their own independent package dependencies.
Handling the renv autoloader
When R is launched within a project folder, the renv auto-loader (if present) will attempt to download and install renv into the project library if it’s not available. Depending on how your Docker container is configured, this could fail. For example:
Error installing renv:
======================
ERROR: unable to create '/usr/local/pipe/renv/library/master/R-4.0/x86_64-pc-linux-gnu/renv'
Warning messages:
1: In system2(r, args, stdout = TRUE, stderr = TRUE) :
running command ''/usr/lib/R/bin/R' --vanilla CMD INSTALL -l 'renv/library/master/R-4.0/x86_64-pc-linux-gnu' '/tmp/RtmpwM7ooh/renv_0.12.2.tar.gz' 2>&1' had status 1
2: Failed to find an renv installation: the project will not be loaded.
Use `renv::activate()` to re-initialize the project.
Bootstrapping renv into the project library might be unnecessary for you. If that is the case, then you can avoid this behavior by launching R with the --vanilla flag set; for example:
R --vanilla -s -e 'renv::restore()'
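Alternatively, if your version of renv supports the autoloader.enabled configuration option (an assumption worth checking against ?renv::config for the renv version in your image), you can disable the auto-loader for all R sessions in the image:

```dockerfile
# disable renv's auto-loader entirely (assumes a recent renv)
ENV RENV_CONFIG_AUTOLOADER_ENABLED=FALSE
```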