3  Tidying and profiling R code

R code which is worth preserving in a package and perhaps making available for others to use is worth documenting, tidying up and perhaps optimizing. The last two of these activities are the subject of this chapter.

3.1 Tidying R code

R treats function code loaded from packages and code entered by users differently. By default code entered by users has the source code stored internally, and when the function is listed, the original source is reproduced. Loading code from a package (by default) discards the source code, and the function listing is re-created from the parse tree of the function.

Normally keeping the source code is a good idea, and in particular it avoids comments being removed from the source. However, we can make use of the ability to re-create a function listing from its parse tree to produce a tidy version of the function, for example with consistent indentation and spaces around operators. If the original source does not follow the standard format this tidied version can be much easier to read.

We can subvert the keeping of source in two ways.

  1. The option keep.source can be set to FALSE before the code is loaded into R.
  2. The stored source code can be removed by calling the removeSource() function, for example by r myfun <- removeSource(myfun)

In each case if we then list the function we will get the standard layout.

Suppose we have a file of functions myfuns.R that we want to tidy up. Create a file tidy.R containing

source("myfuns.R", keep.source = FALSE)
dump(ls(all.names = TRUE), file = "new.myfuns.R")

and run R with this as the source file, for example by R --vanilla < tidy.R or by pasting into an R session. Then the file new.myfuns.R will contain the functions in alphabetical order in the standard layout. Warning: comments in your functions will be lost.

The standard format provides a good starting point for further tidying. Although the deparsing cannot do so, we recommend the consistent use of the preferred assignment operator <- (rather than =) for assignment. Many package authors use a version of Emacs (on a Unix-alike or Windows) to edit R code, using the ESS[S] mode of the ESS Emacs package. See section ‘R coding standards’ in the ‘R Internals’ manual for style options within the ESS[S] mode recommended for the source code of R itself.

3.2 Profiling R code for speed

It is possible to profile R code on Windows and most1 Unix-alike versions of R.

1 R has to be built to enable this, but the option --enable-R-profiling is the default.

2 For Unix-alikes by default these are intervals of CPU time, and for Windows of elapsed (‘wall-clock’) time. As from R 4.4.0, elapsed time is optional on Unix-alikes

The command Rprof is used to control profiling, and its help page can be consulted for full details. Profiling works by recording at fixed intervals2 (by default every 20 msecs) which line in which R function is being used, and recording the results in a file (default Rprof.out in the working directory). Then the function summaryRprof or the command-line utility R CMD Rprof Rprof.out can be used to summarize the activity.

As an example, consider the following code (from Venables & Ripley, 2002, pp. 225–6).

library(MASS); library(boot)
storm.fm <- nls(Time ~ b*Viscosity/(Wt - c), stormer,
                start = c(b=30.401, c=2.2183))
st <- cbind(stormer, fit=fitted(storm.fm))
storm.bf <- function(rs, i) {
    st$Time <-  st$fit + rs[i]
    tmp <- nls(Time ~ (b * Viscosity)/(Wt - c), st,
               start = coef(storm.fm))
    tmp$m$getAllPars()
}
rs <- scale(resid(storm.fm), scale = FALSE) # remove the mean
Rprof("boot.out")
storm.boot <- boot(rs, storm.bf, R = 4999) # slow enough to profile
Rprof(NULL)

Having run this we can summarize the results by

R CMD Rprof boot.out

Each sample represents 0.02 seconds.
Total run time: 22.52 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %       total       %        self
 total    seconds     self    seconds    name
 100.0     25.22       0.2      0.04     "boot"
  99.8     25.18       0.6      0.16     "statistic"
  96.3     24.30       4.0      1.02     "nls"
  33.9      8.56       2.2      0.56     "<Anonymous>"
  32.4      8.18       1.4      0.36     "eval"
  31.8      8.02       1.4      0.34     ".Call"
  28.6      7.22       0.0      0.00     "eval.parent"
  28.5      7.18       0.3      0.08     "model.frame"
  28.1      7.10       3.5      0.88     "model.frame.default"
  17.4      4.38       0.7      0.18     "sapply"
  15.0      3.78       3.2      0.80     "nlsModel"
  12.5      3.16       1.8      0.46     "lapply"
  12.3      3.10       2.7      0.68     "assign"
 ...

   %        self        %      total
  self    seconds     total   seconds    name
   5.7      1.44       7.5      1.88     "inherits"
   4.0      1.02      96.3     24.30     "nls"
   3.6      0.92       3.6      0.92     "$"
   3.5      0.88      28.1      7.10     "model.frame.default"
   3.2      0.80      15.0      3.78     "nlsModel"
   2.8      0.70       9.8      2.46     "qr.coef"
   2.7      0.68      12.3      3.10     "assign"
   2.5      0.64       2.5      0.64     ".Fortran"
   2.5      0.62       7.1      1.80     "qr.default"
   2.2      0.56      33.9      8.56     "<Anonymous>"
   2.1      0.54       5.9      1.48     "unlist"
   2.1      0.52       7.9      2.00     "FUN"
  ...

This often produces surprising results and can be used to identify bottlenecks or pieces of R code that could benefit from being replaced by compiled code.

Two warnings: profiling does impose a small performance penalty, and the output files can be very large if long runs are profiled at the default sampling interval.

Profiling short runs can sometimes give misleading results. R from time to time performs garbage collection to reclaim unused memory, and this takes an appreciable amount of time which profiling will charge to whichever function happens to provoke it. It may be useful to compare profiling code immediately after a call to gc() with a profiling run without a preceding call to gc.

More detailed analysis of the output can be achieved by the tools in the CRAN packages proftools and profr: in particular these allow call graphs to be studied.

3.3 Profiling R code for memory use

Measuring memory use in R code is useful either when the code takes more memory than is conveniently available or when memory allocation and copying of objects is responsible for slow code. There are three ways to profile memory use over time in R code. The second and third require R to have been compiled with --enable-memory-profiling, which is not the default, but is currently used for the macOS and Windows binary distributions. All can be misleading, for different reasons.

In understanding the memory profiles it is useful to know a little more about R’s memory allocation. Looking at the results of gc() shows a division of memory into Vcells used to store the contents of vectors and Ncells used to store everything else, including all the administrative overhead for vectors such as type and length information. In fact the vector contents are divided into two pools. Memory for small vectors (by default 128 bytes or less) is obtained in large chunks and then parcelled out by R; memory for larger vectors is obtained directly from the operating system.

Some memory allocation is obvious in interpreted code, for example,

y <- x + 1

allocates memory for a new vector y. Other memory allocation is less obvious and occurs because R is forced to make good on its promise of ‘call-by-value’ argument passing. When an argument is passed to a function it is not immediately copied. Copying occurs (if necessary) only when the argument is modified. This can lead to surprising memory use. For example, in the ‘survey’ package we have

print.svycoxph <- function (x, ...)
{
    print(x$survey.design, varnames = FALSE, design.summaries = FALSE, ...)
    x$call <- x$printcall
    NextMethod()
}

It may not be obvious that the assignment to x$call will cause the entire object x to be copied. This copying to preserve the call-by-value illusion is usually done by the internal C function duplicate.

The main reason that memory-use profiling is difficult is garbage collection. Memory is allocated at well-defined times in an R program, but is freed whenever the garbage collector happens to run.

3.3.1 Memory statistics from Rprof

The sampling profiler Rprof described in the previous section can be given the option memory.profiling=TRUE. It then writes out the total R memory allocation in small vectors, large vectors, and cons cells or nodes at each sampling interval. It also writes out the number of calls to the internal function duplicate, which is called to copy R objects. summaryRprof provides summaries of this information. The main reason that this can be misleading is that the memory use is attributed to the function running at the end of the sampling interval. A second reason is that garbage collection can make the amount of memory in use decrease, so a function appears to use little memory. Running under gctorture helps with both problems: it slows down the code to effectively increase the sampling frequency and it makes each garbage collection release a smaller amount of memory.

3.3.2 Tracking memory allocations

The second method of memory profiling uses a memory-allocation profiler, Rprofmem(), which writes out a stack trace to an output file every time a large vector is allocated (with a user-specified threshold for ‘large’) or a new page of memory is allocated for the R heap. Summary functions for this output are still being designed.

Running the example from the previous section with

> Rprofmem("boot.memprof",threshold=1000)
> storm.boot <- boot(rs, storm.bf, R = 4999)
> Rprofmem(NULL)

shows that apart from some initial and final work in boot there are no vector allocations over 1000 bytes.

3.3.3 Tracing copies of an object

The third method of memory profiling involves tracing copies made of a specific (presumably large) R object. Calling tracemem on an object marks it so that a message is printed to standard output when the object is copied via duplicate or coercion to another type, or when a new object of the same size is created in arithmetic operations. The main reason that this can be misleading is that copying of subsets or components of an object is not tracked. It may be helpful to use tracemem on these components.

In the example above we can run tracemem on the data frame st

> tracemem(st)
[1] "<0x9abd5e0>"
> storm.boot <- boot(rs, storm.bf, R = 4)
memtrace[0x9abd5e0->0x92a6d08]: statistic boot
memtrace[0x92a6d08->0x92a6d80]: $<-.data.frame $<- statistic boot
memtrace[0x92a6d80->0x92a6df8]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x9271318]: statistic boot
memtrace[0x9271318->0x9271390]: $<-.data.frame $<- statistic boot
memtrace[0x9271390->0x9271408]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x914f558]: statistic boot
memtrace[0x914f558->0x914f5f8]: $<-.data.frame $<- statistic boot
memtrace[0x914f5f8->0x914f670]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x972cbf0]: statistic boot
memtrace[0x972cbf0->0x972cc68]: $<-.data.frame $<- statistic boot
memtrace[0x972cc68->0x972cd08]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x98ead98]: statistic boot
memtrace[0x98ead98->0x98eae10]: $<-.data.frame $<- statistic boot
memtrace[0x98eae10->0x98eae88]: $<-.data.frame $<- statistic boot

The object is duplicated fifteen times, three times for each of the R+1 calls to storm.bf. This is surprising, since none of the duplications happen inside nls. Stepping through storm.bf in the debugger shows that all three happen in the line

st$Time <- st$fit + rs[i]

Data frames are slower than matrices and this is an example of why. Using tracemem(st$Viscosity) does not reveal any additional copying.

3.4 Profiling compiled code

Profiling compiled code is highly system-specific, but this section contains some hints gleaned from various R users. Some methods need to be different for a compiled executable and for dynamic/shared libraries/objects as used by R packages.

This chapter is based on reports from users and the information may not be current.

3.4.1 Profiling on Linux

Options include using sprof for a shared object, and oprofile (see https://oprofile.sourceforge.io/news/) and perf (see https://perf.wiki.kernel.org/index.php/Tutorial) for any executable or shared object. These seem less widely supplied than they used to be. There is also ‘Google Performance Tools’, also known as gperftools or google-perftools.

All of these work best when R and any packages have been built with debugging symbols.

3.4.2 perf

This seems the most widely distributed tool. Here is an example on x86_64 Linux using R 4.3.1 built with LTO.

At its simplest

perf record R -f tests/Examples/stats-Ex.R
perf report --sort=dso
perf report --sort=srcfile
rm perf.data*

The first report is

  75.67%  R
   9.25%  libc.so.6
   4.87%  [unknown]
   3.75%  libz.so.1.2.11
   3.37%  stats.so
   1.17%  libm.so.6
   0.63%  libtirpc.so.3.0.0
   0.41%  graphics.so
   0.30%  grDevices.so
   0.20%  libRblas.so
   0.09%  libpcre2-8.so.0.11.0
   0.07%  methods.so
   ...

which shows which shared libraries (DSOs) the time was spent in.

perf annotate can be used on an application built with GCC and -ggdb: it interleaves disassembled and source code.

3.4.3 oprofile and operf

The oprofile project has two modes of operation. Since version 0.9.8 (August 2012), the preferred mode is to use operf, so we discuss only that.

Let us look at the boot example from §3.2 on x86_64 Linux using R 4.3.1.

This can be run under operf and analysed by commands like

operf R -f boot.R
opreport
opreport -l /path/to/R_HOME/bin/exec/R
opreport -l /path/to/R_HOME/library/stats/src/stats.so
opannotate --source /path/to/R_HOME/bin/exec/R

The first line had to be done as root.

The first report shows in which library (etc) the time was spent:

   CPU_CLK_UNHALT...|
      samples|      %|
    ------------------
       278341 91.9947 R
        18290  6.0450 libc.so.6
         2277  0.7526 kallsyms
         1426  0.4713 stats.so
          739  0.2442 libRblas.so
          554  0.1831 libz.so.1.2.11
          373  0.1233 libm.so.6
          352  0.1163 libtirpc.so.3.0.0
          153  0.0506 ld-linux-x86-64.so.2
           12  0.0040 methods.so

(kallsyms is the kernel.)

The rest of the output is voluminous, and only extracts are shown.

Most of the time within R is spent in

samples  %        image name symbol name
52955    19.0574  R                        bcEval.lto_priv.0
16484     5.9322  R                        Rf_allocVector3
14224     5.1189  R                        Rf_findVarInFrame3
12581     4.5276  R                        CONS_NR
8289      2.9830  R                        Rf_matchArgs_NR
8034      2.8913  R                        Rf_cons
7114      2.5602  R                        R_gc_internal.lto_priv.0
6552      2.3579  R                        Rf_eval
5969      2.1481  R                        VECTOR_ELT
5684      2.0456  R                        Rf_applyClosure
5497      1.9783  R                        findVarLocInFrame.part.0.lto_priv.0
4827      1.7371  R                        Rf_mkPROMISE
4609      1.6587  R                        Rf_install
4317      1.5536  R                        Rf_findFun3
4035      1.4521  R                        getvar.lto_priv.0
3491      1.2563  R                        SETCAR
3179      1.1441  R                        Rf_defineVar
2892      1.0408  R                        duplicate1.lto_priv.0

and in stats.so

samples  %        image name               symbol name
285      24.4845  stats.so                 termsform
284      24.3986  stats.so                 numeric_deriv
213      18.2990  stats.so                 modelframe
114       9.7938  stats.so                 nls_iter
55        4.7251  stats.so                 ExtractVars
47        4.0378  stats.so                 EncodeVars
37        3.1787  stats.so                 getListElement
32        2.7491  stats.so                 TrimRepeats
25        2.1478  stats.so                 InstallVar
20        1.7182  stats.so                 MatchVar
20        1.7182  stats.so                 isZeroOne
15        1.2887  stats.so                 ConvInfoMsg.isra.0

The profiling data is by default stored in sub-directory oprofile_data of the current directory, which can be removed at the end of the session.

3.4.4 sprof

You can select shared objects to be profiled with sprof by setting the environment variable LD_PROFILE. For example

% setenv LD_PROFILE /path/to/R_HOME/library/stats/libs/stats.so
% R -f boot.R
% sprof /path/to/R_HOME/library/stats/libs/stats.so \
  /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 76.19      0.32     0.32        0     0.00           numeric_deriv
 16.67      0.39     0.07        0     0.00           nls_iter
  7.14      0.42     0.03        0     0.00           getListElement

... to clean up ...
rm /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile

It is possible that root access will be needed to create the directories used for the profile data.

3.4.5 Profiling on macOS

Developers have recommended Instruments (part of Xcode, see https://help.apple.com/instruments/mac/current/), This had a command-line version prior to macOS 12.

3.4.6 Profiling on Windows

Very Sleepy (https://github.com/VerySleepy/verysleepy) has been used on x86_64 Windows. There were problems with accessing the debug information, but the best results which included function names were obtained by attaching the profiler to an existing Rterm process, either via GUI or using /a: (PID obtained via Sys.getpid()).

Footnotes