3 Tidying and profiling R code
R code which is worth preserving in a package and perhaps making available for others to use is worth documenting, tidying up and perhaps optimizing. The last two of these activities are the subject of this chapter.
3.1 Tidying R code
R treats function code loaded from packages and code entered by users differently. By default code entered by users has the source code stored internally, and when the function is listed, the original source is reproduced. Loading code from a package (by default) discards the source code, and the function listing is re-created from the parse tree of the function.
Normally keeping the source code is a good idea, and in particular it avoids comments being removed from the source. However, we can make use of the ability to re-create a function listing from its parse tree to produce a tidy version of the function, for example with consistent indentation and spaces around operators. If the original source does not follow the standard format this tidied version can be much easier to read.
We can subvert the keeping of source in two ways.

- The option `keep.source` can be set to `FALSE` before the code is loaded into R.
- The stored source code can be removed by calling the `removeSource()` function, for example by `myfun <- removeSource(myfun)`.

In each case if we then list the function we will get the standard layout.
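As a quick illustration (using a throwaway function of our own, not one from the text), removing the source shows the standard layout being re-created from the parse tree:

```r
## A deliberately untidy function (our own example)
myfun <- function(x,y){# add with odd spacing
    x+    y}
myfun                        # prints the stored source, comments and all
myfun <- removeSource(myfun)
myfun                        # prints the standard layout; comments are gone
```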
Suppose we have a file of functions `myfuns.R` that we want to tidy up. Create a file `tidy.R` containing

```r
source("myfuns.R", keep.source = FALSE)
dump(ls(all.names = TRUE), file = "new.myfuns.R")
```

and run R with this as the source file, for example by `R --vanilla < tidy.R` or by pasting into an R session. Then the file `new.myfuns.R` will contain the functions in alphabetical order in the standard layout. Warning: comments in your functions will be lost.
The standard format provides a good starting point for further tidying. Although the deparsing cannot do so, we recommend the consistent use of the preferred assignment operator `<-` (rather than `=`) for assignment. Many package authors use a version of Emacs (on a Unix-alike or Windows) to edit R code, using the ESS[S] mode of the ESS Emacs package. See R coding standards in R Internals for style options within the ESS[S] mode recommended for the source code of R itself.
3.2 Profiling R code for speed
It is possible to profile R code on Windows and most¹ Unix-alike versions of R.

¹ R has to be built to enable this, but the option `--enable-R-profiling` is the default.

² For Unix-alikes by default these are intervals of CPU time, and for Windows of elapsed (‘wall-clock’) time. As from R 4.4.0, elapsed time is optional on Unix-alikes.

The command `Rprof` is used to control profiling, and its help page can be consulted for full details. Profiling works by recording at fixed intervals² (by default every 20 msecs) which line in which R function is being used, and recording the results in a file (default `Rprof.out` in the working directory). Then the function `summaryRprof` or the command-line utility `R CMD Rprof Rprof.out` can be used to summarize the activity.
As an example, consider the following code (from Venables & Ripley, 2002, pp. 225–6).
```r
library(MASS); library(boot)
storm.fm <- nls(Time ~ b*Viscosity/(Wt - c), stormer,
                start = c(b = 30.401, c = 2.2183))
st <- cbind(stormer, fit = fitted(storm.fm))
storm.bf <- function(rs, i) {
    st$Time <- st$fit + rs[i]
    tmp <- nls(Time ~ (b * Viscosity)/(Wt - c), st,
               start = coef(storm.fm))
    tmp$m$getAllPars()
}
rs <- scale(resid(storm.fm), scale = FALSE) # remove the mean
Rprof("boot.out")
storm.boot <- boot(rs, storm.bf, R = 4999) # slow enough to profile
Rprof(NULL)
```
Having run this we can summarize the results by

```
R CMD Rprof boot.out
```

```
Each sample represents 0.02 seconds.
Total run time: 25.22 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %        total    %        self
   total    seconds  self     seconds  name
   100.0    25.22     0.2      0.04    "boot"
    99.8    25.18     0.6      0.16    "statistic"
    96.3    24.30     4.0      1.02    "nls"
    33.9     8.56     2.2      0.56    "<Anonymous>"
    32.4     8.18     1.4      0.36    "eval"
    31.8     8.02     1.4      0.34    ".Call"
    28.6     7.22     0.0      0.00    "eval.parent"
    28.5     7.18     0.3      0.08    "model.frame"
    28.1     7.10     3.5      0.88    "model.frame.default"
    17.4     4.38     0.7      0.18    "sapply"
    15.0     3.78     3.2      0.80    "nlsModel"
    12.5     3.16     1.8      0.46    "lapply"
    12.3     3.10     2.7      0.68    "assign"
...

   %        self     %        total
   self     seconds  total    seconds  name
    5.7      1.44     7.5      1.88    "inherits"
    4.0      1.02    96.3     24.30    "nls"
    3.6      0.92     3.6      0.92    "$"
    3.5      0.88    28.1      7.10    "model.frame.default"
    3.2      0.80    15.0      3.78    "nlsModel"
    2.8      0.70     9.8      2.46    "qr.coef"
    2.7      0.68    12.3      3.10    "assign"
    2.5      0.64     2.5      0.64    ".Fortran"
    2.5      0.62     7.1      1.80    "qr.default"
    2.2      0.56    33.9      8.56    "<Anonymous>"
    2.1      0.54     5.9      1.48    "unlist"
    2.1      0.52     7.9      2.00    "FUN"
...
```
This often produces surprising results and can be used to identify bottlenecks or pieces of R code that could benefit from being replaced by compiled code.
Two warnings: profiling does impose a small performance penalty, and the output files can be very large if long runs are profiled at the default sampling interval.
Profiling short runs can sometimes give misleading results. R from time to time performs garbage collection to reclaim unused memory, and this takes an appreciable amount of time which profiling will charge to whichever function happens to provoke it. It may be useful to compare profiling code immediately after a call to `gc()` with a profiling run without a preceding call to `gc`.
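A minimal sketch of such a comparison (the workload and file names here are our own, chosen only for illustration):

```r
## Run 1: profile immediately after a full garbage collection
gc()
Rprof("run1.out")
res1 <- vapply(1:100, function(i) sum(rnorm(1e5)), numeric(1))
Rprof(NULL)

## Run 2: the same workload without the preceding gc()
Rprof("run2.out")
res2 <- vapply(1:100, function(i) sum(rnorm(1e5)), numeric(1))
Rprof(NULL)

## Compare where the time (including any collections) was charged
summaryRprof("run1.out")$by.self
summaryRprof("run2.out")$by.self
```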
More detailed analysis of the output can be achieved by the tools in the CRAN packages proftools and profr: in particular these allow call graphs to be studied.
3.3 Profiling R code for memory use
Measuring memory use in R code is useful either when the code takes more memory than is conveniently available or when memory allocation and copying of objects is responsible for slow code. There are three ways to profile memory use over time in R code. The second and third require R to have been compiled with `--enable-memory-profiling`, which is not the default, but is currently used for the macOS and Windows binary distributions. All can be misleading, for different reasons.
In understanding the memory profiles it is useful to know a little more about R’s memory allocation. Looking at the results of `gc()` shows a division of memory into `Vcells` used to store the contents of vectors and `Ncells` used to store everything else, including all the administrative overhead for vectors such as type and length information. In fact the vector contents are divided into two pools. Memory for small vectors (by default 128 bytes or less) is obtained in large chunks and then parcelled out by R; memory for larger vectors is obtained directly from the operating system.
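Both facts can be observed directly: `gc()` reports the `Ncells`/`Vcells` split, and `object.size()` shows the point at which a vector outgrows the small-vector pool (exact sizes are platform-dependent):

```r
gc()                       # the printed matrix has rows Ncells and Vcells
## Small vectors (<= 128 bytes of data) come from R's own pool and are
## rounded up to a small set of fixed sizes; larger ones come from the OS.
object.size(numeric(16))   # 128 bytes of data: still a 'small' vector
object.size(numeric(17))   # 136 bytes of data: allocated separately
```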
Some memory allocation is obvious in interpreted code, for example,

```r
y <- x + 1
```

allocates memory for a new vector `y`. Other memory allocation is less obvious and occurs because R is forced to make good on its promise of ‘call-by-value’ argument passing. When an argument is passed to a function it is not immediately copied. Copying occurs (if necessary) only when the argument is modified. This can lead to surprising memory use. For example, in the ‘survey’ package we have
```r
print.svycoxph <- function (x, ...)
{
    print(x$survey.design, varnames = FALSE, design.summaries = FALSE, ...)
    x$call <- x$printcall
    NextMethod()
}
```

It may not be obvious that the assignment to `x$call` will cause the entire object `x` to be copied. This copying to preserve the call-by-value illusion is usually done by the internal C function `duplicate`.
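The copy can be seen without any special tools: modifying a component of an argument inside a function leaves the caller’s object untouched, because the first modification forced a duplicate (a toy stand-in for the survey example, not code from that package):

```r
## Toy stand-in for print.svycoxph: modify a component of the argument
f <- function(x) {
    x$call <- "modified"   # this assignment duplicates the whole of x
    x
}
obj <- list(data = 1:5, call = "original")
out <- f(obj)
obj$call   # still "original": the caller's copy was never touched
out$call   # "modified": f() worked on a duplicate
```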
The main reason that memory-use profiling is difficult is garbage collection. Memory is allocated at well-defined times in an R program, but is freed whenever the garbage collector happens to run.
3.3.1 Memory statistics from Rprof
The sampling profiler `Rprof` described in the previous section can be given the option `memory.profiling=TRUE`. It then writes out the total R memory allocation in small vectors, large vectors, and cons cells or nodes at each sampling interval. It also writes out the number of calls to the internal function `duplicate`, which is called to copy R objects. `summaryRprof` provides summaries of this information. The main reason that this can be misleading is that the memory use is attributed to the function running at the end of the sampling interval. A second reason is that garbage collection can make the amount of memory in use decrease, so a function appears to use little memory. Running under `gctorture` helps with both problems: it slows down the code to effectively increase the sampling frequency and it makes each garbage collection release a smaller amount of memory.
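A sketch of how this is used; the workload and file name are our own, and the `capabilities("profmem")` guard skips the run on builds without memory profiling:

```r
## Record memory statistics alongside the time samples (needs an R
## compiled with --enable-memory-profiling, as the CRAN macOS and
## Windows binaries are)
if (capabilities("profmem")) {
    Rprof("mem.out", memory.profiling = TRUE)
    x <- lapply(1:100, function(i) rnorm(1e5))
    Rprof(NULL)
    ## Ask summaryRprof to include the memory columns in its tables
    print(summaryRprof("mem.out", memory = "both"))
}
```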
3.3.2 Tracking memory allocations
The second method of memory profiling uses a memory-allocation profiler, `Rprofmem()`, which writes out a stack trace to an output file every time a large vector is allocated (with a user-specified threshold for ‘large’) or a new page of memory is allocated for the R heap. Summary functions for this output are still being designed.
Running the example from the previous section with

```r
> Rprofmem("boot.memprof", threshold = 1000)
> storm.boot <- boot(rs, storm.bf, R = 4999)
> Rprofmem(NULL)
```

shows that apart from some initial and final work in `boot` there are no vector allocations over 1000 bytes.
3.3.3 Tracing copies of an object
The third method of memory profiling involves tracing copies made of a specific (presumably large) R object. Calling `tracemem` on an object marks it so that a message is printed to standard output when the object is copied via `duplicate` or coercion to another type, or when a new object of the same size is created in arithmetic operations. The main reason that this can be misleading is that copying of subsets or components of an object is not tracked. It may be helpful to use `tracemem` on these components.
In the example above we can run `tracemem` on the data frame `st`:

```
> tracemem(st)
[1] "<0x9abd5e0>"
> storm.boot <- boot(rs, storm.bf, R = 4)
memtrace[0x9abd5e0->0x92a6d08]: statistic boot
memtrace[0x92a6d08->0x92a6d80]: $<-.data.frame $<- statistic boot
memtrace[0x92a6d80->0x92a6df8]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x9271318]: statistic boot
memtrace[0x9271318->0x9271390]: $<-.data.frame $<- statistic boot
memtrace[0x9271390->0x9271408]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x914f558]: statistic boot
memtrace[0x914f558->0x914f5f8]: $<-.data.frame $<- statistic boot
memtrace[0x914f5f8->0x914f670]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x972cbf0]: statistic boot
memtrace[0x972cbf0->0x972cc68]: $<-.data.frame $<- statistic boot
memtrace[0x972cc68->0x972cd08]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x98ead98]: statistic boot
memtrace[0x98ead98->0x98eae10]: $<-.data.frame $<- statistic boot
memtrace[0x98eae10->0x98eae88]: $<-.data.frame $<- statistic boot
```
The object is duplicated fifteen times, three times for each of the `R+1` calls to `storm.bf`. This is surprising, since none of the duplications happen inside `nls`. Stepping through `storm.bf` in the debugger shows that all three happen in the line

```r
st$Time <- st$fit + rs[i]
```

Data frames are slower than matrices and this is an example of why. Using `tracemem(st$Viscosity)` does not reveal any additional copying.
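The cost is easy to see in a toy comparison of repeated single-element replacement (our own benchmark, not from the text):

```r
df <- data.frame(Time = numeric(1e4))
m  <- matrix(0, nrow = 1e4, ncol = 1)
## Replacement in a data frame goes through `$<-.data.frame` and copies;
## replacement in a matrix with no other references modifies in place
system.time(for (i in 1:1000) df$Time[1] <- i)
system.time(for (i in 1:1000) m[1, 1]    <- i)
```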
3.4 Profiling compiled code
Profiling compiled code is highly system-specific, but this section contains some hints gleaned from various R users. Some methods need to be different for a compiled executable and for dynamic/shared libraries/objects as used by R packages.
This chapter is based on reports from users and the information may not be current.
3.4.1 Profiling on Linux
Options include using `sprof` for a shared object, and `oprofile` (see https://oprofile.sourceforge.io/news/) and `perf` (see https://perf.wiki.kernel.org/index.php/Tutorial) for any executable or shared object. These seem less widely supplied than they used to be. There is also ‘Google Performance Tools’, also known as gperftools or google-perftools.
All of these work best when R and any packages have been built with debugging symbols.
3.4.2 perf
This seems the most widely distributed tool. Here is an example on `x86_64` Linux using R 4.3.1 built with LTO.

At its simplest

```
perf record R -f tests/Examples/stats-Ex.R
perf report --sort=dso
perf report --sort=srcfile
rm perf.data
```
The first report is

```
75.67%  R
 9.25%  libc.so.6
 4.87%  [unknown]
 3.75%  libz.so.1.2.11
 3.37%  stats.so
 1.17%  libm.so.6
 0.63%  libtirpc.so.3.0.0
 0.41%  graphics.so
 0.30%  grDevices.so
 0.20%  libRblas.so
 0.09%  libpcre2-8.so.0.11.0
 0.07%  methods.so
...
```

which shows which shared libraries (DSOs) the time was spent in.
`perf annotate` can be used on an application built with GCC and `-ggdb`: it interleaves disassembled and source code.
3.4.3 oprofile and operf
The `oprofile` project has two modes of operation. Since version 0.9.8 (August 2012), the preferred mode is to use `operf`, so we discuss only that.
Let us look at the boot example from §3.2 on `x86_64` Linux using R 4.3.1.

This can be run under `operf` and analysed by commands like

```
operf R -f boot.R
opreport
opreport -l /path/to/R_HOME/bin/exec/R
opreport -l /path/to/R_HOME/library/stats/src/stats.so
opannotate --source /path/to/R_HOME/bin/exec/R
```

The first line had to be done as root.
The first report shows in which library (etc) the time was spent:

```
CPU_CLK_UNHALT...|
  samples|      %|
------------------
   278341 91.9947 R
    18290  6.0450 libc.so.6
     2277  0.7526 kallsyms
     1426  0.4713 stats.so
      739  0.2442 libRblas.so
      554  0.1831 libz.so.1.2.11
      373  0.1233 libm.so.6
      352  0.1163 libtirpc.so.3.0.0
      153  0.0506 ld-linux-x86-64.so.2
       12  0.0040 methods.so
```

(`kallsyms` is the kernel.)
The rest of the output is voluminous, and only extracts are shown.
Most of the time within R is spent in

```
samples  %        image name  symbol name
52955    19.0574  R           bcEval.lto_priv.0
16484     5.9322  R           Rf_allocVector3
14224     5.1189  R           Rf_findVarInFrame3
12581     4.5276  R           CONS_NR
 8289     2.9830  R           Rf_matchArgs_NR
 8034     2.8913  R           Rf_cons
 7114     2.5602  R           R_gc_internal.lto_priv.0
 6552     2.3579  R           Rf_eval
 5969     2.1481  R           VECTOR_ELT
 5684     2.0456  R           Rf_applyClosure
 5497     1.9783  R           findVarLocInFrame.part.0.lto_priv.0
 4827     1.7371  R           Rf_mkPROMISE
 4609     1.6587  R           Rf_install
 4317     1.5536  R           Rf_findFun3
 4035     1.4521  R           getvar.lto_priv.0
 3491     1.2563  R           SETCAR
 3179     1.1441  R           Rf_defineVar
 2892     1.0408  R           duplicate1.lto_priv.0
```
and in `stats.so`

```
samples  %        image name  symbol name
285      24.4845  stats.so    termsform
284      24.3986  stats.so    numeric_deriv
213      18.2990  stats.so    modelframe
114       9.7938  stats.so    nls_iter
 55       4.7251  stats.so    ExtractVars
 47       4.0378  stats.so    EncodeVars
 37       3.1787  stats.so    getListElement
 32       2.7491  stats.so    TrimRepeats
 25       2.1478  stats.so    InstallVar
 20       1.7182  stats.so    MatchVar
 20       1.7182  stats.so    isZeroOne
 15       1.2887  stats.so    ConvInfoMsg.isra.0
```
The profiling data is by default stored in sub-directory `oprofile_data` of the current directory, which can be removed at the end of the session.
3.4.4 sprof
You can select shared objects to be profiled with `sprof` by setting the environment variable `LD_PROFILE`. For example

```
% setenv LD_PROFILE /path/to/R_HOME/library/stats/libs/stats.so
% R -f boot.R
% sprof /path/to/R_HOME/library/stats/libs/stats.so \
  /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 76.19      0.32     0.32        0     0.00           numeric_deriv
 16.67      0.39     0.07        0     0.00           nls_iter
  7.14      0.42     0.03        0     0.00           getListElement
...

% rm /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile  ## to clean up
```

It is possible that root access will be needed to create the directories used for the profile data.
3.4.5 Profiling on macOS
Developers have recommended `Instruments` (part of `Xcode`, see https://help.apple.com/instruments/mac/current/). It had a command-line version prior to macOS 12.
3.4.6 Profiling on Windows
`Very Sleepy` (https://github.com/VerySleepy/verysleepy) has been used on `x86_64` Windows. There were problems with accessing the debug information, but the best results, which included function names, were obtained by attaching the profiler to an existing `Rterm` process, either via the GUI or using `/a:` (the PID can be obtained via `Sys.getpid()`).