3 Tidying and profiling R code
R code which is worth preserving in a package and perhaps making available for others to use is worth documenting, tidying up and perhaps optimizing. The last two of these activities are the subject of this chapter.
3.1 Tidying R code
R treats function code loaded from packages and code entered by users differently. By default code entered by users has the source code stored internally, and when the function is listed, the original source is reproduced. Loading code from a package (by default) discards the source code, and the function listing is re-created from the parse tree of the function.
Normally keeping the source code is a good idea, and in particular it avoids comments being removed from the source. However, we can make use of the ability to re-create a function listing from its parse tree to produce a tidy version of the function, for example with consistent indentation and spaces around operators. If the original source does not follow the standard format this tidied version can be much easier to read.
We can subvert the keeping of source in two ways.
- The option keep.source can be set to FALSE before the code is loaded into R.
- The stored source code can be removed by calling the removeSource() function, for example by

myfun <- removeSource(myfun)
In each case if we then list the function we will get the standard layout.
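For example, a minimal sketch of the effect in an interactive session (f is a throwaway function):

> f <- function(x) {
+     x + 1  # add one
+ }
> f
function(x) {
    x + 1  # add one
}
> f <- removeSource(f)
> f
function (x)
{
    x + 1
}

Note that the comment survives in the first listing but not in the second, re-created one.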
Suppose we have a file of functions myfuns.R that we want to tidy up. Create a file tidy.R containing
source("myfuns.R", keep.source = FALSE)
dump(ls(all.names = TRUE), file = "new.myfuns.R")

and run R with this as the source file, for example by R --vanilla < tidy.R or by pasting into an R session. Then the file new.myfuns.R will contain the functions in alphabetical order in the standard layout. Warning: comments in your functions will be lost.
The standard format provides a good starting point for further tidying. Although the deparsing cannot do so, we recommend the consistent use of the preferred assignment operator ‘<-’ (rather than ‘=’) for assignment. Many package authors use a version of Emacs (on a Unix-alike or Windows) to edit R code, using the ESS[S] mode of the ESS Emacs package. See R coding standards in R Internals for style options within the ESS[S] mode recommended for the source code of R itself.
3.2 Profiling R code for speed
It is possible to profile R code on Windows and most Unix-alike versions of R.
The command Rprof is used to control profiling, and its help page can be consulted for full details. Profiling works by recording at fixed intervals (by default every 20 msecs) which line in which R function is being used, and recording the results in a file (default Rprof.out in the working directory). Then the function summaryRprof or the command-line utility R CMD Rprof Rprof.out can be used to summarize the activity.
As an example, consider the following code (from Venables & Ripley, 2002, pp. 225–6).
library(MASS); library(boot)
storm.fm <- nls(Time ~ b*Viscosity/(Wt - c), stormer,
start = c(b=30.401, c=2.2183))
st <- cbind(stormer, fit=fitted(storm.fm))
storm.bf <- function(rs, i) {
st$Time <- st$fit + rs[i]
tmp <- nls(Time ~ (b * Viscosity)/(Wt - c), st,
start = coef(storm.fm))
tmp$m$getAllPars()
}
rs <- scale(resid(storm.fm), scale = FALSE) # remove the mean
Rprof("boot.out")
storm.boot <- boot(rs, storm.bf, R = 4999) # slow enough to profile
Rprof(NULL)

Having run this we can summarize the results by
R CMD Rprof boot.out
Each sample represents 0.02 seconds.
Total run time: 25.22 seconds.
Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.
% total  total seconds  % self  self seconds  name
100.0 25.22 0.2 0.04 "boot"
99.8 25.18 0.6 0.16 "statistic"
96.3 24.30 4.0 1.02 "nls"
33.9 8.56 2.2 0.56 "<Anonymous>"
32.4 8.18 1.4 0.36 "eval"
31.8 8.02 1.4 0.34 ".Call"
28.6 7.22 0.0 0.00 "eval.parent"
28.5 7.18 0.3 0.08 "model.frame"
28.1 7.10 3.5 0.88 "model.frame.default"
17.4 4.38 0.7 0.18 "sapply"
15.0 3.78 3.2 0.80 "nlsModel"
12.5 3.16 1.8 0.46 "lapply"
12.3 3.10 2.7 0.68 "assign"
...
% self  self seconds  % total  total seconds  name
5.7 1.44 7.5 1.88 "inherits"
4.0 1.02 96.3 24.30 "nls"
3.6 0.92 3.6 0.92 "$"
3.5 0.88 28.1 7.10 "model.frame.default"
3.2 0.80 15.0 3.78 "nlsModel"
2.8 0.70 9.8 2.46 "qr.coef"
2.7 0.68 12.3 3.10 "assign"
2.5 0.64 2.5 0.64 ".Fortran"
2.5 0.62 7.1 1.80 "qr.default"
2.2 0.56 33.9 8.56 "<Anonymous>"
2.1 0.54 5.9 1.48 "unlist"
2.1 0.52 7.9 2.00 "FUN"
...

This often produces surprising results and can be used to identify bottlenecks or pieces of R code that could benefit from being replaced by compiled code.
Two warnings: profiling does impose a small performance penalty, and the output files can be very large if long runs are profiled at the default sampling interval.
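For long runs the file size can be reduced by increasing the sampling interval, for example (a sketch; the file name is illustrative)

Rprof("long-run.out", interval = 0.1)  # sample every 100 msecs rather than the default 20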
Profiling short runs can sometimes give misleading results. R from time to time performs garbage collection to reclaim unused memory, and this takes an appreciable amount of time which profiling will charge to whichever function happens to provoke it. It may be useful to compare profiling code immediately after a call to gc() with a profiling run without a preceding call to gc.
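A minimal sketch of such a comparison, reusing the boot example (the file names and the reduced value of R are illustrative):

gc()                        # collect garbage so the profiled run starts clean
Rprof("after-gc.out")
storm.boot <- boot(rs, storm.bf, R = 999)
Rprof(NULL)

Rprof("no-gc.out")          # the same code without a preceding collection
storm.boot <- boot(rs, storm.bf, R = 999)
Rprof(NULL)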
More detailed analysis of the output can be achieved by the tools in the CRAN packages proftools and profr: in particular these allow call graphs to be studied.
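For instance, a call graph for the boot run can be plotted with proftools (a sketch, assuming the boot.out file from above; plotProfileCallGraph may additionally require the Bioconductor package Rgraphviz):

library(proftools)
pd <- readProfileData("boot.out")   # parse the Rprof output file
plotProfileCallGraph(pd)            # display the call graph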
3.3 Profiling R code for memory use
Measuring memory use in R code is useful either when the code takes more memory than is conveniently available or when memory allocation and copying of objects is responsible for slow code. There are three ways to profile memory use over time in R code. The second and third require R to have been compiled with --enable-memory-profiling, which is not the default, but is currently used for the macOS and Windows binary distributions. All can be misleading, for different reasons.
In understanding the memory profiles it is useful to know a little more about R’s memory allocation. Looking at the results of gc() shows a division of memory into Vcells used to store the contents of vectors and Ncells used to store everything else, including all the administrative overhead for vectors such as type and length information. In fact the vector contents are divided into two pools. Memory for small vectors (by default 128 bytes or less) is obtained in large chunks and then parcelled out by R; memory for larger vectors is obtained directly from the operating system.
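For example (the values shown here are purely illustrative, for a 64-bit platform):

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  531268 28.4    1167495 62.4   650432 34.7
Vcells 1070210  8.2    8388608 64.0  2190766 16.7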
Some memory allocation is obvious in interpreted code, for example,
y <- x + 1

allocates memory for a new vector y. Other memory allocation is less obvious and occurs because R is forced to make good on its promise of ‘call-by-value’ argument passing. When an argument is passed to a function it is not immediately copied. Copying occurs (if necessary) only when the argument is modified. This can lead to surprising memory use. For example, in the ‘survey’ package we have
print.svycoxph <- function (x, ...)
{
print(x$survey.design, varnames = FALSE, design.summaries = FALSE, ...)
x$call <- x$printcall
NextMethod()
}

It may not be obvious that the assignment to x$call will cause the entire object x to be copied. This copying to preserve the call-by-value illusion is usually done by the internal C function Rf_duplicate.
The main reason that memory-use profiling is difficult is garbage collection. Memory is allocated at well-defined times in an R program, but is freed whenever the garbage collector happens to run.
3.3.1 Memory statistics from Rprof
The sampling profiler Rprof described in the previous section can be given the option memory.profiling=TRUE. It then writes out the total R memory allocation in small vectors, large vectors, and cons cells or nodes at each sampling interval. It also writes out the number of calls to the internal function Rf_duplicate, which is called to copy R objects. summaryRprof provides summaries of this information. The main reason that this can be misleading is that the memory use is attributed to the function running at the end of the sampling interval. A second reason is that garbage collection can make the amount of memory in use decrease, so a function appears to use little memory. Running under gctorture helps with both problems: it slows down the code to effectively increase the sampling frequency and it makes each garbage collection release a smaller amount of memory.
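A minimal sketch for the boot example (the file name is illustrative; the memory argument of summaryRprof selects how the memory statistics are reported):

Rprof("boot.memout", memory.profiling = TRUE)
storm.boot <- boot(rs, storm.bf, R = 4999)
Rprof(NULL)
summaryRprof("boot.memout", memory = "both")  # timings and memory use together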
3.3.2 Tracking memory allocations
The second method of memory profiling uses a memory-allocation profiler, Rprofmem(), which writes out a stack trace to an output file every time a large vector is allocated (with a user-specified threshold for ‘large’) or a new page of memory is allocated for the R heap. Summary functions for this output are still being designed.
Running the example from the previous section with
> Rprofmem("boot.memprof", threshold = 1000)
> storm.boot <- boot(rs, storm.bf, R = 4999)
> Rprofmem(NULL)

shows that apart from some initial and final work in boot there are no vector allocations over 1000 bytes.
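The output file is plain text, so the recorded allocations can also be inspected directly, for example by

noquote(readLines("boot.memprof", n = 5))  # the first few recorded allocations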
3.3.3 Tracing copies of an object
The third method of memory profiling involves tracing copies made of a specific (presumably large) R object. Calling tracemem on an object marks it so that a message is printed to standard output when the object is copied via Rf_duplicate or coercion to another type, or when a new object of the same size is created in arithmetic operations. The main reason that this can be misleading is that copying of subsets or components of an object is not tracked. It may be helpful to use tracemem on these components.
In the example above we can run tracemem on the data frame st
> tracemem(st)
[1] "<0x9abd5e0>"
> storm.boot <- boot(rs, storm.bf, R = 4)
memtrace[0x9abd5e0->0x92a6d08]: statistic boot
memtrace[0x92a6d08->0x92a6d80]: $<-.data.frame $<- statistic boot
memtrace[0x92a6d80->0x92a6df8]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x9271318]: statistic boot
memtrace[0x9271318->0x9271390]: $<-.data.frame $<- statistic boot
memtrace[0x9271390->0x9271408]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x914f558]: statistic boot
memtrace[0x914f558->0x914f5f8]: $<-.data.frame $<- statistic boot
memtrace[0x914f5f8->0x914f670]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x972cbf0]: statistic boot
memtrace[0x972cbf0->0x972cc68]: $<-.data.frame $<- statistic boot
memtrace[0x972cc68->0x972cd08]: $<-.data.frame $<- statistic boot
memtrace[0x9abd5e0->0x98ead98]: statistic boot
memtrace[0x98ead98->0x98eae10]: $<-.data.frame $<- statistic boot
memtrace[0x98eae10->0x98eae88]: $<-.data.frame $<- statistic boot

The object is duplicated fifteen times, three times for each of the R+1 calls to storm.bf. This is surprising, since none of the duplications happen inside nls. Stepping through storm.bf in the debugger shows that all three happen in the line
st$Time <- st$fit + rs[i]

Data frames are slower than matrices and this is an example of why. Using tracemem(st$Viscosity) does not reveal any additional copying.
3.4 Profiling compiled code
Profiling compiled code is highly system-specific, but this section contains some hints gleaned from various R users. Some methods need to be different for a compiled executable and for dynamic/shared libraries/objects as used by R packages.
This chapter is based on reports from users and the information may not be current.
3.4.1 Profiling on Linux
Options include using sprof for a shared object, and oprofile (see https://oprofile.sourceforge.io/news/) and perf (see https://perfwiki.github.io/) for any executable or shared object. These seem less widely supplied than they used to be. There is also ‘Google Performance Tools’, also known as gperftools or google-perftools.
All of these work best when R and any packages have been built with debugging symbols.
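One way (of several) to achieve this when installing packages from source is to add -g to the compiler flags in ~/.R/Makevars, for example

CFLAGS = -O2 -g
CXXFLAGS = -O2 -g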
3.4.2 perf
This seems the most widely distributed tool. Here is an example on x86_64 Linux using R 4.3.1 built with LTO.
At its simplest
perf record R -f tests/Examples/stats-Ex.R
perf report --sort=dso
perf report --sort=srcfile
rm perf.data*

The first report is
75.67% R
9.25% libc.so.6
4.87% [unknown]
3.75% libz.so.1.2.11
3.37% stats.so
1.17% libm.so.6
0.63% libtirpc.so.3.0.0
0.41% graphics.so
0.30% grDevices.so
0.20% libRblas.so
0.09% libpcre2-8.so.0.11.0
0.07% methods.so
...

which shows which shared libraries (DSOs) the time was spent in.
perf annotate can be used on an application built with GCC and -ggdb: it interleaves disassembled and source code.
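A sketch of its use with the boot run (--dsos restricts the report to one shared object; the symbol name here is taken from the oprofile listing in the next section):

perf record R -f boot.R
perf annotate --dsos=stats.so numeric_deriv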
3.4.3 oprofile and operf
The oprofile project has two modes of operation. Since version 0.9.8 (August 2012), the preferred mode is to use operf, so we discuss only that.
Let us look at the boot example from §3.2 on x86_64 Linux using R 4.3.1.
This can be run under operf and analysed by commands like
operf R -f boot.R
opreport
opreport -l /path/to/R_HOME/bin/exec/R
opreport -l /path/to/R_HOME/library/stats/src/stats.so
opannotate --source /path/to/R_HOME/bin/exec/R

The first command had to be run as root.
The first report shows in which library (etc) the time was spent:
CPU_CLK_UNHALT...|
samples| %|
------------------
278341 91.9947 R
18290 6.0450 libc.so.6
2277 0.7526 kallsyms
1426 0.4713 stats.so
739 0.2442 libRblas.so
554 0.1831 libz.so.1.2.11
373 0.1233 libm.so.6
352 0.1163 libtirpc.so.3.0.0
153 0.0506 ld-linux-x86-64.so.2
12 0.0040 methods.so

(kallsyms is the kernel.)
The rest of the output is voluminous, and only extracts are shown.
Most of the time within R is spent in
samples % image name symbol name
52955 19.0574 R bcEval.lto_priv.0
16484 5.9322 R Rf_allocVector3
14224 5.1189 R Rf_findVarInFrame3
12581 4.5276 R CONS_NR
8289 2.9830 R Rf_matchArgs_NR
8034 2.8913 R Rf_cons
7114 2.5602 R R_gc_internal.lto_priv.0
6552 2.3579 R Rf_eval
5969 2.1481 R VECTOR_ELT
5684 2.0456 R Rf_applyClosure
5497 1.9783 R findVarLocInFrame.part.0.lto_priv.0
4827 1.7371 R Rf_mkPROMISE
4609 1.6587 R Rf_install
4317 1.5536 R Rf_findFun3
4035 1.4521 R getvar.lto_priv.0
3491 1.2563 R SETCAR
3179 1.1441 R Rf_defineVar
2892 1.0408 R duplicate1.lto_priv.0

and in stats.so
samples % image name symbol name
285 24.4845 stats.so termsform
284 24.3986 stats.so numeric_deriv
213 18.2990 stats.so modelframe
114 9.7938 stats.so nls_iter
55 4.7251 stats.so ExtractVars
47 4.0378 stats.so EncodeVars
37 3.1787 stats.so getListElement
32 2.7491 stats.so TrimRepeats
25 2.1478 stats.so InstallVar
20 1.7182 stats.so MatchVar
20 1.7182 stats.so isZeroOne
15 1.2887 stats.so ConvInfoMsg.isra.0

The profiling data is by default stored in sub-directory oprofile_data of the current directory, which can be removed at the end of the session.
3.4.4 sprof
You can select shared objects to be profiled with sprof by setting the environment variable LD_PROFILE. For example
% setenv LD_PROFILE /path/to/R_HOME/library/stats/libs/stats.so
% R -f boot.R
% sprof /path/to/R_HOME/library/stats/libs/stats.so \
/var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
76.19 0.32 0.32 0 0.00 numeric_deriv
16.67 0.39 0.07 0 0.00 nls_iter
7.14 0.42 0.03 0 0.00 getListElement
... to clean up ...
rm /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile

It is possible that root access will be needed to create the directories used for the profile data.
3.4.5 Profiling on macOS
Developers have recommended Instruments (part of Xcode; see https://help.apple.com/instruments/mac/current/), which had a command-line version prior to macOS 12.
3.4.6 Profiling on Windows
Very Sleepy (https://github.com/VerySleepy/verysleepy) has been used on ‘x86_64’ Windows. There were problems with accessing the debug information, but the best results, which included function names, were obtained by attaching the profiler to an existing Rterm process, either via the GUI or using /a: (with the PID obtained via Sys.getpid()).