4 Debugging ¶
Next: System and foreign language interfaces, Previous: Tidying and profiling R code, Up: Writing R Extensions [Contents][Index]
This chapter covers the debugging of R extensions, starting with the ways to get useful error information and moving on to how to deal with errors that crash R.
4.1 Browsing ¶
Most of the R-level debugging facilities are based around the built-in browser. This can be used directly by inserting a call to browser()
into the code of a function (for example, using fix(my_function)
). When code execution reaches that point in the function, control returns to the R console with a special prompt. For example
> fix(summary.data.frame) ## insert browser() call after for() loop
> summary(women)
: summary.data.frame(women)
Called from1]> ls()
Browse[1] "digits" "i" "lbs" "lw" "maxsum" "ncw" "nm" "nr"
[9] "nv" "object" "sms" "z"
[1]> maxsum
Browse[1] 7
[1]> c
Browse[
height weight:58.0 Min. :115.0
Min. 1st Qu.:61.5 1st Qu.:124.5
:65.0 Median :135.0
Median :65.0 Mean :136.7
Mean 3rd Qu.:68.5 3rd Qu.:148.0
:72.0 Max. :164.0
Max. > rm(summary.data.frame)
At the browser prompt one can enter any R expression, so for example ls()
lists the objects in the current frame, and entering the name of an object will128 print it. The following commands are also accepted
n
Enter ‘step-through’ mode. In this mode, hitting the return key (RET) executes the next line of code (more precisely one line and any continuation lines). Typing
c
will continue to the end of the current context, e.g. to the end of the current loop or function.c
In normal mode, this quits the browser and continues execution, and just return works in the same way.
cont
is a synonym.where
This prints the call stack. For example
> summary(women) : summary.data.frame(women) Called from1]> where Browse[1: summary.data.frame(women) where 2: summary(women) where 1]> Browse[
Q
Quit both the browser and the current expression, and return to the top-level prompt.
Errors in code executed at the browser prompt will normally return control to the browser prompt. Objects can be altered by assignment, and will keep their changed values when the browser is exited. If really necessary, objects can be assigned to the workspace from the browser prompt (by using <<-
if the name is not already in scope).
4.2 Debugging R code ¶
Suppose your R program gives an error message. The first thing to find out is what R was doing at the time of the error, and the most useful tool is traceback()
. We suggest that this is run whenever the cause of the error is not immediately obvious. Errors are often reported to the R mailing lists as being in some package when traceback()
would show that the error was being reported by some other package or base R. Here is an example from the regression suite.
> success <- c(13,12,11,14,14,11,13,11,12)
> failure <- c(0,0,0,0,0,0,0,2,2)
> resp <- cbind(success, failure)
> predictor <- c(0, 5^(0:7))
> glm(resp ~ 0+predictor, family = binomial(link="log"))
: no valid set of coefficients has been found: please supply starting values
Error> traceback()
3: stop("no valid set of coefficients has been found: please supply
", call. = FALSE)
starting values2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,
= mustart, offset = offset, family = family, control = control,
mustart = attr(mt, "intercept") > 0)
intercept 1: glm(resp ~ 0 + predictor, family = binomial(link ="log"))
The calls to the active frames are given in reverse order (starting with the innermost). So we see the error message comes from an explicit check in glm.fit
. (traceback()
shows you all the lines of the function calls, which can be limited by setting option
"deparse.max.lines".)
Sometimes the traceback will indicate that the error was detected inside compiled code, for example (from ?nls
)
in nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321), trace = TRUE) :
Error 0.000488281 reduced below 'minFactor' of 0.000976563
step factor > traceback()
2: .Call(R_nls_iter, m, ctrl, trace)
1: nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321), trace = TRUE)
This will be the case if the innermost call is to .C
, .Fortran
, .Call
, .External
or .Internal
, but as it is also possible for such code to evaluate R expressions, this need not be the innermost call, as in
> traceback()
9: gm(a, b, x)
8: .Call(R_numeric_deriv, expr, theta, rho, dir)
7: numericDeriv(form[[3]], names(ind), env)
6: getRHS()
5: assign("rhs", getRHS(), envir = thisEnv)
4: assign("resid", .swts * (lhs - assign("rhs", getRHS(), envir = thisEnv)),
envir = thisEnv)
3: function (newPars)
{setPars(newPars)
assign("resid", .swts * (lhs - assign("rhs", getRHS(), envir = thisEnv)),
envir = thisEnv)
assign("dev", sum(resid^2), envir = thisEnv)
assign("QR", qr(.swts * attr(rhs, "gradient")), envir = thisEnv)
return(QR$rank < min(dim(QR$qr)))
c(-0.00760232418963883, 1.00119632515036))
}(2: .Call(R_nls_iter, m, ctrl, trace)
1: nls(yeps ~ gm(a, b, x), start = list(a = 0.12345, b = 0.54321))
Occasionally traceback()
does not help, and this can be the case if S4 method dispatch is involved. Consider the following example
> xyd <- new("xyloc", x=runif(20), y=runif(20))
.environment(pkg) : no item called "package:S4nswv"
Error in as
on the search list(value, ...) : S language method selection got
Error in initializefor function 'initialize'
an error when called from internal dispatch > traceback()
2: initialize(value, ...)
1: new("xyloc", x = runif(20), y = runif(20))
which does not help much, as there is no call to as.environment
in initialize
(and the note “called from internal dispatch” tells us so). In this case we searched the R sources for the quoted call, which occurred in only one place, methods:::.asEnvironmentPackage
. So now we knew where the error was occurring. (This was an unusually opaque example.)
The error message
: infinite recursion / options(expressions=)? evaluation nested too deeply
can be hard to handle with the default value (5000). Unless you know that there actually is deep recursion going on, it can help to set something like
options(expressions=500)
and re-run the example showing the error.
Sometimes there is warning that clearly is the precursor to some later error, but it is not obvious where it is coming from. Setting options(warn = 2)
(which turns warnings into errors) can help here.
Once we have located the error, we have some choices. One way to proceed is to find out more about what was happening at the time of the crash by looking a post-mortem dump. To do so, set options(error=dump.frames)
and run the code again. Then invoke debugger()
and explore the dump. Continuing our example:
> options(error = dump.frames)
> glm(resp ~ 0 + predictor, family = binomial(link ="log"))
: no valid set of coefficients has been found: please supply starting values Error
which is the same as before, but an object called last.dump
has appeared in the workspace. (Such objects can be large, so remove it when it is no longer needed.) We can examine this at a later time by calling the function debugger
.
> debugger()
: Error: no valid set of coefficients has been found: please supply starting values
Message:
Available environments had calls1: glm(resp ~ 0 + predictor, family = binomial(link = "log"))
2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, mus
3: stop("no valid set of coefficients has been found: please supply starting values
Enter an environment number, or 0 to exit Selection:
which gives the same sequence of calls as traceback
, but in outer-first order and with only the first line of the call, truncated to the current width. However, we can now examine in more detail what was happening at the time of the error. Selecting an environment opens the browser in that frame. So we select the function call which spawned the error message, and explore some of the variables (and execute two function calls).
, or 0 to exit Selection: 2
Enter an environment number:
Browsing in the environment with call.fit(x = X, y = Y, weights = weights, start = start, etas
glm: debugger.look(ind)
Called from[1]> ls()
Browse[1] "aic" "boundary" "coefold" "control" "conv"
[6] "dev" "dev.resids" "devold" "EMPTY" "eta"
[11] "etastart" "family" "fit" "good" "intercept"
[16] "iter" "linkinv" "mu" "mu.eta" "mu.eta.val"
[21] "mustart" "n" "ngoodobs" "nobs" "nvars"
[26] "offset" "start" "valideta" "validmu" "variance"
[31] "varmu" "w" "weights" "x" "xnames"
[36] "y" "ynames" "z"
[1]> eta
Browse1 2 3 4 5
0.000000e+00 -2.235357e-06 -1.117679e-05 -5.588393e-05 -2.794197e-04
6 7 8 9
-1.397098e-03 -6.985492e-03 -3.492746e-02 -1.746373e-01
[1]> valideta(eta)
Browse[1] TRUE
[1]> mu
Browse1 2 3 4 5 6 7 8
1.0000000 0.9999978 0.9999888 0.9999441 0.9997206 0.9986039 0.9930389 0.9656755
9
0.8397616
[1]> validmu(mu)
Browse[1] FALSE
[1]> c
Browse:
Available environments had calls1: glm(resp ~ 0 + predictor, family = binomial(link = "log"))
2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart
3: stop("no valid set of coefficients has been found: please supply starting v
, or 0 to exit Selection: 0
Enter an environment number> rm(last.dump)
Because last.dump
can be looked at later or even in another R session, post-mortem debugging is possible even for batch usage of R. We do need to arrange for the dump to be saved: this can be done either using the command-line flag --save to save the workspace at the end of the run, or via a setting such as
> options(error = quote({dump.frames(to.file=TRUE); q()}))
See the help on dump.frames
for further options and a worked example.
An alternative error action is to use the function recover()
:
> options(error = recover)
> glm(resp ~ 0 + predictor, family = binomial(link = "log"))
: no valid set of coefficients has been found: please supply starting values
Error
0 to exit
Enter a frame number, or
1: glm(resp ~ 0 + predictor, family = binomial(link = "log"))
2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart
: Selection
which is very similar to dump.frames
. However, we can examine the state of the program directly, without dumping and re-loading the dump. As its help page says, recover
can be routinely used as the error action in place of dump.calls
and dump.frames
, since it behaves like dump.frames
in non-interactive use.
Post-mortem debugging is good for finding out exactly what went wrong, but not necessarily why. An alternative approach is to take a closer look at what was happening just before the error, and a good way to do that is to use debug
. This inserts a call to the browser at the beginning of the function, starting in step-through mode. So in our example we could use
> debug(glm.fit)
> glm(resp ~ 0 + predictor, family = binomial(link ="log"))
: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,
debugging in= mustart, offset = offset, family = family, control = control,
mustart = attr(mt, "intercept") > 0)
intercept : {
debug## lists the whole function
[1]>
Browse: x <- as.matrix(x)
debug...
[1]> start
Browse[1] -2.235357e-06
: eta <- drop(x %*% start)
debug[1]> eta
Browse1 2 3 4 5
0.000000e+00 -2.235357e-06 -1.117679e-05 -5.588393e-05 -2.794197e-04
6 7 8 9
-1.397098e-03 -6.985492e-03 -3.492746e-02 -1.746373e-01
[1]>
Browse: mu <- linkinv(eta <- eta + offset)
debug[1]> mu
Browse1 2 3 4 5 6 7 8
1.0000000 0.9999978 0.9999888 0.9999441 0.9997206 0.9986039 0.9930389 0.9656755
9
0.8397616
(The prompt Browse[1]>
indicates that this is the first level of browsing: it is possible to step into another function that is itself being debugged or contains a call to browser()
.)
debug
can be used for hidden functions and S3 methods by e.g. debug(stats:::predict.Arima)
. (It cannot be used for S4 methods, but an alternative is given on the help page for debug
.) Sometimes you want to debug a function defined inside another function, e.g. the function arimafn
defined inside arima
. To do so, set debug
on the outer function (here arima
) and step through it until the inner function has been defined. Then call debug
on the inner function (and use c
to get out of step-through mode in the outer function).
To remove debugging of a function, call undebug
with the argument previously given to debug
; debugging otherwise lasts for the rest of the R session (or until the function is edited or otherwise replaced).
trace
can be used to temporarily insert debugging code into a function, for example to insert a call to browser()
just before the point of the error. To return to our running example
## first get a numbered listing of the expressions of the function
> page(as.list(body(glm.fit)), method="print")
> trace(glm.fit, browser, at=22)
"glm.fit" in package "stats"
Tracing function [1] "glm.fit"
> glm(resp ~ 0 + predictor, family = binomial(link ="log"))
.fit(x = X, y = Y, weights = weights, start = start,
Tracing glm= etastart, .... step 22
etastart : eval(expr, envir, enclos)
Called from[1]> n
Browse## and single-step from here.
> untrace(glm.fit)
For your own functions, it may be as easy to use fix
to insert temporary code, but trace
can help with functions in a namespace (as can fixInNamespace
). Alternatively, use trace(,edit=TRUE)
to insert code visually.
4.3 Checking memory access ¶
Errors in memory allocation and reading/writing outside arrays are very common causes of crashes (e.g., segfaults) on some machines. Often the crash appears long after the invalid memory access: in particular damage to the structures which R itself has allocated may only become apparent at the next garbage collection (or even at later garbage collections after objects have been deleted).
Note that memory access errors may be seen with LAPACK, BLAS, OpenMP and Java-using packages: some at least of these seem to be intentional, and some are related to passing characters to Fortran.
Some of these tools can detect mismatched allocation and deallocation. C++ programmers should note that memory allocated by new []
must be freed by delete []
, other uses of new
by delete
, and memory allocated by malloc
, calloc
and realloc
by free
. Some platforms will tolerate mismatches (perhaps with memory leaks) but others will segfault.
4.3.1 Using gctorture
¶
We can help to detect memory problems in R objects earlier by running garbage collection as often as possible. This is achieved by gctorture(TRUE)
, which as described on its help page
Provokes garbage collection on (nearly) every memory allocation. Intended to ferret out memory protection bugs. Also makes R run very slowly, unfortunately.
The reference to ‘memory protection’ is to missing C-level calls to PROTECT
/UNPROTECT
(see Handling the effects of garbage collection) which if missing allow R objects to be garbage-collected when they are still in use. But it can also help with other memory-related errors.
Normally running under gctorture(TRUE)
will just produce a crash earlier in the R program, hopefully close to the actual cause. See the next section for how to decipher such crashes.
It is possible to run all the examples, tests and vignettes covered by R CMD check
under gctorture(TRUE)
by using the option --use-gct.
The function gctorture2
provides more refined control over the GC torture process. Its arguments step
, wait
and inhibit_release
are documented on its help page. Environment variables can also be used at the start of the R session to turn on GC torture: R_GCTORTURE
corresponds to the step
argument to gctorture2
, R_GCTORTURE_WAIT
to wait
, and R_GCTORTURE_INHIBIT_RELEASE
to inhibit_release
.
If R is configured with --enable-strict-barrier then a variety of tests for the integrity of the write barrier are enabled. In addition tests to help detect protect issues are enabled:
- All GCs are full GCs.
- New nodes in small node pages are marked as
NEWSXP
on creation. - After a GC all free nodes that are not of type
NEWSXP
are marked as typeFREESXP
and their previous type is recorded. - Most calls to accessor functions check their
SEXP
inputs andSEXP
outputs and signal an error if aFREESXP
is found. The address of the node and the old type are included in the error message.
R CMD check --use-gct
can be set to use gctorture2(
n)
rather than gctorture(TRUE)
by setting environment variable _R_CHECK_GCT_N_
to a positive integer value to be used as n.
Used with a debugger and with gctorture
or gctorture2
this mechanism can be helpful in isolating memory protect problems.
4.3.2 Using Valgrind ¶
If you have access to Linux on a common CPU type or supported versions of FreeBSD or Solaris129 you can use valgrind
(https://valgrind.org/, pronounced to rhyme with ‘tinned’) to check for possible problems. To run some examples under valgrind
use something like
-d valgrind --vanilla < mypkg-Ex.R
R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < mypkg-Ex.R R
where mypkg-Ex.R is a set of examples, e.g. the file created in mypkg.Rcheck by R CMD check
. Occasionally this reports memory reads of ‘uninitialised values’ that are the result of compiler optimization, so can be worth checking under an unoptimized compile: for maximal information use a build with debugging symbols. We know there will be some small memory leaks from readline
and R itself — these are memory areas that are in use right up to the end of the R session. Expect this to run around 20x slower than without valgrind
, and in some cases much slower than that. Several versions of valgrind
were not happy with some optimized BLAS libraries that use CPU-specific instructions so you may need to build a version of R specifically to use with valgrind
.
On platforms where valgrind
and its headers130 are installed you can build a version of R with extra instrumentation to help valgrind
detect errors in the use of memory allocated from the R heap. The configure
option is --with-valgrind-instrumentation=level, where level is 0, 1 or 2. Level 0 is the default and does not add anything. Level 1 will detect some uses131 of uninitialised memory and has little impact on speed (compared to level 0). Level 2 will detect many other memory-use bugs132 but make R much slower when running under valgrind
. Using this in conjunction with gctorture
can be even more effective (and even slower).
An example of valgrind
output is
==12539== Invalid read of size 4
==12539== at 0x1CDF6CBE: csc_compTr (Mutils.c:273)
==12539== by 0x1CE07E1E: tsc_transpose (dtCMatrix.c:25)
==12539== by 0x80A67A7: do_dotcall (dotcode.c:858)
==12539== by 0x80CACE2: Rf_eval (eval.c:400)
==12539== by 0x80CB5AF: R_execClosure (eval.c:658)
==12539== by 0x80CB98E: R_execMethod (eval.c:760)
==12539== by 0x1B93DEFA: R_standardGeneric (methods_list_dispatch.c:624)
==12539== by 0x810262E: do_standardGeneric (objects.c:1012)
==12539== by 0x80CAD23: Rf_eval (eval.c:403)
==12539== by 0x80CB2F0: Rf_applyClosure (eval.c:573)
==12539== by 0x80CADCC: Rf_eval (eval.c:414)
==12539== by 0x80CAA03: Rf_eval (eval.c:362)
==12539== Address 0x1C0D2EA8 is 280 bytes inside a block of size 1996 alloc'd
==12539== at 0x1B9008D1: malloc (vg_replace_malloc.c:149)
==12539== by 0x80F1B34: GetNewPage (memory.c:610)
==12539== by 0x80F7515: Rf_allocVector (memory.c:1915)
...
This example is from an instrumented version of R, while tracking down a bug in the Matrix package in 2006. The first line indicates that R has tried to read 4 bytes from a memory address that it does not have access to. This is followed by a C stack trace showing where the error occurred. Next is a description of the memory that was accessed. It is inside a block allocated by malloc
, called from GetNewPage
, that is, in the internal R heap. Since this memory all belongs to R, valgrind
would not (and did not) detect the problem in an uninstrumented build of R. In this example the stack trace was enough to isolate and fix the bug, which was in tsc_transpose
, and in this example running under gctorture()
did not provide any additional information.
valgrind
is good at spotting the use of uninitialized values: use option --track-origins=yes to show where these originated from. What it cannot detect is the misuse of arrays allocated on the stack: this includes C automatic variables and some133 Fortran arrays.
It is possible to run all the examples, tests and vignettes covered by R CMD check
under valgrind
by using the option --use-valgrind. If you do this you will need to select the valgrind
options some other way, for example by having a ~/.valgrindrc file containing
--leak-check=full
--track-origins=yes
or setting the environment variable VALGRIND_OPTS
. As from R 4.2.0, --use-valgrind also uses valgrind
when re-building the vignettes.
This section has described the use of memtest
, the default (and most useful) of valgrind
’s tools. There are others described in its documentation: helgrind
can be useful for threaded programs.
4.3.3 Using the Address Sanitizer ¶
AddressSanitizer
(‘ASan’) is a tool with similar aims to the memory checker in valgrind
. It is available with suitable builds134 of gcc
and clang
on common Linux and macOS platforms. See https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation, https://clang.llvm.org/docs/AddressSanitizer.html and https://github.com/google/sanitizers.
More thorough checks of C++ code are done if the C++ library has been ‘annotated’: at the time of writing this applied to std::vector
in libc++
for use with clang
and gives rise to container-overflow’135 reports.
It requires code to have been compiled and linked with -fsanitize=address and compiling with -fno-omit-frame-pointer
will give more legible reports. It has a runtime penalty of 2–3x, extended compilation times and uses substantially more memory, often 1–2GB, at run time. On 64-bit platforms it reserves (but does not allocate) 16–20TB of virtual memory: restrictive shell settings can cause problems. It can be helpful to increase the stack size, for example to 40MB.
By comparison with valgrind
, ASan can detect misuse of stack and global variables but not the use of uninitialized memory.
Recent versions return symbolic addresses for the location of the error provided llvm-symbolizer
136 is on the path: if it is available but not on the path or has been renamed137, one can use an environment variable, e.g.
=/path/to/llvm-symbolizer ASAN_SYMBOLIZER_PATH
An alternative is to pipe the output through asan_symbolize.py
138 and perhaps then (for compiled C++ code) c++filt
. (On macOS, you may need to run dsymutil
to get line-number reports.)
The simplest way to make use of this is to build a version of R with something like
="gcc -std=gnu99 -fsanitize=address"
CC="-fno-omit-frame-pointer -g -O2 -Wall -pedantic -mtune=native" CFLAGS
which will ensure that the libasan
run-time library is compiled into the R executable. However this check can be enabled on a per-package basis by using a ~/.R/Makevars file like
= gcc -std=gnu99 -fsanitize=address -fno-omit-frame-pointer
CC = g++ -fsanitize=address -fno-omit-frame-pointer
CXX = gfortran -fsanitize=address FC
(Note that -fsanitize=address
has to be part of the compiler specification to ensure it is used for linking. These settings will not be honoured by packages which ignore ~/.R/Makevars.) It will be necessary to build R with
= -fsanitize=address MAIN_LDFLAGS
to link the runtime libraries into the R executable if it was not specified as part of CC’ when R was built. (For some builds without OpenMP, -pthread is also required.)
For options available via the environment variable ASAN_OPTIONS
see https://github.com/google/sanitizers/wiki/AddressSanitizerFlags. With gcc
additional control is available via the --param flag: see its man
page.
For more detailed information on an error, R can be run under a debugger with a breakpoint set before the address sanitizer report is produced: for gdb
or lldb
you could use
break __asan_report_error
(See https://github.com/google/sanitizers/wiki/AddressSanitizerAndDebugger.)
More recent versions139 added the flag -fsanitize-address-use-after-scope: see https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterScope.
One of the checks done by ASan is that malloc/free
and in C++ new/delete
and new[]/delete[]
are used consistently (rather than say free
being used to deallocate memory allocated by new[]
). This matters on some systems but not all: unfortunately on some of those where it does not matter, system libraries140 are not consistent. The check can be suppressed by including alloc_dealloc_mismatch=0’ in ASAN_OPTIONS
.
ASan also checks system calls and sometimes reports can refer to problems in the system software and not the package nor R. A couple of reports have been of ‘heap-use-after-free’ errors in the X11 libraries called from Tcl/Tk.
Apple provide a version of the address sanitizer in recent versions of its C/C++ compiler. This will probably give messages about ‘malloc: nano zone abandoned’ which are innocuous and can be suppressed by setting environment variable MallocNanoZone
to 0
. It can be helpful to install debug symbols (INSTALL --dsym
for the package under test and particularly for reverse dependencies.
4.3.4 Using the Leak Sanitizer ¶
For x86_64
Linux there is a leak sanitizer, ‘LSan’: see https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer. This is available on recent versions of gcc
and clang
, and where available is compiled in as part of ASan.
One way to invoke this from an ASan-enabled build is by the environment variable
='detect_leaks=1' ASAN_OPTIONS
However, this was made the default as from LLVM clang
3.5 and gcc
5.1.0.
When LSan is enabled, leaks give the process a failure error status (by default 23
). For an R package this means the R process, and as the parser retains some memory to the end of the process, if R itself was built against ASan all runs will have a failure error status (which may include running R as part of building R itself).
To disable this, allocation-mismatch checking and some strict C++ checking use
'alloc_dealloc_mismatch=0:detect_leaks=0:detect_odr_violation=0' setenv ASAN_OPTIONS
The leak sanitizer is not part of ASan in the Apple clang
implementation.
LSan also has a ‘stand-alone’ mode where it is compiled in using -fsanitize=leak and avoids the run-time overhead of ASan.
4.3.5 Using the Undefined Behaviour Sanitizer ¶
‘Undefined behaviour’ is where the language standard does not require particular behaviour from the compiler. Examples include division by zero (where for doubles R requires the ISO/IEC 60559 behaviour but C/C++ do not), use of zero-length arrays, shifts too far for signed types (e.g. int x, y; y = x << 31;
), out-of-range coercion, invalid C++ casts and mis-alignment. Not uncommon examples of out-of-range coercion in R packages are attempts to coerce a NaN
or infinity to type int
or NA_INTEGER
to an unsigned type such as size_t
. Also common is y[x - 1]
forgetting that x
might be NA_INTEGER
.
‘UBSanitizer’ is a tool for C/C++ source code selected by -fsanitize=undefined in suitable builds141 of clang
and GCC. Its (main) runtime library is linked into each package’s DLL, so it is less often needed to be included in MAIN_LDFLAGS
. Platforms supported by clang
are listed at https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#supported-platforms: CRAN uses it for C/C++ with both GCC and clang
on x86_64’ Linux: the two toolchains often highlight different things with more reports from clang
than GCC.
This sanitizer may be combined with the Address Sanitizer by -fsanitize=undefined,address (where both are supported, and we have seen library conflicts for clang
17 and later).
Finer control of what is checked can be achieved by other options.
For clang
see https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#ubsan-checks. The current set is (on a single line):
-fsanitize=alignment,bool,bounds,builtin,enum,float-cast-overflow,
float-divide-by-zero,function,implicit-unsigned-integer-truncation,
-signed-integer-truncation,implicit-integer-sign-change,
implicit-divide-by-zero,nonnull-attribute,null,nullability-arg,
integer-assign,nullability-return,object-size,
nullability-overflow,return,returns-nonnull-attribute,shift,
pointersigned-integer-overflow,unreachable,unsigned-integer-overflow,
unsigned-shift-base,vla-bound,vptr
(plus the more specific versions array-bounsds
, local-bounds
, shift-base
and shift-exponent
), or use something like
-fsanitize=undefined -fno-sanitize=float-divide-by-zero
where in recent versions -fno-sanitize=float-divide-by-zero
is the default.
Options return
and vptr
apply only to C++: to use vptr
its run-time library needs to be linked into the main R executable by building the latter with something like
="clang++ -fsanitize=undefined" MAIN_LD
Option float-divide-by-zero
is undesirable for use with R which allow such divisions as part of IEC 60559 arithmetic, and in versions of clang
since June 2019 it is no longer part of -fsanitize=undefined.
There are also groups of options implicit-integer-truncation
, mplicit-integer-arithmetic-value-change
, implicit-conversion
, integer
and nullability
.
For GCC see https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html (or the manual for your version of GCC, installed or via https://gcc.gnu.org/onlinedocs/: look for ‘Program Instrumentation Options’) for the options supported by GCC: versions 13.x supported
-fsanitize=alignment,bool,bounds,builtin,enum,integer-divide-by-zero,
-attribute,null,object-size,pointer-overflow,return,
nonnull-nonnull-attribute,shift,signed-integer-overflow,
returns,vla-bound,vptr unreachable
plus the more specific versions shift-base
and shift-exponent
and non-default options
-strict,float-cast-overflow,float-divide-by-zero bounds
where float-divide-by-zero
is not desirable for R uses and bounds-strict
is an extension of bounds
.
Other useful flags include
-no-fsanitize-recover
which causes the first report to be fatal (it always is for the unreachable
and return
suboptions). For more detailed information on where the runtime error occurs, using
'print_stacktrace=1' setenv UBSAN_OPTIONS
will include a traceback in the report. Beyond that, R can be run under a debugger with a breakpoint set before the sanitizer report is produced: for gdb
or lldb
you could use
break __ubsan_handle_float_cast_overflow
break __ubsan_handle_float_cast_overflow_abort
or similar (there are handlers for each type of undefined behaviour).
There are also the compiler flags -fcatch-undefined-behavior and -ftrapv, said to be more reliable in clang
than gcc
.
For more details on the topic see https://blog.regehr.org/archives/213 and https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html (which has 3 parts).
It may or may not be possible to build R itself with -fsanitize=undefined: problems have in the past been seen with OpenMP-using code with gcc
but there has been success with LLVM clang
up to version 16.. However, problems have been seen with LLVM clang
17 and later, including missing entry points and R builds hanging. What has succeeded is to use UBSAN just for the package under test (and not in combination with ASAN). To do so, check with an unaltered R, using a custom Makevars file something like
= clang -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer
CC = clang++ -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer -frtti
CXX
= /path/to/LLVM18/lib/clang/18/lib/x86_64-unknown-linux-gnu
UBSAN_DIR = $(UBSAN_DIR)/libclang_rt.ubsan_standalone.a $(UBSAN_DIR)/libclang_rt.ubsan_standalone_cxx.a SAN_LIBS
which links the UBSAN libraries statically into the package-under-test’s DSO. It is also possible to use the dynamic library via
= -L$(UBSAN_DIR) -Wl,-rpath,$(UBSAN_DIR) -lclang_rt.ubsan_standalone SAN_LIBS
provided UBSAN_DIR
is added to the runtime library path (as shown or using LD_LIBRARY_PATH
). N.B.: The details, especially the paths used, have changed several times recently.
Apple provides a version of the undefined behaviour sanitizer in recent versions of its C/C++ compiler. R was built with Apple clang
16 with config.site containing
CC="clang -fsanitize=address,undefined"
CXX="clang++ -fsanitize=address,undefined"
and passed its checks.
4.3.6 Other analyses with ‘clang’ ¶
Recent versions of LLVM clang
on Linux have ‘ThreadSanitizer’ (https://github.com/google/sanitizers/wiki#threadsanitizer), a ‘data race detector for C/C++ programs’, and ‘MemorySanitizer’ (https://clang.llvm.org/docs/MemorySanitizer.html, https://github.com/google/sanitizers) for the detection of uninitialized memory. Both are based on and provide similar functionality to tools in valgrind
. The ThreadSanitizer is also available for Apple clang
on macOS.
clang
has a ‘Static Analyzer’ which can be run on the source files during compilation: see https://clang-analyzer.llvm.org/.
4.3.7 Other analyses with ‘gcc’ ¶
GCC 10 introduced a new flag -fanalyzer which does static analysis during compilation, currently for C code. It is regarded as experimental and it may slow down computation considerably when problems are found (and use many GB of resident memory). There is some overlap with problems detected by the Undefined Behaviour sanitizer, but some issues are only reported by this tool and as it is a static analysis, it does not rely on code paths being exercised.
See https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Static-Analyzer-Options.html (or the documentation for your version of gcc
if later) and https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10
4.3.8 Using ‘Dr. Memory’ ¶
‘Dr. Memory’ from https://drmemory.org/ is a memory checker for (currently) Windows, Linux and macOS with similar aims to valgrind
. It works with unmodified executables142 and detects memory access errors, uninitialized reads and memory leaks.
4.3.9 Fortran array bounds checking ¶
Most of the Fortran compilers used with R allow code to be compiled with checking of array bounds: for example gfortran
has option -fbounds-check. This will give an error when the upper or lower bound is exceeded, e.g.
97 of file .../src/appl/dqrdc2.f
At line : Index '1' of dimension 1 of array 'x' above upper bound of 0 Fortran runtime error
One does need to be aware that lazy programmers often specify Fortran dimensions as 1
rather than *
or a real bound and these will be reported (as may *
dimensions)
It is easy to arrange to use this check on just the code in your package: add to ~/.R/Makevars something like (for gfortran
)
= -g -O2 -mtune=native -fbounds-check FFLAGS
when you run R CMD check
.
This may report errors with the way that Fortran character variables are passed, particularly when Fortran subroutines are called from C code and character lengths are not passed (see Fortran character strings).
4.4 Debugging compiled code ¶
Sooner or later programmers will be faced with the need to debug compiled code loaded into R. This section is geared to platforms using gdb
with code compiled by gcc
, but similar things are possible with other debuggers such as lldb
(https://lldb.llvm.org/, used on macOS) and Sun’s dbx
: some debuggers have graphical front-ends available.
Consider first ‘crashes’, that is when R terminated unexpectedly with an illegal memory access (a ‘segfault’ or ‘bus error’), illegal instruction or similar. Unix-alike versions of R use a signal handler which aims to give some basic information. For example
*** caught segfault ***
0x20000028, cause 'memory not mapped'
address
:
Traceback1: .identC(class1[[1]], class2)
2: possibleExtends(class(sloti), classi, ClassDef2 = getClassDef(classi,
where = where))
3: validObject(t(cu))
4: stopifnot(validObject(cu <- as(tu, "dtCMatrix")), validObject(t(cu)),
validObject(t(tu)))
:
Possible actions1: abort (with core dump)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
: 3 Selection
Since the R process may be damaged, the only really safe options are the first or third. (Note that a core dump is only produced where enabled: a common default in a shell is to limit its size to 0, thereby disabling it.)
A fairly common cause of such crashes is a package which uses .C
or .Fortran
and writes beyond (at either end) one of the arguments it is passed. There is a good way to detect this: using options(CBoundsCheck = TRUE)
(which can be selected via the environment variable R_C_BOUNDS_CHECK=yes)
changes the way .C
and .Fortran
work to check if the compiled code writes in the 64 bytes at either end of an argument.
Another cause of a ‘crash’ is to overrun the C stack. R tries to track that in its own code, but it may happen in third-party compiled code. For modern POSIX-compliant OSes R can safely catch that and return to the top-level prompt, so one gets something like
> .C("aaa")
: segfault from C stack overflow
Error>
However, C stack overflows are fatal under Windows and normally defeat attempts at debugging on that platform. Further, the size of the stack is set when R is compiled on Windows, whereas on POSIX OSes it can be set in the shell from which R is launched.
If you have a crash which gives a core dump you can use something like
/path/to/R/bin/exec/R core.12345 gdb
to examine the core dump. If core dumps are disabled or to catch errors that do not generate a dump one can run R directly under a debugger by for example
$ R -d gdb --vanilla
...> run gdb
at which point R will run normally, and hopefully the debugger will catch the error and return to its prompt. This can also be used to catch infinite loops or interrupt very long-running code. For a simple example
> for(i in 1:1e7) x <- rnorm(100)
[hit Ctrl-C]
, Interrupt.
Program received signal SIGINT0x00397682 in _int_free () from /lib/tls/libc.so.6
(gdb) where
#0 0x00397682 in _int_free () from /lib/tls/libc.so.6
#1 0x00397eba in free () from /lib/tls/libc.so.6
#2 0xb7cf2551 in R_gc_internal (size_needed=313)
/users/ripley/R/svn/R-devel/src/main/memory.c:743
at #3 0xb7cf3617 in Rf_allocVector (type=13, length=626)
/users/ripley/R/svn/R-devel/src/main/memory.c:1906
at #4 0xb7c3f6d3 in PutRNGstate ()
/users/ripley/R/svn/R-devel/src/main/RNG.c:351
at #5 0xb7d6c0a5 in do_random2 (call=0x94bf7d4, op=0x92580e8, args=0x9698f98,
=0x9698f28) at /users/ripley/R/svn/R-devel/src/main/random.c:183
rho...
In many cases it is possible to attach a debugger to a running process: this is helpful if an alternative front-end is in use or to investigate a task that seems to be taking far too long. This is done by something like
-p pid gdb
where pid is the id of the R executable or front-end process and can be found from within a running R process by calling Sys.getpid()
or from a process monitor. This stops the process so its state can be examined: use continue
to resume execution.
Some “tricks” worth knowing follow:
4.4.1 Finding entry points in dynamically loaded code ¶
Under most compilation environments, compiled code dynamically loaded into R cannot have breakpoints set within it until it is loaded. To use a symbolic debugger on such dynamically loaded code under Unix-alikes use
- Call the debugger on the R executable, for example by R -d gdb.
- Start R.
- At the R prompt, use
dyn.load
orlibrary
to load your shared object. - Send an interrupt signal. This will put you back to the debugger prompt.
- Set the breakpoints in your code.
- Continue execution of R by typing signal 0 and hitting return (RET).
Under Windows signals may not be able to be used, and if so the procedure is more complicated. See the rw-FAQ.
4.4.2 Inspecting R objects when debugging ¶
The key to inspecting R objects from compiled code is the function PrintValue(SEXP
s)
which uses the normal R printing mechanisms to print the R object pointed to by s, or the safer version R_PV(SEXP
s)
which will only print ‘objects’.
One way to make use of PrintValue
is to insert suitable calls into the code to be debugged.
Another way is to call R_PV
from the symbolic debugger. (PrintValue
is hidden as Rf_PrintValue
.) For example, from gdb
we can use
R_PV(ab) (gdb) p
using the object ab
from the convolution example, if we have placed a suitable breakpoint in the convolution C code.
To examine an arbitrary R object we need to work a little harder. For example, let
> DF <- data.frame(a = 1:3, b = 4:6) R
By setting a breakpoint at do_get
and typing get("DF") at the R prompt, one can find out the address in memory of DF
, for example
1 = (SEXPREC *) 0x40583e1c
Value returned is $(gdb) p *$1
2 = {
$= {type = 19, obj = 1, named = 1, gp = 0,
sxpinfo = 0, debug = 0, trace = 0, = 0},
mark = 0x40583e80,
attrib = {
u = {
vecsxp = 2,
length = {c = 0x40634700 "0>X@D>X@0>X@", i = 0x40634700,
type = 0x40634700, z = 0x40634700, s = 0x40634700},
f = 1075851272,
truelength },
= {offset = 2},
primsxp = {pname = 0x2, value = 0x40634700, internal = 0x40203008},
symsxp = {carval = 0x2, cdrval = 0x40634700, tagval = 0x40203008},
listsxp = {frame = 0x2, enclos = 0x40634700},
envsxp = {formals = 0x2, body = 0x40634700, env = 0x40203008},
closxp = {value = 0x2, expr = 0x40634700, env = 0x40203008}
promsxp }
}
(Debugger output reformatted for better legibility).
Using R_PV()
one can “inspect” the values of the various elements of the SEXP
, for example,
R_PV($1->attrib)
(gdb) p $names
1] "a" "b"
[
$row.names
1] "1" "2" "3"
[
$class
1] "data.frame"
[
$3 = void
To find out where exactly the corresponding information is stored, one needs to go “deeper”:
$a = $1->attrib
(gdb) set $a->u.listsxp.tagval->u.symsxp.pname->u.vecsxp.type.c
(gdb) p $4 = 0x405d40e8 "names"
$a->u.listsxp.carval->u.vecsxp.type.s[1]->u.vecsxp.type.c
(gdb) p $5 = 0x40634378 "b"
$1->u.vecsxp.type.s[0]->u.vecsxp.type.i[0]
(gdb) p $6 = 1
$1->u.vecsxp.type.s[1]->u.vecsxp.type.i[1]
(gdb) p $7 = 5
Another alternative is the R_inspect
function which shows the low-level structure of the objects recursively (addresses differ from the above as this example is created on another machine):
R_inspect($1)
(gdb) p @100954d18 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
@100954d50 13 INTSXP g0c2 [NAM(2)] (len=3, tl=0) 1,2,3
@100954d88 13 INTSXP g0c2 [NAM(2)] (len=3, tl=0) 4,5,6
:
ATTRIB@102a70140 02 LISTSXP g0c0 []
: @10083c478 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "names"
TAG@100954dc0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
@10099df28 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
@10095e518 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
: @100859e60 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "row.names"
TAG@102a6f868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=1) -2147483648,-3
: @10083c948 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
TAG@102a6f838 16 STRSXP g0c1 [NAM(2)] (len=1, tl=1)
@1008c6d48 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
In general the representation of each object follows the format:
@<address> <type-nr> <type-name> <gc-info> [<flags>] ...
For a more fine-grained control over the depth of the recursion and the output of vectors R_inspect3
takes additional two character() parameters: maximum depth and the maximal number of elements that will be printed for scalar vectors. The defaults in R_inspect
are currently -1 (no limit) and 5 respectively.
4.4.3 Debugging on macOS ¶
To debug code in a package it is easiest to unpack it in a directory and install it with
R CMD INSTALL --dsym pkgname
as macOS does not store debugging symbols in the .so file. (It is not necessary to have R built with debugging symbols, although compiling the package should be done including -g in CFLAGS
/ CXXFLAGS
/ FFLAGS
/ FCFLAGS
as appropriate.)
Security measures may prevent running a CRAN binary distribution of R under lldb
or attaching this as a debugger (https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#I-cannot-attach-debugger-to-R), although both were possible on High Sierra and are again from R 4.2.0. This can also affect locally compiled builds, where attaching to an interactive R session under Big Sur or Monterey worked in 2022 after giving administrator permission via a popup-up. (To debug in what Apple deems a non-interactive session, e.g. logged in remotely, see man DevToolsSecurity
.)
Debugging a local build of R on macOS can raise additional hurdles as environment variables such as DYLD_FALLBACK_LIBRARY_PATH
are not usually passed through143 the lldb
process, resulting in messages like
-d lldb
R
...
(lldb) run16828 launched: '/path/to/bin/exec/R' (x86_64)
Process : Library not loaded: libR.dylib
dyld: /path/to/bin/exec/R Referenced from
A quick workaround is to symlink the dylibs under R_HOME/lib to somewhere where they will be found such as the current working directory. It would be possible to do as the distribution does144 and use install_name_tool
, but that would have to be done for all the dylibs including those in packages.
It may be simplest to attach the debugger to a running process (see above). Specifically, run R and when it is at the prompt just before a command that is to be debugged, at a terminal
-ef | grep exec/R
ps # identify the PID pid for the next command: it is the second item
-p pid
lldb (lldb) continue
and then return to the R console.
For non-interactive use, one may need lldb --batch
.
4.5 Using Link-time Optimization ¶
Where supported, link time optimization provides a comprehensive way to check the consistency of calls between Fortran files or between C and Fortran. Use this via R CMD INSTALL --use-LTO
(but that does not apply if there is a src/Makefile file or a Windows analogue).
To set up support on a Unix-alike, see Link-Time Optimization in R Installation and Administration. On Linux using GCC without building R with LTO support, it should suffice to set
= -flto
LTO_OPT = -flto
LTO_FC_OPT = gcc-ar
AR = gcc-nm NM
in a personal (or site) Makevars file: See Customizing package compilation in R Installation and Administration for more information.
For Windows, first edit file etc/${R_ARCH}/Makeconf to give LTO_OPT
the value -flto
or do so in a personal/site Makevars file; see also file src/gnuwin32/README.compilation in the sources.
For example:
:61: warning: type of 'ddot' does not match original declaration [-Wlto-type-mismatch]
boot.fy(j,i)=ddot(p,x(j,1),n,b(1,j,i),1)
:1023: note: return value type mismatch crq.f
where the package author forgot to declare
double precision ddot
external ddot
in boot.f. That package had its own copy of ddot
: to detect misuse of the one in R’s BLAS library would have needed R configured with --enable-lto=check.
Further examples:
:77:5: warning: type of 'dstup' does not match original declaration [-Wlto-type-mismatch]
rkpk2.f*info, wk)
:2565:5: note: type mismatch in parameter 14
rkpk1.fdstup (s, lds, nobs, nnull, qraux, jpvt, y, q, ldqr,
subroutine :2565:5: note: 'dstup' was previously declared here rkpk1.f
where the fourteenth argument dum
was missing in the call.
:78:33: warning: type of 'dqrdc' does not match original declaration [-Wlto-type-mismatch]
reg.fdqrdc (sr, nobs, nobs, nnull, wk, dum, dum, 0)
call :20: note: 'dqrdc' was previously declared here
dstup.fdqrdc (s, lds, nobs, nnull, qraux, jpvt, work, 1) call
dqrdc
is a LINPACK routine from R, jpvt
is an integer array and work
is a double precision one so dum
cannot match both. (If --enable-lto=check had been used the comparison would have been with the definition in R.)
For Fortran files all in the package, most inconsistencies can be detected by concatenating the Fortran files and compiling the result, sometimes with clearer diagnostics than provided by LTO. For our last two examples this gives
all.f:2966:72:
*info, work1)
1 Warning: Missing actual argument for argument 'dum' at (1)
and
all.f:1663:72:
*ipvtwk), wk(ikwk), wk(iwork1), wk(iwork2), info)
1 Warning: Type mismatch in argument 'jpvt' at (1); passed REAL(8) to INTEGER(4)
On a Unix-alike for a package with a src/Makefile file, LTO can be enabled by including suitable flags in that file, for example
= $(LTO_OPT)
LTO = $(LTO_FC_OPT) LTO_FC
and ensuring these are used for compilation, for example as part of CFLAGS
, CXXFLAGS
or FCFLAGS
. If R CMD SHLIB
is used for compilation, add --use-LTO to its call.
On Windows for a package with a src/Makefile.ucrt or src/Makefile.win file which includes "${R_HOME}/etc${R_ARCH}/Makeconf"’, include
= $(LTO_OPT) LTO
or to always use LTO however R was built,
= -flto LTO