5 System and foreign language interfaces
Many of the functions described here have entry-point names with an Rf_ prefix: if they are called from C code (but not C++ code as from R 4.5.0) that prefix can be omitted. Users are encouraged to use the prefix when writing new C code.
5.1 Operating system access
Access to operating system functions is via the R functions system and system2. The details will differ by platform (see the on-line help), and about all that can safely be assumed is that the first argument will be a string command that will be passed for execution (not necessarily by a shell) and the second argument to system will be intern which if true will collect the output of the command into an R character vector.
On POSIX-compliant OSes these commands pass a command-line to a shell: Windows is not POSIX-compliant and there is a separate function shell to do so.
The function system.time is available for timing. Timing on child processes is only available on Unix-alikes, and may not be reliable there.
5.2 Interface functions .C and .Fortran
These two functions provide an interface to compiled code that has been linked into R, either at build time or via dyn.load (see dyn.load and dyn.unload). They are primarily intended for compiled C and Fortran code respectively, but the .C function can be used with other languages which can generate C interfaces, for example C++ (see Interfacing C++ code).
The first argument to each function is a character string specifying the symbol name as known to C or Fortran, that is the function or subroutine name. (That the symbol is loaded can be tested by, for example, is.loaded("cg"). Use the name you pass to .C or .Fortran rather than the translated symbol name.)
There can be up to 65 further arguments giving R objects to be passed to compiled code. Normally these are copied before being passed in, and copied again to an R list object when the compiled code returns. If the arguments are given names, these are used as names for the components in the returned list object (but not passed to the compiled code).
The following table gives the mapping between the modes of R atomic vectors and the types of arguments to a C function or Fortran subroutine.
R storage mode    C type             Fortran type
logical           int *              INTEGER
integer           int *              INTEGER
double            double *           DOUBLE PRECISION
complex           Rcomplex *         DOUBLE COMPLEX
character         char **            CHARACTER(255)
raw               unsigned char *    none
On all R platforms int and INTEGER are 32-bit. Code ported from S-PLUS (which uses long * for logical and integer) will not work on all 64-bit platforms (although it may appear to work on some, including ‘x86_64’ Windows). Note also that if your compiled code is a mixture of C functions and Fortran subprograms the argument types must match as given in the table above.
C type Rcomplex is a structure with double members r and i defined in the header file R_ext/Complex.h. (On most platforms this is stored in a way compatible with the C99 double complex type: however, it may not be possible to pass Rcomplex to a C99 function expecting a double complex argument. Nor need it be compatible with a C++ complex type. Moreover, the compatibility can depend on the optimization level set for the compiler.)
Only a single character string of fixed length can be passed to or from Fortran (the length is not passed), and the success of this is compiler-dependent: its use was formally deprecated in 2019. Other R objects can be passed to .C, but it is much better to use one of the other interfaces.
It is possible to pass numeric vectors of storage mode double to C as float * or to Fortran as REAL by setting the attribute Csingle, most conveniently by using the R functions as.single, single or mode. This is intended only to be used to aid interfacing existing C or Fortran code.
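As a hedged illustration of the Csingle route, the following sketch shows what existing single-precision C code of this kind might look like. The routine name scale_single is hypothetical and not part of R; from R it could be reached with something like .C("scale_single", x = as.single(x), as.integer(length(x)), as.single(2))$x.

```c
/* Hypothetical routine: scales a single-precision vector in place.
   All arguments arrive as pointers, as .C requires. */
void scale_single(float *x, int *n, float *factor)
{
    for (int i = 0; i < *n; i++)
        x[i] *= *factor;
}
```

The length and the factor are passed by pointer, just like the data, because .C never passes arguments by value.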
Logical values are sent as 0 (FALSE), 1 (TRUE) or INT_MIN = -2147483648 (NA, but only if NAOK is true), and the compiled code should return one of these three values. (Non-zero values other than INT_MIN are mapped to TRUE.) Note that the use of int * for Fortran logical is not guaranteed to be portable (although people have gotten away with it for many years): it is better to pass integers and convert to/from Fortran logical in a Fortran wrapper.
Unless formal argument NAOK is true, all the other arguments are checked for missing values NA and for the IEEE special values NaN, Inf and -Inf, and the presence of any of these generates an error. If it is true, these values are passed unchecked.
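To make the three-valued encoding of logicals concrete, here is a minimal sketch of a routine that could be called via .C with NAOK = TRUE. The name count_logical and its argument layout are assumptions made for illustration only.

```c
#include <limits.h>

/* Hypothetical routine: tallies the entries of a logical vector.
   counts[0] = number of FALSE, counts[1] = TRUE, counts[2] = NA. */
void count_logical(int *x, int *n, int *counts)
{
    counts[0] = counts[1] = counts[2] = 0;
    for (int i = 0; i < *n; i++) {
        if (x[i] == INT_MIN)
            counts[2]++;        /* NA is encoded as INT_MIN */
        else if (x[i] != 0)
            counts[1]++;        /* any other non-zero value counts as TRUE */
        else
            counts[0]++;        /* FALSE */
    }
}
```

Testing for INT_MIN before testing for non-zero matters: otherwise NA would be counted as TRUE.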
Argument PACKAGE confines the search for the symbol name to a specific shared object (or use "base" for code compiled into R). Its use is highly desirable, as there is no way to avoid two package writers using the same symbol name, and such name clashes are normally sufficient to cause R to crash. (If it is not present and the call is from the body of a function defined in a package namespace, the shared object loaded by the first (if any) useDynLib directive will be used.)
Note that the compiled code should not return anything except through its arguments: C functions should be of type void and Fortran subprograms should be subroutines.
To fix ideas, let us consider a very simple example which convolves two finite sequences. (This is hard to do fast in interpreted R code, but easy in C code.) We could do this using .C by
void convolve(double *a, int *na, double *b, int *nb, double *ab)
{
    int nab = *na + *nb - 1;

    for(int i = 0; i < nab; i++)
        ab[i] = 0.0;
    for(int i = 0; i < *na; i++)
        for(int j = 0; j < *nb; j++)
            ab[i + j] += a[i] * b[j];
}
called from R by
conv <- function(a, b)
    .C("convolve",
       as.double(a),
       as.integer(length(a)),
       as.double(b),
       as.integer(length(b)),
       ab = double(length(a) + length(b) - 1))$ab
Note that we take care to coerce all the arguments to the correct R storage mode before calling .C; mistakes in matching the types can lead to wrong results or hard-to-catch errors.
Special care is needed in handling character vector arguments in C (or C++). On entry the contents of the elements are duplicated and assigned to the elements of a char ** array, and on exit the elements of the C array are copied to create new elements of a character vector. This means that the contents of the character strings of the char ** array can be changed, including to \0 to shorten the string, but the strings cannot be lengthened. It is possible to allocate a new string via R_alloc and replace an entry in the char ** array by the new string. However, when character vectors are used other than in a read-only way, the .Call interface is much to be preferred.
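The shortening case can be sketched as follows. The routine truncate_strings is a hypothetical example, not an R entry point; it only ever writes within the storage that was passed in, which is the one kind of modification described above as safe.

```c
#include <string.h>

/* Hypothetical .C routine: shorten each string to at most *maxlen bytes
   by writing a terminating '\0'. Strings are never lengthened.
   Note: this counts bytes, so a multibyte character could be split. */
void truncate_strings(char **s, int *n, int *maxlen)
{
    if (*maxlen < 0)
        return;                     /* nothing sensible to do */
    for (int i = 0; i < *n; i++)
        if (strlen(s[i]) > (size_t) *maxlen)
            s[i][*maxlen] = '\0';
}
```

From R this might be called as .C("truncate_strings", s = as.character(x), as.integer(length(x)), as.integer(4))$s, again assuming the routine has been compiled and loaded.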
Passing character strings to Fortran code needs even more care, is deprecated and should be avoided where possible. Only the first element of the character vector is passed in, as a fixed-length (255) character array. Up to 255 characters are passed back to a length-one character vector. How well this works (or even if it works at all) depends on the C and Fortran compilers on each platform (including on their options). Often what is being passed to Fortran is one of a small set of possible values (a factor in R terms) which could alternatively be passed as an integer code: similarly Fortran code that wants to generate diagnostic messages could pass an integer code to a C or R wrapper which would convert it to a character string.
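The integer-code alternative might look like the following on the C side. The status codes and messages are invented for illustration; a real wrapper would use whatever codes the Fortran routine documents.

```c
#include <stddef.h>

/* Hypothetical messages, indexed by the integer status code that a
   Fortran subroutine returns through an INTEGER argument. */
static const char *const status_text[] = {
    "success",
    "matrix is singular",
    "iteration limit reached"
};

/* Map a status code to a message; out-of-range codes get a fallback. */
const char *status_message(int code)
{
    size_t n = sizeof status_text / sizeof status_text[0];
    if (code < 0 || (size_t) code >= n)
        return "unknown status code";
    return status_text[code];
}
```

Only an integer crosses the Fortran boundary, so the result does not depend on how a particular compiler pair passes character arguments.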
It is possible to pass some R objects other than atomic vectors via .C, but this is only supported for historical compatibility: use the .Call or .External interfaces for such objects. Any C/C++ code that includes Rinternals.h should be called via .Call or .External.
.Fortran is primarily intended for Fortran 77 code, and long precedes any support for ‘modern’ Fortran. Since current implementations of Fortran support the Fortran 2003 module iso_c_binding, a better way to interface modern Fortran code to R is to use .C and write a C interface using use iso_c_binding.
5.3 dyn.load and dyn.unload
Compiled code to be used with R is loaded as a shared object (Unix-alikes including macOS, see Creating shared objects for more information) or DLL (Windows).
The shared object/DLL is loaded by dyn.load and unloaded by dyn.unload. Unloading is not normally necessary and is not safe in general, but it is needed to allow the DLL to be re-built on some platforms, including Windows. Unloading a DLL and then re-loading a DLL of the same name may not work: Solaris used the first version loaded. A DLL that registers C finalizers, but fails to unregister them when unloaded, may cause R to crash after unloading.
The first argument to both functions is a character string giving the path to the object. Programmers should not assume a specific file extension for the object/DLL (such as .so) but use a construction like
file.path(path1, path2, paste0("mylib", .Platform$dynlib.ext))
for platform independence. On Unix-alike systems the path supplied to dyn.load can be an absolute path, one relative to the current directory or, if it starts with ‘~’, relative to the user’s home directory.
Loading is most often done automatically based on the useDynLib() declaration in the NAMESPACE file, but may be done explicitly via a call to library.dynam. This has the form
library.dynam("libname", package, lib.loc)
where libname is the object/DLL name with the extension omitted. Note that the first argument, chname, should not be package since this will not work if the package is installed under another name.
Under some Unix-alike systems there is a choice of how the symbols are resolved when the object is loaded, governed by the arguments local and now. Only use these if really necessary: in particular using now=FALSE and then calling an unresolved symbol will terminate R unceremoniously.
R provides a way of executing some code automatically when an object/DLL is either loaded or unloaded. This can be used, for example, to register native routines with R’s dynamic symbol mechanism, initialize some data in the native code, or initialize a third party library. On loading a DLL, R will look for a routine within that DLL named R_init_lib where lib is the name of the DLL file with the extension removed. For example, in the command
library.dynam("mylib", package, lib.loc)
R looks for the symbol named R_init_mylib. Similarly, when unloading the object, R looks for a routine named R_unload_lib, e.g., R_unload_mylib. In either case, if the routine is present, R will invoke it and pass it a single argument describing the DLL. This is a value of type DllInfo which is defined in the Rdynload.h file in the R_ext directory.
Note that there are some implicit restrictions on this mechanism as the basename of the DLL needs to be both a valid file name and valid as part of a C entry point (e.g. it cannot contain ‘.’): for portable code it is best to confine DLL names to be ASCII alphanumeric plus underscore. If entry point R_init_lib is not found it is also looked for with ‘.’ replaced by ‘_’.
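The naming rule can be illustrated with a small helper that derives the fallback initialization symbol from a DLL basename. This is only a sketch of the rule as stated above, under the assumption that the extension has already been removed; the function init_symbol_name is hypothetical and is not R's own lookup code.

```c
#include <stdio.h>

/* Hypothetical helper: write "R_init_<lib>" into out, with any '.'
   replaced by '_'. Returns 0 on success, -1 if out is too small. */
int init_symbol_name(const char *lib, char *out, size_t outlen)
{
    int len = snprintf(out, outlen, "R_init_%s", lib);
    if (len < 0 || (size_t) len >= outlen)
        return -1;
    for (char *p = out; *p != '\0'; p++)
        if (*p == '.')
            *p = '_';
    return 0;
}
```

For a DLL called my.lib this yields R_init_my_lib, which is a valid C identifier whereas R_init_my.lib is not.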
The following example shows templates for the initialization and unload routines for the mylib DLL.
#include <R_ext/Rdynload.h>

void
R_init_mylib(DllInfo *info)
{
    /* Register routines,
       allocate resources. */
}

void
R_unload_mylib(DllInfo *info)
{
    /* Release resources. */
}
If a shared object/DLL is loaded more than once the most recent version is used. More generally, if the same symbol name appears in several shared objects, the most recently loaded occurrence is used. The PACKAGE argument and registration (see the next section) provide good ways to avoid any ambiguity in which occurrence is meant.
On Unix-alikes the paths used to resolve dynamically-linked dependent libraries are fixed (for security reasons) when the process is launched, so dyn.load will only look for such libraries in the locations set by the R shell script (via etc/ldpaths) and in the OS-specific defaults.
Windows allows more control (and less security) over where dependent DLLs are looked for. On all versions this includes the PATH environment variable, but with lowest priority: note that it does not include the directory from which the DLL was loaded. It is possible to add a single path with quite high priority via the DLLpath argument to dyn.load. This is (by default) used by library.dynam to include the package’s libs/x64 directory (on Intel) in the DLL search path.
5.4 Registering native routines
By ‘native’ routine, we mean an entry point in compiled code.
In calls to .C, .Call, .Fortran and .External, R must locate the specified native routine by looking in the appropriate shared object/DLL. By default, R uses the operating-system-specific dynamic loader to look up the symbol in all loaded DLLs and the R executable or libraries it is linked to. Alternatively, the author of the DLL can explicitly register routines with R and use a single, platform-independent mechanism for finding the routines in the DLL. One can use this registration mechanism to provide additional information about a routine, including the number and type of the arguments, and also make it available to R programmers under a different name.
Registering routines has two main advantages: it provides a faster way to find the address of the entry point via tables stored in the DLL at compilation time, and it provides a run-time check that the entry point is called with the right number of arguments and, optionally, the right argument types.
To register routines with R, one calls the C routine R_registerRoutines. This is typically done when the DLL is first loaded within the initialization routine R_init_<dll name> described in dyn.load and dyn.unload. R_registerRoutines takes 5 arguments. The first is the DllInfo object passed by R to the initialization routine. This is where R stores the information about the methods. The remaining 4 arguments are arrays describing the routines for each of the 4 different interfaces: .C, .Call, .Fortran and .External. Each argument is a NULL-terminated array of the element types given in the following table:
.C           R_CMethodDef
.Call        R_CallMethodDef
.Fortran     R_FortranMethodDef
.External    R_ExternalMethodDef
Currently, the R_ExternalMethodDef type is the same as R_CallMethodDef type and contains fields for the name of the routine by which it can be accessed in R, a pointer to the actual native symbol (i.e., the routine itself), and the number of arguments the routine expects to be passed from R. For example, if we had a routine named myCall defined as
SEXP myCall(SEXP a, SEXP b, SEXP c);
we would describe this as
static const R_CallMethodDef callMethods[] = {
{"myCall", (DL_FUNC) &myCall, 3},
{NULL, NULL, 0}
};
along with any other routines for the .Call interface. For routines with a variable number of arguments invoked via the .External interface, one specifies -1 for the number of arguments which tells R not to check the actual number passed.
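To show how a sentinel-terminated table with an argument count supports a run-time check, here is a self-contained toy model. It deliberately avoids R's headers: the types toy_func and toy_method_def and the function toy_lookup are stand-ins invented for this sketch, and R's actual lookup code differs.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type, in the spirit of DL_FUNC. */
typedef void *(*toy_func)(void);

typedef struct {
    const char *name;       /* name used to look the routine up */
    toy_func    fun;        /* address of the routine */
    int         numArgs;    /* expected argument count; -1 = unchecked */
} toy_method_def;

static void *dummy(void) { return NULL; }

/* The all-NULL entry marks the end of the table. */
static const toy_method_def toy_methods[] = {
    {"myCall",    dummy,  3},
    {"myVarargs", dummy, -1},
    {NULL, NULL, 0}
};

/* Return the matching entry, or NULL if the name is unknown or the
   number of arguments does not match the registered count. */
const toy_method_def *toy_lookup(const toy_method_def *table,
                                 const char *name, int nargs)
{
    for (const toy_method_def *p = table; p->name != NULL; p++)
        if (strcmp(p->name, name) == 0)
            return (p->numArgs < 0 || p->numArgs == nargs) ? p : NULL;
    return NULL;
}
```

Because the table ends with a sentinel entry, no separate length needs to be passed around, which is also why the real registration arrays must end with a NULL entry.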
Routines for use with the .C and .Fortran interfaces are described with similar data structures, which have one optional additional field for describing the type of each argument. If specified, this field should be an array with the SEXP types describing the expected type of each argument of the routine. (Technically, the elements of the types array are of type R_NativePrimitiveArgType which is just an unsigned integer.) The R types and corresponding type identifiers are provided in the following table:
numeric      REALSXP
integer      INTSXP
logical      LGLSXP
single       SINGLESXP
character    STRSXP
list         VECSXP
Consider a C routine, myC, declared as
void myC(double *x, int *n, char **names, int *status);
We would register it as
static R_NativePrimitiveArgType myC_type[] = {
    REALSXP, INTSXP, STRSXP, LGLSXP
};
static const R_CMethodDef cMethods[] = {
{"myC", (DL_FUNC) &myC, 4, myC_type},
{NULL, NULL, 0, NULL}
};
If registering types, check carefully that the number of types matches the number of arguments: as the type array (here myC_type) is passed as a pointer in C, the registration mechanism cannot check this for you.
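One way to catch such a mismatch yourself is a compile-time assertion next to the array definition. The sketch below uses C11 _Static_assert. So that it compiles without R's headers, it declares local stand-ins for R_NativePrimitiveArgType and the type codes; in a package these would come from R's headers instead. The macro MYC_NARGS is a hypothetical constant that would also be used as the argument count in the R_CMethodDef entry.

```c
/* Stand-ins for illustration only; in a package, include
   <Rinternals.h> and <R_ext/Rdynload.h> instead. */
typedef unsigned int R_NativePrimitiveArgType;
enum { LGLSXP = 10, INTSXP = 13, REALSXP = 14, STRSXP = 16 };

#define MYC_NARGS 4   /* hypothetical: number of arguments of myC */

static R_NativePrimitiveArgType myC_type[] = {
    REALSXP, INTSXP, STRSXP, LGLSXP
};

/* Compilation fails if the array length and argument count disagree. */
_Static_assert(sizeof myC_type / sizeof myC_type[0] == MYC_NARGS,
               "myC_type must have one entry per argument of myC");
```

The check works only where the array definition is visible, since sizeof needs the array itself rather than a pointer to it.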
Note that .Fortran entry points are mapped to lowercase, so registration should use lowercase only.
Having created the arrays describing each routine, the last step is to actually register them with R. We do this by calling R_registerRoutines. For example, if we have the descriptions above for the routines accessed by the .C and .Call interfaces we would use the following code:
void
R_init_myLib(DllInfo *info)
{
    R_registerRoutines(info, cMethods, callMethods, NULL, NULL);
}
This routine will be invoked when R loads the shared object/DLL named myLib. The last two arguments in the call to R_registerRoutines are for the routines accessed by .Fortran and .External interfaces. In our example, these are given as NULL since we have no routines of these types.
When R unloads a shared object/DLL, its registrations are removed. There is no other facility for unregistering a symbol.
Examples of registering routines can be found in the different packages in the R source tree (e.g., stats and graphics). Also, there is a brief, high-level introduction in R News (volume 1/3, September 2001, pages 20–23, https://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf).
Once routines are registered, they can be referred to as R objects if this is arranged in the useDynLib call in the package’s NAMESPACE file (see useDynLib). So for example the stats package has
# Refer to all C/Fortran routines by their name prefixed by C_
useDynLib(stats, .registration = TRUE, .fixes = "C_")
in its NAMESPACE file, and then ansari.test’s default method can contain
pansari <- function(q, m, n)
    .C(C_pansari, as.integer(length(q)), p = as.double(q),
       as.integer(m), as.integer(n))$p
This avoids the overhead of looking up an entry point each time it is used, and ensures that the entry point in the package is the one used (without a PACKAGE = "pkg" argument).
R_init_ routines are often of the form
void attribute_visible R_init_mypkg(DllInfo *dll)
{
    R_registerRoutines(dll, CEntries, CallEntries, FortEntries,
                       ExternalEntries);
    R_useDynamicSymbols(dll, FALSE);
    R_forceSymbols(dll, TRUE);
...
}
The R_useDynamicSymbols call says the DLL is not to be searched for entry points specified by character strings so .C etc calls will only find registered symbols: the R_forceSymbols call only allows .C etc calls which specify entry points by R objects such as C_pansari (and not by character strings). Each provides some protection against accidentally finding your entry points when people supply a character string without a package, and avoids slowing down such searches. (For the visibility attribute see Controlling visibility.)
In more detail, if a package mypkg contains entry points reg and unreg and the first is registered as a 0-argument .Call routine, we could use (from code in the package)
.Call("reg")
.Call("unreg")
Without or with registration, these will both work. If R_init_mypkg calls R_useDynamicSymbols(dll, FALSE), only the first will work. If in addition to registration the NAMESPACE file contains
useDynLib(mypkg, .registration = TRUE, .fixes = "C_")
then we can call .Call(C_reg). Finally, if R_init_mypkg also calls R_forceSymbols(dll, TRUE), only .Call(C_reg) will work (and not .Call("reg")). This is usually what we want: it ensures that all of our own .Call calls go directly to the intended code in our package and that no one else accidentally finds our entry points. (Should someone need to call our code from outside the package, for example for debugging, they can use .Call(mypkg:::C_reg).)
5.4.1 Speed considerations
Sometimes registering native routines or using a PACKAGE argument can make a large difference. The results can depend quite markedly on the OS (and even if it is 32- or 64-bit), on the version of R and what else is loaded into R at the time.
To fix ideas, first consider ‘x86_64’ OS X 10.7 and R 2.15.2. A simple .Call function might be
foo <- function(x) .Call("foo", x)
with C code
#include <Rinternals.h>

SEXP foo(SEXP x)
{
    return x;
}
If we compile with R CMD SHLIB foo.c, load the code by dyn.load("foo.so") and run foo(pi) it took around 22 microseconds (us). Specifying the DLL by
foo2 <- function(x) .Call("foo", x, PACKAGE = "foo")
reduced the time to 1.7 us.
Now consider making these functions part of a package whose NAMESPACE file uses useDynLib(foo). This immediately reduces the running time as "foo" will be preferentially looked for in foo.dll. Without specifying PACKAGE it took about 5 us (it needs to work out the appropriate DLL each time it is invoked but it does not need to search all DLLs), and with the PACKAGE argument it is again about 1.7 us.
Next suppose the package has registered the native routine foo. Then foo() still has to find the appropriate DLL but can get to the entry point in the DLL faster, in about 4.2 us. And foo2() now takes about 1 us. If we register the symbols in the NAMESPACE file and use
foo3 <- function(x) .Call(C_foo, x)
then the address for the native routine is looked up just once when the package is loaded, and foo3(pi) takes about 0.8 us.
Versions using .C() rather than .Call() took about 0.2 us longer.
These are all quite small differences, but C routines are not uncommonly invoked millions of times for run times of a few microseconds each, and those doing such things may wish to be aware of the differences.
On Linux and Solaris there is a smaller overhead in looking up symbols.
Symbol lookup on Windows used to be far slower, so R maintains a small cache. If the cache is currently empty enough that the symbol can be stored in the cache then the performance is similar to Linux and Solaris: if not it may be slower. R’s own code always uses registered symbols and so these never contribute to the cache: however many other packages do rely on symbol lookup.
In more recent versions of R all the standard packages register native symbols and do not allow symbol search, so in a new session foo() can only look in foo.so and may be as fast as foo2(). This will no longer apply when many contributed packages are loaded, and generally those last loaded are searched first. For example, consider R 3.3.2 on x86_64 Linux. In an empty R session, both foo() and foo2() took about 0.75 us; however after packages igraph and spatstat had been loaded (which loaded another 12 DLLs), foo() took 3.6 us but foo2() still took about 0.80 us. Using registration in a package reduced this to 0.55 us and foo3() took 0.40 us, times which were unchanged when further packages were loaded.
5.4.2 Example: converting a package to use registration
The splines package was converted to use symbol registration in 2001, but we can use it as an example of what needs to be done for a small package.
Find the relevant entry points. This is somewhat OS-specific, but something like the following should be possible at the OS command-line
nm -g /path/to/splines.so | grep " T "
0000000000002670 T _spline_basis
0000000000001ec0 T _spline_value
This indicates that there are two relevant entry points. (They may or may not have a leading underscore, as here. Fortran entry points will have a trailing underscore on all current platforms.) Check in the R code that they are called by the package and how: in this case they are used by .Call.
Alternatively, examine the package’s R code for all .C, .Fortran, .Call and .External calls.
Construct the registration table. First write skeleton registration code, conventionally in file src/init.c (or at the end of the only C source file in the package: if included in a C++ file the ‘R_init’ function would need to be declared extern "C"):
#include <stdlib.h> // for NULL
#include <R_ext/Rdynload.h>

#define CALLDEF(name, n)  {#name, (DL_FUNC) &name, n}

static const R_CallMethodDef R_CallDef[] = {
    CALLDEF(spline_basis, ?),
    CALLDEF(spline_value, ?),
    {NULL, NULL, 0}
};

void R_init_splines(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL);
}
and then replace the ? in the skeleton with the actual numbers of arguments. You will need to add declarations (also known as ‘prototypes’) of the functions unless appending to the only C source file. Some packages will already have these in a header file, or you could create one and include it in init.c, for example splines.h containing
#include <Rinternals.h> // for SEXP
extern SEXP spline_basis(SEXP knots, SEXP order, SEXP xvals, SEXP derivs);
extern SEXP spline_value(SEXP knots, SEXP coeff, SEXP order, SEXP x, SEXP deriv);
Tools are available to extract declarations, at least for C and C++ code: see the help file for package_native_routine_registration_skeleton in package tools. Here we could have used
cproto -I/path/to/R/include -e splines.c
For examples of registering other types of calls, see packages graphics and stats. In particular, when registering entry points for .Fortran one needs declarations as if called from C, such as
#include <R_ext/RS.h>
void F77_NAME(supsmu)(int *n, double *x, double *y,
                      double *w, int *iper, double *span, double *alpha,
                      double *smo, double *sc, double *edf);
gfortran 8.4, 9.2 and later can help generate such prototypes with its flag -fc-prototypes-external (although one will need to replace the hard-coded trailing underscore with the F77_NAME macro).
One can get away with inaccurate argument lists in the declarations: it is easy to specify the arguments for .Call (all SEXP) and .External (one SEXP) and as the arguments for .C and .Fortran are all pointers, specifying them as void * suffices. (For most platforms one can omit all the arguments, although link-time optimization will warn, as will compilers set up to warn on strict prototypes – and C23 requires correct arguments.)
Using -fc-prototypes-external will give a prototype using int_least32_t *lgl for Fortran LOGICAL LGL, but this is not portable and traditionally it has been assumed that the C/C++ equivalent was int *lgl. If adding a declaration just to register a .Fortran call, the most portable version is void *lgl.
(Optional but highly recommended.) Restrict .Call etc to use the symbols you chose to register by editing src/init.c to contain
void R_init_splines(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
}
A skeleton for the steps so far can be made using package_native_routine_registration_skeleton in package tools. This will optionally create declarations based on the usage in the R code.
The remaining steps are optional but recommended.
Edit the NAMESPACE file to create R objects for the registered symbols:
useDynLib(splines, .registration = TRUE, .fixes = "C_")
Find all the relevant calls in the R code and edit them to use the R objects. This entailed changing the lines
temp <- .Call("spline_basis", knots, ord, x, derivs, PACKAGE = "splines")
y[accept] <- .Call("spline_value", knots, coeff, ord, x[accept], deriv, PACKAGE = "splines")
y = .Call("spline_value", knots, coef(object), ord, x, deriv, PACKAGE = "splines")
to
temp <- .Call(C_spline_basis, knots, ord, x, derivs)
y[accept] <- .Call(C_spline_value, knots, coeff, ord, x[accept], deriv)
y = .Call(C_spline_value, knots, coef(object), ord, x, deriv)
Check that there is no exportPattern directive which unintentionally exports the newly created R objects.
Restrict .Call to use the R symbols by editing src/init.c to contain
void R_init_splines(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
    R_forceSymbols(dll, TRUE);
}
Consider visibility. On some OSes we can hide entry points from the loader, which precludes any possible name clashes and calling them accidentally (usually with incorrect arguments and crashing the R process). If we repeat the first step we now see
nm -g /path/to/splines.so | grep " T "
0000000000002e00 T _R_init_splines
00000000000025e0 T _spline_basis
0000000000001e20 T _spline_value
If there were any entry points not intended to be used by the package we should try to avoid exporting them, for example by making them static. Now that the two relevant entry points are only accessed via the registration table, we can hide them. There are two ways to do so on some Unix-alikes. We can hide individual entry points via
#include <R_ext/Visibility.h>

SEXP attribute_hidden
spline_basis(SEXP knots, SEXP order, SEXP xvals, SEXP derivs)
...

SEXP attribute_hidden
spline_value(SEXP knots, SEXP coeff, SEXP order, SEXP x, SEXP deriv)
...
Alternatively, we can change the default visibility for all C symbols by including
PKG_CFLAGS = $(C_VISIBILITY)
in src/Makevars, and then we need to allow registration by declaring R_init_splines to be visible:
#include <R_ext/Visibility.h>

void attribute_visible
R_init_splines(DllInfo *dll)
...
See Controlling visibility for more details, including using Fortran code and ways to restrict visibility on Windows.
We end up with a file src/init.c containing
#include <stdlib.h>
#include <R_ext/Rdynload.h>
#include <R_ext/Visibility.h>  // optional

#include "splines.h"

#define CALLDEF(name, n)  {#name, (DL_FUNC) &name, n}

static const R_CallMethodDef R_CallDef[] = {
    CALLDEF(spline_basis, 4),
    CALLDEF(spline_value, 5),
    {NULL, NULL, 0}
};

void
attribute_visible  // optional
R_init_splines(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
    R_forceSymbols(dll, TRUE);
}
5.4.3 Linking to native routines in other packages
In addition to registering C routines to be called by R, it can at times be useful for one package to make some of its C routines available to be called by C code in another package. The interface consists of two routines declared in header R_ext/Rdynload.h as
void R_RegisterCCallable(const char *package, const char *name,
                         DL_FUNC fptr);
DL_FUNC R_GetCCallable(const char *package, const char *name);
A package packA that wants to make a C routine myCfun available to C code in other packages would include the call
R_RegisterCCallable("packA", "myCfun", myCfun);
in its initialization function R_init_packA. A package packB that wants to use this routine would retrieve the function pointer with a call of the form
p_myCfun = R_GetCCallable("packA", "myCfun");
As the type DL_FUNC is only appropriate for functions with no arguments, other users will need to cast to an appropriate type. For example
typedef SEXP (*na_omit_xts_func) (SEXP x);
...
    na_omit_xts_func fun = (na_omit_xts_func) R_GetCCallable("xts", "na_omit_xts");
    return fun(x);
The author of packB is responsible for ensuring that p_myCfun has an appropriate declaration. In the future R may provide some automated tools to simplify exporting larger numbers of routines.
A package that wishes to make use of header files in other packages needs to declare them as a comma-separated list in the field ‘LinkingTo’ in the DESCRIPTION file. This then arranges for the include directories in the installed linked-to packages to be added to the include paths for C and C++ code.
It must specify ‘Imports’ or ‘Depends’ of those packages, for they have to be loaded prior to this one (so the path to their compiled code has been registered).
CRAN examples of the use of this mechanism include coxme linking to bdsmatrix and xts linking to zoo.
NB: this mechanism is fragile, as changes to the interface provided by packA have to be recognised by packB. The consequences of not doing so have included serious corruption to the memory pool of the R session. Either packB has to depend on the exact version of packA or there needs to be a mechanism for packB to test at runtime that the version of packA it is linked to matches the one it was compiled against.
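One possible shape for such a runtime test is sketched below. Everything in it is an assumption made for illustration: the macro PACKA_API_VERSION, the routine packA_api_version and the checker packA_interface_matches are hypothetical names, and the sketch omits R's headers so that it stands alone. In a real package, packB would obtain the function pointer by casting the result of R_GetCCallable("packA", "packA_api_version") to api_version_fn.

```c
#include <stddef.h>

/* packA side (hypothetical header constant): bumped by packA's author
   whenever the exported C interface changes. packB sees this value at
   compile time through packA's header. */
#define PACKA_API_VERSION 2

/* packA side: exported routine reporting the version actually loaded. */
int packA_api_version(void)
{
    return PACKA_API_VERSION;
}

/* packB side: compare the version compiled against with the version
   reported at run time. Returns 1 on a match, 0 otherwise. */
typedef int (*api_version_fn)(void);

int packA_interface_matches(api_version_fn runtime_version)
{
    return runtime_version != NULL &&
           runtime_version() == PACKA_API_VERSION;
}
```

If the check fails, packB can signal an error from its initialization routine instead of calling into an interface whose layout may have changed.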
On rare occasions it can be useful for C code in one package to dynamically look up the address of an entry point in another package. This can be done using R_FindSymbol:
DL_FUNC R_FindSymbol(char const *name, char const *pkg,
                     R_RegisteredNativeSymbol *symbol);
5.6 Interfacing C++ code
Suppose we have the following hypothetical C++ library, consisting of the two files X.h and X.cpp, and implementing the two classes X
and Y
which we want to use in R.
// X.h

class X {
public: X (); ~X ();
};

class Y {
public: Y (); ~Y ();
};

// X.cpp

#include <R.h>
#include "X.h"

static Y y;

X::X()  { REprintf("constructor X\n"); }
X::~X() { REprintf("destructor X\n");  }
Y::Y()  { REprintf("constructor Y\n"); }
Y::~Y() { REprintf("destructor Y\n");  }
To use with R, the only thing we have to do is to write a wrapper function and ensure that the function is enclosed in
extern "C" {

}
For example,
// X_main.cpp:

#include "X.h"

extern "C" {

void X_main () {
  X x;
}

} // extern "C"
Compiling and linking should be done with the C++ compiler-linker (rather than the C compiler-linker or the linker itself); otherwise, the C++ initialization code (and hence the constructor of the static variable Y
) are not called. On a properly configured system, one can simply use
R CMD SHLIB X.cpp X_main.cpp
to create the shared object, typically X.so (the file name extension may be different on your platform). Now starting R yields
R version 2.14.1 Patched (2012-01-16 r58124)
Copyright (C) 2012 The R Foundation for Statistical Computing
...
Type "q()" to quit R.

R> dyn.load(paste("X", .Platform$dynlib.ext, sep = ""))
constructor Y
R> .C("X_main")
constructor X
destructor X
list()
R> q()
Save workspace image? [y/n/c]: y
destructor Y
The R for Windows FAQ (rw-FAQ) contains details of how to compile this example under Windows.
Earlier versions of this example used C++ iostreams: this is best avoided. There is no guarantee that the output will appear in the R console, and indeed it will not on the R for Windows console. Use R code or the C entry points (see Printing) for all I/O if at all possible. Examples have been seen where merely loading a DLL that contained calls to C++ I/O upset R’s own C I/O (for example by resetting buffers on open files).
Most R header files can be included within C++ programs but they should not be included within an extern "C"
block (as they include system headers155).
5.6.1 External C++ code ¶
Quite a lot of external C++ software is header-only (e.g. most of the Boost ‘libraries’ including all those supplied by package BH, and most of Armadillo as supplied by package RcppArmadillo) and so is compiled when an R package which uses it is installed. This causes few problems.
A small number of external libraries used in R packages have a C++ interface to a library of compiled code, e.g. packages sf and rjags. This raises many more problems! The C++ interface uses name-mangling and the ABI156 may depend on the compiler, version and even C++ defines157, so requires the package C++ code to be compiled in exactly the same way as the library (and what that was is often undocumented).
Even fewer external libraries use C++ internally but present a C interface, such as GEOS used by sf and other packages. These require the C++ runtime library to be linked into the package’s shared object/DLL, and this is best done by including a dummy C++ file in the package sources.
There is a trend to link to the C++ interfaces offered by C software such as hdf5, pcre and ImageMagick. Their C interfaces are much preferred for portability (and can be used from C++ code). Also, the C++ interfaces are often optional in the software build or packaged separately and so users installing from package sources are less likely to already have them installed.
5.7 Fortran I/O ¶
We have already warned against the use of C++ iostreams not least because output is not guaranteed to appear on the R console, and this warning applies equally to Fortran output to units *
and 6
. See Printing from Fortran, which describes workarounds.
When R was first developed, most Fortran compilers implemented I/O on top of the C I/O system and so the two interworked successfully. This was true of g77
, but no longer of gfortran
as used in gcc
4 and later. In particular, any package that makes use of Fortran I/O will when compiled on Windows interfere with C I/O: when the Fortran I/O support code is initialized (typically when the package is loaded) the C stdout
and stderr
are switched to LF line endings. (Function init
in file src/modules/lapack/init_win.c shows how to mitigate this. In a package this would look something like
#ifdef _WIN32
# include <fcntl.h>
#endif
void R_init_mypkgname(DllInfo *dll)
{
// Native symbol registration calls
#ifdef _WIN32
// gfortran I/O initialization sets these to _O_BINARY
    setmode(1, _O_TEXT); /* stdout */
    setmode(2, _O_TEXT); /* stderr */
#endif
}
in the file used for native symbol registration.)
5.8 Linking to other packages ¶
It is not in general possible to link a DLL in package packA to a DLL provided by package packB (for the security reasons mentioned in dyn.load
and dyn.unload
, and also because some platforms distinguish between shared objects and dynamic libraries), but it is on Windows.
Note that there can be tricky versioning issues here, as package packB could be re-installed after package packA — it is desirable that the API provided by package packB remains backwards-compatible.
Shipping a static library in package packB for other packages to link to avoids most of the difficulties.
5.8.1 Unix-alikes ¶
It is possible to link a shared object in package packA to a library provided by package packB under limited circumstances on a Unix-alike OS. There are severe portability issues, so this is not recommended for a distributed package.
This is easiest if packB provides a static library packB/lib/libpackB.a. (Note using directory lib rather than libs is conventional, and architecture-specific sub-directories may be needed and are assumed in the sample code below. The code in the static library will need to be compiled with PIC
flags on platforms where it matters.) Then as the code from package packB is incorporated when package packA is installed, we only need to find the static library at install time for package packA. The only issue is to find package packB, and for that we can ask R by something like (long lines broken for display here)
PKGB_PATH=`echo 'library(packB);
  cat(system.file("lib", package="packB", mustWork=TRUE))' \
 | "${R_HOME}/bin/R" --vanilla --no-echo`
PKG_LIBS="$(PKGB_PATH)$(R_ARCH)/libpackB.a"
For a dynamic library packB/lib/libpackB.so (packB/lib/libpackB.dylib on macOS: note that you cannot link to a shared object, .so, on that platform) we could use
PKGB_PATH=`echo 'library(packB);
  cat(system.file("lib", package="packB", mustWork=TRUE))' \
 | "${R_HOME}/bin/R" --vanilla --no-echo`
PKG_LIBS=-L"$(PKGB_PATH)$(R_ARCH)" -lpackB
This will work for installation, but very likely not when package packB
is loaded, as the path to package packB’s lib directory is not in the ld.so
158 search path. You can arrange to put it there before R is launched by setting (on some platforms) LD_RUN_PATH
or LD_LIBRARY_PATH
or adding to the ld.so
cache (see man ldconfig
). On platforms that support it, the path to the directory containing the dynamic library can be hardcoded at install time (which assumes that the location of package packB will not be changed nor the package updated to a changed API). On systems with the gcc
or clang
and the GNU linker (e.g. Linux) and some others this can be done by e.g.
PKGB_PATH=`echo 'library(packB);
  cat(system.file("lib", package="packB", mustWork=TRUE))' \
 | "${R_HOME}/bin/R" --vanilla --no-echo`
PKG_LIBS=-L"$(PKGB_PATH)$(R_ARCH)" -Wl,-rpath,"$(PKGB_PATH)$(R_ARCH)" -lpackB
Some other systems (e.g. Solaris with its native linker) use -Rdir rather than -rpath,dir (and this is accepted by the compiler as well as the linker).
It may be possible to figure out what is required semi-automatically from the result of R CMD libtool --config
(look for ‘hardcode’).
Making headers provided by package packB available to the code to be compiled in package packA can be done by the LinkingTo
mechanism (see Registering native routines).
5.8.2 Windows ¶
Suppose package packA wants to make use of compiled code provided by packB in DLL packB/libs/exB.dll, possibly the package’s DLL packB/libs/packB.dll. (This can be extended to linking to more than one package in a similar way.) There are three issues to be addressed:
- Making headers provided by package packB available to the code to be compiled in package packA.
  This is done by the LinkingTo mechanism (see Registering native routines).
- Preparing packA.dll to link to packB/libs/exB.dll.
  This needs an entry in Makevars.win or Makevars.ucrt of the form
  PKG_LIBS= -L<something> -lexB
  and one possibility is that <something> is the path to the installed packB/libs directory. To find that we need to ask R where it is by something like
  PKGB_PATH=`echo 'library(packB);
    cat(system.file("libs", package="packB", mustWork=TRUE))' \
   | rterm --vanilla --no-echo`
  PKG_LIBS= -L"$(PKGB_PATH)$(R_ARCH)" -lexB
  Another possibility is to use an import library, shipping with package packA an exports file exB.def. Then Makevars.win (or Makevars.ucrt) could contain
  PKG_LIBS= -L. -lexB

  all: $(SHLIB) before

  before: libexB.dll.a
  libexB.dll.a: exB.def
  and then installing package packA will make and use the import library for exB.dll. (One way to prepare the exports file is to use pexports.exe.)
- Loading packA.dll which depends on exB.dll.
  If exB.dll was used by package packB (because it is in fact packB.dll or packB.dll depends on it) and packB has been loaded before packA, then nothing more needs to be done as exB.dll will already be loaded into the R executable. (This is the most common scenario.)
  More generally, we can use the DLLpath argument to library.dynam to ensure that exB.dll is found, for example by setting
  library.dynam("packA", pkg, lib,
                DLLpath = system.file("libs", package="packB"))
  Note that DLLpath can only set one path, and so for linking to two or more packages you would need to resort to setting environment variable PATH.
5.9 Handling R objects in C ¶
Using C code to speed up the execution of an R function is often very fruitful. Traditionally this has been done via the .C
function in R. However, if a user wants to write C code using internal R data structures, then that can be done using the .Call
and .External
functions. The syntax for the calling function in R in each case is similar to that of .C
, but the two functions have different C interfaces. Generally the .Call
interface is simpler to use, but .External
is a little more general.
A call to .Call
is very similar to .C
, for example
.Call("convolve2", a, b)
The first argument should be a character string giving a C symbol name of code that has already been loaded into R. Up to 65 R objects can be passed as arguments. The C side of the interface is
#include <R.h>
#include <Rinternals.h>
SEXP convolve2(SEXP a, SEXP b)
...
A call to .External
is almost identical
.External("convolveE", a, b)
but the C side of the interface is different, having only one argument
#include <R.h>
#include <Rinternals.h>
SEXP convolveE(SEXP args)
...
Here args
is a LISTSXP
, a Lisp-style pairlist from which the arguments can be extracted.
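For readers unfamiliar with Lisp-style lists, the traversal pattern can be sketched with a toy linked list in plain C. This is only a model: struct toy_cell and toy_sum are invented for illustration and are not R's data structures, where the cells are accessed through the API rather than through struct fields.

```c
#include <stddef.h>

/* Toy cons cell: a value plus a pointer to the rest of the list. */
struct toy_cell {
    double value;
    const struct toy_cell *next;   /* NULL marks the end of the list */
};

/* Walk the list from its head, one cell at a time, summing the values. */
static double toy_sum(const struct toy_cell *args)
{
    double total = 0.0;
    for (; args != NULL; args = args->next)
        total += args->value;
    return total;
}
```

The same shape of loop, taking the head of the current cell and then moving to the rest of the list, is how arguments are typically pulled out of args one by one.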
In each case the R objects are available for manipulation via a set of functions and macros defined in the header file Rinternals.h or some S-compatibility macros159. See Interface functions .Call
and .External
for details on .Call
and .External
.
Before you decide to use .Call
or .External
, you should look at other alternatives. First, consider working in interpreted R code; if this is fast enough, this is normally the best option. You should also see if using .C
is enough. If the task to be performed in C is simple enough involving only atomic vectors and requiring no call to R, .C
suffices. A great deal of useful code was written using just .C
before .Call
and .External
were available. These interfaces allow much more control, but they also impose much greater responsibilities so need to be used with care. Neither .Call
nor .External
copy their arguments: you should treat arguments you receive through these interfaces as read-only.
To handle R objects from within C code we use the macros and functions that have been used to implement the core parts of R. A public160 subset of these is defined in the header file Rinternals.h in the directory R_INCLUDE_DIR (default R_HOME/include) that should be available on any R installation.
A substantial amount of R, including the standard packages, is implemented using the functions and macros described here, so the R source code provides a rich source of examples and “how to do it”: do make use of the source code for inspirational examples.
It is necessary to know something about how R objects are handled in C code. All the R objects you will deal with will be handled with the type SEXP161, which is a pointer to a structure with typedef SEXPREC
. Think of this structure as a variant type that can handle all the usual types of R objects, that is vectors of various modes, functions, environments, language objects and so on. The details are given later in this section and in R Internal Structures in R Internals, but for most purposes the programmer does not need to know them. Think rather of a model such as that used by Visual Basic, in which R objects are handed around in C code (as they are in interpreted R code) as the variant type, and the appropriate part is extracted for, for example, numerical calculations, only when it is needed. As in interpreted R code, much use is made of coercion to force the variant object to the right type.
5.9.1 Handling the effects of garbage collection ¶
We need to know a little about the way R handles memory allocation. The memory allocated for R objects is not freed by the user; instead, the memory is from time to time garbage collected. That is, some or all of the allocated memory not being used is freed or marked as re-usable.
The R object types are represented by a C structure defined by a typedef SEXPREC
in Rinternals.h. It contains several things among which are pointers to data blocks and to other SEXPREC
s. A SEXP
is simply a pointer to a SEXPREC
.
If you create an R object in your C code, you must tell R that you are using the object by using the PROTECT
macro on a pointer to the object. This tells R that the object is in use so it is not destroyed during garbage collection. Notice that it is the object which is protected, not the pointer variable. It is a common mistake to believe that if you invoked PROTECT(
p)
at some point then p is protected from then on, but that is not true once a new object is assigned to p.
Protecting an R object automatically protects all the R objects pointed to in the corresponding SEXPREC
, for example all elements of a protected list are automatically protected.
The programmer is solely responsible for housekeeping the calls to PROTECT
. There is a corresponding macro UNPROTECT
that takes as argument an int
giving the number of objects to unprotect when they are no longer needed. The protection mechanism is stack-based, so UNPROTECT(
n)
unprotects the last n objects which were protected. The calls to PROTECT
and UNPROTECT
must balance when the user’s code returns and should balance in all functions. R will warn about "stack imbalance in .Call"
(or .External
) if the housekeeping is wrong.
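The distinction between protecting an object and protecting a pointer variable, and the stack discipline of UNPROTECT, can be seen in a toy model written in plain C. This is a sketch for intuition only, assuming a small fixed-size stack. The toy_ names are hypothetical and this is not how R implements protection.

```c
#include <stddef.h>

#define TOY_STACK_SIZE 16

static const void *toy_stack[TOY_STACK_SIZE];
static size_t toy_top = 0;

/* Push the object (not the variable holding it) and return it, so the
   call can wrap an allocation in the same style as PROTECT(). */
static const void *toy_protect(const void *obj)
{
    if (toy_top < TOY_STACK_SIZE)
        toy_stack[toy_top++] = obj;
    return obj;
}

/* Pop the n most recently protected objects. */
static void toy_unprotect(size_t n)
{
    toy_top = (n > toy_top) ? 0 : toy_top - n;
}

/* Report whether an object is currently on the stack. */
static int toy_is_protected(const void *obj)
{
    for (size_t i = 0; i < toy_top; i++)
        if (toy_stack[i] == obj) return 1;
    return 0;
}

/* Returns 1 when the model behaves as the text describes. */
static int toy_demo(void)
{
    int a = 1, b = 2;
    const void *p = toy_protect(&a);   /* object a is protected */
    int ok = toy_is_protected(p);
    p = &b;                            /* p now refers to a new object ... */
    ok = ok && !toy_is_protected(p);   /* ... which is not protected */
    ok = ok && toy_is_protected(&a);   /* a itself stays protected */
    toy_unprotect(1);                  /* balance the stack before returning */
    return ok && toy_top == 0;
}
```

Assigning a new object to p leaves the stack entry pointing at the old object, which is exactly the mistake the text warns about.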
Here is a small example of creating an R numeric vector in C code:
#include <R.h>
#include <Rinternals.h>
    SEXP ab;
      ....
    ab = PROTECT(Rf_allocVector(REALSXP, 2));
    REAL(ab)[0] = 123.45;
    REAL(ab)[1] = 67.89;
    UNPROTECT(1);
Now, the reader may ask how the R object could possibly get removed during those manipulations, as it is just our C code that is running. As it happens, we can do without the protection in this example, but in general we do not know (nor want to know) what is hiding behind the R macros and functions we use, and any of them might cause memory to be allocated, hence garbage collection and hence our object ab
to be removed. It is usually wise to err on the side of caution and assume that any of the R macros and functions might remove the object.
In some cases it is necessary to keep better track of whether protection is really needed. Be particularly aware of situations where a large number of objects are generated. The pointer protection stack has a fixed size (default 10,000) and can become full. It is not a good idea then to just PROTECT
everything in sight and UNPROTECT
several thousand objects at the end. It will almost invariably be possible to either assign the objects as part of another object (which automatically protects them) or unprotect them immediately after use.
There is a less-used macro UNPROTECT_PTR(
s)
that unprotects the object pointed to by the SEXP
s, even if it is not the top item on the pointer protection stack. This macro was introduced for use in the parser, where the code interfacing with the R heap is generated and the generator cannot be configured to insert proper calls to PROTECT
and UNPROTECT
. However, UNPROTECT_PTR
is dangerous to use in combination with UNPROTECT
when the same object has been protected multiple times. It has been superseded by multi-set based functions R_PreserveInMSet
and R_ReleaseFromMSet
, which protect objects in a multi-set created by R_NewPreciousMSet
and typically itself protected using PROTECT
. These functions should not be needed outside parsers.
Sometimes an object is changed (for example duplicated, coerced or grown) yet the current value needs to be protected. For these cases PROTECT_WITH_INDEX
saves an index of the protection location that can be used to replace the protected value using REPROTECT
.
For example (from the internal code for optim
)
PROTECT_INDEX ipx;
....
PROTECT_WITH_INDEX(s = Rf_eval(OS->R_fcall, OS->R_env), &ipx);
REPROTECT(s = Rf_coerceVector(s, REALSXP), ipx);
Note that it is dangerous to mix UNPROTECT_PTR
also with PROTECT_WITH_INDEX
, as the former changes the protection locations of objects that were protected after the one being unprotected.
There is another way to avoid the effects of garbage collection: a call to R_PreserveObject
adds an object to an internal list of objects not to be collected, and a subsequent call to R_ReleaseObject
removes it from that list. This provides a way for objects which are not returned as part of R objects to be protected across calls to compiled code: on the other hand it becomes the user’s responsibility to release them when they are no longer needed (and this often requires the use of a finalizer). It is less efficient than the normal protection mechanism, and should be used sparingly.
For functions from packages as well as R to safely co-operate in protecting objects, certain rules have to be followed:
- Pointer-protection balance. Calls to PROTECT and UNPROTECT should balance in each function. A function may only call UNPROTECT or REPROTECT on objects it has itself protected. Note that the pointer protection stack balance is restored automatically on non-local transfer of control (see Condition handling and cleanup code), as if a call to UNPROTECT was invoked with the right argument.
- Caller protection. It is the responsibility of the caller that all arguments passed to a function are protected and will stay protected for the whole execution of the callee. Typically this is achieved by PROTECT and UNPROTECT calls.
- Protecting return values. Any R objects returned from a function are unprotected (the callee must maintain pointer-protection balance), and hence should be protected immediately by the caller. To be safe against future code changes, assume that any R object returned from any function may need protection. Note that even when conceptually returning an existing protected object, that object may be duplicated.
- All functions/macros allocate. To be safe against future code changes, assume that any function or macro may allocate and hence the garbage collector may run and destroy unprotected objects.
It is always safe and recommended to follow those rules. In fact, several R functions and macros protect their own arguments and some functions do not allocate or do not allocate when used in a certain way, but that is subject to change, so relying on that may be fragile. PROTECT
and PROTECT_WITH_INDEX
can be safely called with unprotected arguments and UNPROTECT
does not allocate.
5.9.2 Allocating storage ¶
For many purposes it is sufficient to allocate R objects and manipulate those. There are quite a few Rf_alloc
Xxx functions defined in Rinternals.h—you may want to explore them.
One that is commonly used is Rf_allocVector
, the C-level equivalent of R-level vector()
and its wrappers such as integer()
and character()
. One distinction is that whereas the R functions always initialize the elements of the vector, Rf_allocVector
only does so for lists, expressions and character vectors (the cases where the elements are themselves R objects). Other useful allocation functions are Rf_alloc3DArray
, Rf_allocArray
, and Rf_allocMatrix
.
At times it can be useful to allocate a larger initial result vector and resize it to a shorter length if that is sufficient. The functions Rf_lengthgets
and Rf_xlengthgets
accomplish this; they are analogous to using length(x) <- n
in R. Typically these functions return a freshly allocated object, but in some cases they may re-use the supplied object.
When creating new result objects it can be useful to fill them in with values from an existing object. The functions Rf_copyVector
and Rf_copyMatrix
can be used for this. Rf_copyMostAttributes
can also simplify setting up a result object; it is used internally for results of arithmetic operations.
If storage is required for C objects during the calculations this is best allocated by calling R_alloc
; see Memory allocation. All of these memory allocation routines do their own error-checking, so the programmer may assume that they will raise an error and not return if the memory cannot be allocated.
5.9.3 Details of R types ¶
Users of the Rinternals.h macros will need to know how the R types are known internally. The different R data types are represented in C by SEXPTYPE. Some of these are familiar from R and some are internal data types. The usual R object modes are given in the table.
SEXPTYPE   R equivalent
REALSXP    numeric with storage mode double
INTSXP     integer
CPLXSXP    complex
LGLSXP     logical
STRSXP     character
VECSXP     list (generic vector)
LISTSXP    pairlist
DOTSXP     a ‘...’ object
NILSXP     NULL
SYMSXP     name/symbol
CLOSXP     function or function closure
ENVSXP     environment
Among the important internal SEXPTYPE
s are LANGSXP
, CHARSXP
, PROMSXP
, etc. (N.B.: although it is possible to return objects of internal types, it is unsafe to do so as assumptions are made about how they are handled which may be violated at user-level evaluation.) More details are given in R Internal Structures in R Internals.
Unless you are very sure about the type of the arguments, the code should check the data types. Sometimes it may also be necessary to check data types of objects created by evaluating an R expression in the C code. You can use functions like Rf_isReal
, Rf_isInteger
and Rf_isString
to do type checking. Other such functions declared in the header file Rinternals.h include Rf_isNull
, Rf_isSymbol
, Rf_isLogical
, Rf_isComplex
, Rf_isExpression
, and Rf_isEnvironment
.
All of these take a SEXP
as argument and return 1 or 0 to indicate TRUE or FALSE.
What happens if the SEXP
is not of the correct type? Sometimes you have no other option except to generate an error. You can use the function Rf_error
for this. It is usually better to coerce the object to the correct type. For example, if you find that an SEXP
is of the type INTEGER
, but you need a REAL
object, you can change the type by using
newSexp = PROTECT(Rf_coerceVector(oldSexp, REALSXP));
Protection is needed as a new object is created; the object formerly pointed to by the SEXP
is still protected but now unused.162
All the coercion functions do their own error-checking, and generate NA
s with a warning or stop with an error as appropriate.
Note that these coercion functions are not the same as calling as.numeric
(and so on) in R code, as they do not dispatch on the class of the object. Thus it is normally preferable to do the coercion in the calling R code.
So far we have only seen how to create and coerce R objects from C code, and how to extract the numeric data from numeric R vectors. These can suffice to take us a long way in interfacing R objects to numerical algorithms, but we may need to know a little more to create useful return objects.
5.9.4 Attributes ¶
Many R objects have attributes: some of the most useful are classes and the dim
and dimnames
that mark objects as matrices or arrays. It can also be helpful to work with the names
attribute of vectors.
To illustrate this, let us write code to take the outer product of two vectors (which outer
and %o%
already do). As usual the R code is simple
out <- function(x, y)
{
    storage.mode(x) <- storage.mode(y) <- "double"
    .Call("out", x, y)
}
where we expect x
and y
to be numeric vectors (possibly integer), possibly with names. This time we do the coercion in the calling R code.
C code to do the computations is
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
    int nx = Rf_length(x), ny = Rf_length(y);
    SEXP ans = PROTECT(Rf_allocMatrix(REALSXP, nx, ny));
    double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

    for(int i = 0; i < nx; i++) {
        double tmp = rx[i];
        for(int j = 0; j < ny; j++)
            rans[i + nx*j] = tmp * ry[j];
    }

    UNPROTECT(1);
    return ans;
}
Note the way REAL
is used: as it is a function call it can be considerably faster to store the result and index that.
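The index expression i + nx*j in the code above reflects the fact that R stores matrices column by column. The arithmetic can be checked in isolation with a plain C sketch that works on ordinary arrays. The helper name outer_colmajor is hypothetical and no R headers are involved.

```c
#include <stddef.h>

/* Fill ans (length nx * ny) with the outer product of x and y, stored
   column by column: element (i, j) lives at ans[i + nx * j]. */
static void outer_colmajor(const double *x, size_t nx,
                           const double *y, size_t ny, double *ans)
{
    for (size_t i = 0; i < nx; i++) {
        double tmp = x[i];
        for (size_t j = 0; j < ny; j++)
            ans[i + nx * j] = tmp * y[j];
    }
}
```

With nx = 2, the first two entries of ans form the first column of the result, the next two the second column, and so on.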
However, we would like to set the dimnames
of the result. We can use
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
    int nx = Rf_length(x), ny = Rf_length(y);
    SEXP ans = PROTECT(Rf_allocMatrix(REALSXP, nx, ny));
    double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

    for(int i = 0; i < nx; i++) {
        double tmp = rx[i];
        for(int j = 0; j < ny; j++)
            rans[i + nx*j] = tmp * ry[j];
    }

    SEXP dimnames = PROTECT(Rf_allocVector(VECSXP, 2));
    SET_VECTOR_ELT(dimnames, 0, Rf_getAttrib(x, R_NamesSymbol));
    SET_VECTOR_ELT(dimnames, 1, Rf_getAttrib(y, R_NamesSymbol));
    Rf_setAttrib(ans, R_DimNamesSymbol, dimnames);

    UNPROTECT(2);
    return ans;
}
This example introduces several new features. The Rf_getAttrib
and Rf_setAttrib
functions get and set individual attributes. Their second argument is a SEXP
defining the name in the symbol table of the attribute we want; these and many such symbols are defined in the header file Rinternals.h.
There are shortcuts here too: the functions namesgets
, dimgets
and dimnamesgets
are the internal versions of the default methods of names<-
, dim<-
and dimnames<-
(for vectors and arrays), and there are functions such as Rf_GetColNames
, Rf_GetRowNames
, Rf_GetMatrixDimnames
and Rf_GetArrayDimnames
.
What happens if we want to add an attribute that is not pre-defined? We need to add a symbol for it via a call to Rf_install
. Suppose for illustration we wanted to add an attribute "version"
with value 3.0
. We could use
    SEXP version;
    version = PROTECT(Rf_allocVector(REALSXP, 1));
    REAL(version)[0] = 3.0;
    Rf_setAttrib(ans, Rf_install("version"), version);
    UNPROTECT(1);
Using Rf_install
when it is not needed is harmless and provides a simple way to retrieve the symbol from the symbol table if it is already installed. However, the lookup takes a non-trivial amount of time, so consider code such as
static SEXP VerSymbol = NULL;
...
if (VerSymbol == NULL) VerSymbol = Rf_install("version");
if it is to be done frequently.
This example can be simplified by another convenience function:
    SEXP version = PROTECT(Rf_ScalarReal(3.0));
    Rf_setAttrib(ans, Rf_install("version"), version);
    UNPROTECT(1);
If a result is to be a vector with all elements named, then Rf_mkNamed
can be used to allocate a vector of a specified type. Names are provided as a C vector of strings terminated by an empty string:
const char *nms[] = {"xi", "yi", "zi", ""};
    Rf_mkNamed(VECSXP, nms);
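The empty-string terminator is what lets the length be recovered from the array alone, so it must not be omitted. A plain C sketch of how such a sentinel-terminated array can be measured follows. The helper count_names is hypothetical and shown only to illustrate the convention.

```c
#include <stddef.h>

/* Count the entries of a name array that ends with an empty string,
   the same sentinel convention described above for Rf_mkNamed. */
static size_t count_names(const char **names)
{
    size_t n = 0;
    while (names[n][0] != '\0')
        n++;
    return n;
}
```

For the nms array above this would give 3, the number of names before the terminating empty string.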
Symbols can also be installed or retrieved based on a name in a CHARSXP
object using either Rf_installChar
or Rf_installTrChar
. These used to differ in handling character encoding but have been identical since R 4.0.0.
5.9.5 Classes ¶
In R the class is just the attribute named "class"
so it can be handled as such, but there is a shortcut Rf_classgets
. Suppose we want to give the return value in our example the class "mat"
. We can use
#include <R.h>
#include <Rinternals.h>
....
    SEXP ans, dim, dimnames, class;
      ....
    class = PROTECT(Rf_allocVector(STRSXP, 1));
    SET_STRING_ELT(class, 0, Rf_mkChar("mat"));
    Rf_classgets(ans, class);
    UNPROTECT(4);
    return ans;
}
As the value is a character vector, we have to know how to create that from a C character array, which we do using the function mkChar
.
5.9.6 S4 objects ¶
Several functions are available for working with S4 objects and classes in C, including:
SEXP Rf_allocS4Object(void);
SEXP Rf_asS4(SEXP, Rboolean, int);
int R_check_class_etc(SEXP x, const char **valid);
SEXP R_do_MAKE_CLASS(const char *what);
SEXP R_do_new_object(SEXP class_def);
SEXP R_do_slot(SEXP obj, SEXP name);
SEXP R_do_slot_assign(SEXP obj, SEXP name, SEXP value);
SEXP R_getClassDef(const char *what);
int R_has_slot(SEXP obj, SEXP name);
5.9.7 Handling lists ¶
Some care is needed with lists, as R moved early on from using LISP-like lists (now called “pairlists”) to S-like generic vectors. As a result, the appropriate test for an object of mode list
is Rf_isNewList
, and we need Rf_allocVector(VECSXP,
n) and not Rf_allocList(
n)
.
List elements can be retrieved or set by direct access to the elements of the generic vector. Suppose we have a list object
a <- list(f = 1, g = 2, h = 3)
Then we can access a$g
as a[[2]]
by
double g;
....
g = REAL(VECTOR_ELT(a, 1))[0];
This can rapidly become tedious, and the following function (based on one in package stats) is very useful:
/* get the list element named str (ASCII), or return NULL */
SEXP getListElement(SEXP list, const char *str)
{
    SEXP elmt = R_NilValue, names = Rf_getAttrib(list, R_NamesSymbol);

    for (int i = 0; i < Rf_length(list); i++)
        if(strcmp(CHAR(STRING_ELT(names, i)), str) == 0) {
            /* ASCII only */
            elmt = VECTOR_ELT(list, i);
            break;
        }
    return elmt;
}
and enables us to say
double g;
g = REAL(getListElement(a, "g"))[0];
This code only works for names that are ASCII (see Character encoding issues).
5.9.8 Handling character data ¶
R character vectors are stored as STRSXP
s, a vector type like VECSXP
where every element is of type CHARSXP
. The CHARSXP
elements of STRSXP
s are accessed using STRING_ELT
and SET_STRING_ELT
.
CHARSXP
s are read-only objects and must never be modified. In particular, the C-style string contained in a CHARSXP
should be treated as read-only and for this reason the CHAR
function used to access the character data of a CHARSXP
returns (const char *)
(this also allows compilers to issue warnings about improper use). Since CHARSXP
s are immutable, the same CHARSXP
can be shared by any STRSXP
needing an element representing the same string. R maintains a global cache of CHARSXP
s so that there is only ever one CHARSXP
representing a given string in memory. In most cases it is easier to use Rf_translateChar
or Rf_translateCharUTF8
to obtain the C string and it is safer against potential future changes in R (see Character encoding issues).
You can obtain a CHARSXP
by calling Rf_mkChar
and providing a NUL-terminated C-style string. This function will return a pre-existing CHARSXP
if one with a matching string already exists, otherwise it will create a new one and add it to the cache before returning it to you. The variant Rf_mkCharLen
can be used to create a CHARSXP
from part of a buffer and will ensure null-termination.
Note that R character strings are restricted to 2^31 - 1
bytes, and hence so should the input to Rf_mkChar
be (C allows longer strings on 64-bit platforms).
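Code that builds strings from external input may want to check this limit before handing the buffer over. A minimal sketch in plain C follows. The macro TOY_R_STRING_MAX and both helper functions are hypothetical names introduced here, not part of the R API.

```c
#include <stddef.h>
#include <string.h>

/* The limit quoted above: 2^31 - 1 bytes (hypothetical macro name). */
#define TOY_R_STRING_MAX 2147483647u

/* Returns 1 if a byte count is within the limit, 0 otherwise. */
static int length_fits_r_string(size_t nbytes)
{
    return nbytes <= TOY_R_STRING_MAX;
}

/* Same check for a NUL-terminated C string. */
static int string_fits_r_string(const char *s)
{
    return length_fits_r_string(strlen(s));
}
```

A caller could use such a check to signal an error of its own choosing before attempting to create the CHARSXP.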
5.9.9 Working with closures ¶
New function closure objects can be created with R_mkClosure
:
SEXP R_mkClosure(SEXP formals, SEXP body, SEXP rho);
The components of a closure can be extracted with R_ClosureFormals
, R_ClosureBody
, and R_ClosureEnv
. For a byte compiled closure R_ClosureBody
returns the compiled body. R_ClosureExpr
returns the body expression for both compiled and uncompiled closures. The expression for a compiled object can be obtained with R_BytecodeExpr
.
5.9.10 Finding and setting variables ¶
It will be usual that all the R objects needed in our C computations are passed as arguments to .Call
or .External
, but it is possible to find the values of R objects from within the C given their names. The following code is the equivalent of get(name, envir = rho)
.
SEXP getvar(SEXP name, SEXP rho)
{
    SEXP ans;

    if (!Rf_isString(name) || Rf_length(name) != 1)
        Rf_error("name is not a single string");
    if (!Rf_isEnvironment(rho))
        Rf_error("rho should be an environment");
    ans = R_getVar(Rf_installChar(STRING_ELT(name, 0)), rho, TRUE);
    if (TYPEOF(ans) != REALSXP || Rf_length(ans) == 0)
        Rf_error("value is not a numeric vector with at least one element");
    Rprintf("first value is %f\n", REAL(ans)[0]);
    return R_NilValue;
}
The main work is done by R_getVar
, but to use it we need to install name
as a name in the symbol table. As we wanted the value for internal use, we return NULL
.
R_getVar
is similar to the R function get
. It signals an error if there is no binding for the variable in the environment. R_getVarEx
can be used to return a default value if no binding is found; this corresponds to the R function get0
. The third argument to R_getVar
and R_getVarEx
corresponds to the inherits
argument to the R function get
.
Functions with syntax
void Rf_defineVar(SEXP symbol, SEXP value, SEXP rho)
void Rf_setVar(SEXP symbol, SEXP value, SEXP rho)
can be used to assign values to R variables. defineVar
creates a new binding or changes the value of an existing binding in the specified environment frame; it is the analogue of assign(symbol, value, envir = rho, inherits = FALSE)
, but unlike assign
, defineVar
does not make a copy of the object value
.163 setVar
searches for an existing binding for symbol
in rho
or its enclosing environments. If a binding is found, its value is changed to value
. Otherwise, a new binding with the specified value is created in the global environment. This corresponds to assign(symbol, value, envir = rho, inherits = TRUE)
.
At times it may also be useful to create a new environment frame in C code. R_NewEnv
is a C version of the R function new.env
:
SEXP R_NewEnv(SEXP enclos, int hash, int size);
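The difference between Rf_defineVar and Rf_setVar described above can be sketched with a toy model in plain C. This is an illustration only and does not use the R API: toy_env, toy_find, toy_defineVar and toy_setVar are hypothetical names, values are restricted to doubles, and the global environment is passed explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 8

/* Hypothetical toy environment: a small frame plus an enclosing environment. */
typedef struct toy_env {
    const char *names[MAX_BINDINGS];
    double values[MAX_BINDINGS];
    int n;                      /* number of bindings in this frame */
    struct toy_env *enclos;     /* NULL marks the end of the chain */
} toy_env;

/* Return the index of name in this frame only, or -1 if it is absent. */
int toy_find(const toy_env *env, const char *name)
{
    for (int i = 0; i < env->n; i++)
        if (strcmp(env->names[i], name) == 0) return i;
    return -1;
}

/* Analogue of Rf_defineVar: create or change a binding in env's own frame. */
void toy_defineVar(toy_env *env, const char *name, double value)
{
    int i = toy_find(env, name);
    if (i < 0) {
        assert(env->n < MAX_BINDINGS);
        i = env->n++;
        env->names[i] = name;
    }
    env->values[i] = value;
}

/* Analogue of Rf_setVar: change the first binding found in env or its
   enclosures; if there is none, define the variable in global. */
void toy_setVar(toy_env *env, toy_env *global, const char *name, double value)
{
    for (toy_env *e = env; e != NULL; e = e->enclos) {
        int i = toy_find(e, name);
        if (i >= 0) { e->values[i] = value; return; }
    }
    toy_defineVar(global, name, value);
}
```

In this model toy_setVar never creates a binding in the local frame: it either updates an existing binding somewhere along the chain or falls back to the global frame, which mirrors the inherits = TRUE behaviour described above.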
5.9.11 Some convenience functions ¶
Some operations are done so frequently that there are convenience functions to handle them. (All these are provided via the header file Rinternals.h.)
Suppose we wanted to pass a single logical argument ignore_quotes
: we could use
int ign = Rf_asLogical(ignore_quotes);
if(ign == NA_LOGICAL) Rf_error("'ignore_quotes' must be TRUE or FALSE");
which will do any coercion needed (at least from a vector argument), and return NA_LOGICAL
if the value passed was NA
or coercion failed. There are also Rf_asInteger
, Rf_asReal
and Rf_asComplex
. The function Rf_asChar
returns a CHARSXP
. All of these functions ignore any elements of an input vector after the first. The function Rf_asCharacterFactor
converts a factor to a character vector.
To return a length-one real vector we can use
double x;
...
return Rf_ScalarReal(x);
and there are versions of this for all the atomic vector types (those for a length-one character vector being Rf_ScalarString
with argument a CHARSXP
and Rf_mkString
with argument const char *
).
SEXP Rf_ScalarReal(double);
SEXP Rf_ScalarInteger(int);
SEXP Rf_ScalarLogical(int);
SEXP Rf_ScalarRaw(Rbyte);
SEXP Rf_ScalarComplex(Rcomplex);
SEXP Rf_ScalarString(SEXP);
SEXP Rf_mkString(const char *);
Some of the Rf_is
XXXX functions differ from their apparent R-level counterparts: for example Rf_isVector
is true for any atomic vector type (Rf_isVectorAtomic
) and for lists and expressions (Rf_isVectorList
) (with no check on attributes). Rf_isMatrix
is a test of a length-2 "dim"
attribute.
Rboolean Rf_isVector(SEXP);
Rboolean Rf_isVectorAtomic(SEXP);
Rboolean Rf_isVectorList(SEXP);
Rboolean Rf_isMatrix(SEXP);
Rboolean Rf_isPairList(SEXP);
Rboolean Rf_isPrimitive(SEXP);
Rboolean Rf_isTs(SEXP);
Rboolean Rf_isNumeric(SEXP);
Rboolean Rf_isArray(SEXP);
Rboolean Rf_isFactor(SEXP);
Rboolean Rf_isObject(SEXP);
Rboolean Rf_isFunction(SEXP);
Rboolean Rf_isLanguage(SEXP);
Rboolean Rf_isNewList(SEXP);
Rboolean Rf_isList(SEXP);
Rboolean Rf_isOrdered(SEXP);
Rboolean Rf_isUnordered(SEXP);
Rboolean Rf_isS4(SEXP);
Rboolean Rf_isNumber(SEXP);
Rboolean Rf_isDataFrame(SEXP);
Some additional predicates:
Rboolean Rf_isBlankString(const char *);
Rboolean Rf_StringBlank(SEXP);
Rboolean Rf_StringFalse(const char *);
Rboolean Rf_StringTrue(const char *);

int IS_LONG_VEC(SEXP);
int IS_SCALAR(SEXP, int);
There are a series of small macros/functions to help construct pairlists and language objects (whose internal structures just differ by SEXPTYPE
). Function CONS(u, v)
is the basic building block: it constructs a pairlist from u
followed by v
(which is a pairlist or R_NilValue
). LCONS
is a variant that constructs a language object. Functions Rf_list1
to Rf_list6
construct a pairlist from one to six items, and Rf_lang1
to Rf_lang6
do the same for a language object (a function to call plus zero to five arguments).
Functions Rf_elt
and Rf_lastElt
find the i-th element and the last element of a pairlist, and Rf_nthcdr
returns a pointer to the n-th position in the pairlist (whose CAR
is the n-th item).
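The linked structure that CONS, CAR, CDR and Rf_nthcdr operate on can be sketched in plain C. This is a toy model under stated assumptions, not the R implementation: cell, toy_cons, toy_nthcdr and toy_last are hypothetical names, items are ints, NULL plays the role of R_NilValue, and positions are counted from zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy cons cell holding an int item. */
typedef struct cell {
    int car;              /* the item stored in this cell */
    struct cell *cdr;     /* the rest of the list, or NULL at the end */
} cell;

/* Analogue of CONS(u, v): a new list with item u followed by list v. */
cell *toy_cons(int u, cell *v)
{
    cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->car = u;
    c->cdr = v;
    return c;
}

/* Loosely analogous to Rf_nthcdr: follow the cdr link n times. */
cell *toy_nthcdr(cell *s, int n)
{
    while (n-- > 0) {
        assert(s != NULL);   /* the list must have at least n cells */
        s = s->cdr;
    }
    return s;
}

/* Return the last cell of a non-empty list. */
cell *toy_last(cell *s)
{
    assert(s != NULL);
    while (s->cdr != NULL) s = s->cdr;
    return s;
}
```

Building a three-item list is then toy_cons(1, toy_cons(2, toy_cons(3, NULL))), which is the nesting that Rf_list3 hides. The sketch never frees its cells; in R the garbage collector takes care of that.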
Functions Rf_str2type
and Rf_type2str
map R length-one character strings to and from SEXPTYPE
numbers, and Rf_type2char
maps numbers to C character strings. Rf_type2str_nowarn
does not issue a warning if the SEXPTYPE
is invalid.
5.9.12 Semi-internal convenience functions ¶
There is quite a collection of functions that may be used in your C code if you are willing to adapt to rare API changes. These typically contain the “workhorses” of their R counterparts.
Functions Rf_any_duplicated
and Rf_any_duplicated3
are fast versions of R’s any(duplicated(.))
.
Function R_compute_identical
corresponds to R’s identical
function. Function R_BindingIsLocked
corresponds to R’s bindingIsLocked
function. Function R_ParentEnv
corresponds to R’s parent.env
.
The C functions Rf_inherits
and Rf_topenv
correspond to the R functions of the same base name. The C function Rf_GetOption1
corresponds to the R function getOption
without specifying a default.
Rf_GetOptionWidth
returns the value of the width
option as an int
. The C function Rf_nlevels
returns the number of levels of a factor. Unlike its R counterpart it always returns zero for non-factors.
For vectors the C function Rf_duplicated
returns a logical vector indicating for each element whether it is duplicated or not. A second argument specifies the direction of the search.
The C function R_lsInternal3
returns a character vector of the names of variables in an environment. The second and third arguments specify whether all names are desired and whether the result should be sorted.
Some convenience functions for working with pairlist objects include Rf_copyListMatrix
, Rf_listAppend
, Rf_isVectorizable
, Rf_VectorToPairList
, and Rf_PairToVectorList
.
Some convenience functions for working with name spaces and environments include R_existsVarInFrame
, R_removeVarFromFrame
, R_PackageEnvName
, R_IsPackageEnv
, R_FindNamespace
, R_IsNamespaceEnv
, and R_NamespaceEnvSpec
.
The C functions Rf_match
and Rf_pmatch
correspond to the R functions of the same base name. The C-level workhorse for partial matching is provided by Rf_psmatch
.
The C functions R_forceAndCall
and Rf_isUnsorted
correspond to the R functions forceAndCall
and is.unsorted
.
5.9.13 Named objects and copying ¶
[The NAMED
mechanism has been replaced by reference counting.]
When assignments are done in R such as
x <- 1:10
y <- x
the named object is not necessarily copied, so after those two assignments y
and x
are bound to the same SEXPREC
(the structure a SEXP
points to). This means that any code which alters one of them has to make a copy before modifying the copy if the usual R semantics are to apply. Note that whereas .C
and .Fortran
do copy their arguments, .Call
and .External
do not. So Rf_duplicate
is commonly called on arguments to .Call
before modifying them. If only the top level is modified it may suffice to call Rf_shallow_duplicate
.
At times it may be necessary to copy attributes from one object to another. This can be done using DUPLICATE_ATTRIB
or SHALLOW_DUPLICATE_ATTRIB
. ANY_ATTRIB
checks whether there are any attributes and CLEAR_ATTRIB
removes all attributes.
However, at least some of this copying is unneeded. In the first assignment shown, x <- 1:10
, R first creates an object with value 1:10
and then assigns it to x
but if x
is modified no copy is necessary as the temporary object with value 1:10
cannot be referred to again. R distinguishes between named and unnamed objects via a field in a SEXPREC
that can be accessed via the macros NAMED
and SET_NAMED
. This can take values
0
-
The object is not bound to any symbol
1
-
The object has been bound to exactly one symbol
>= 2
-
The object has potentially been bound to two or more symbols, and one should act as if another variable is currently bound to this value. The maximal value is
NAMEDMAX
.
Note the past tenses: R does not currently do full reference counting and there may currently be fewer bindings.
It is safe to modify the value of any SEXP
for which NAMED(foo)
is zero, and if NAMED(foo)
is two or more, the value should be duplicated (via a call to duplicate
) before any modification. Note that it is the responsibility of the author of the code making the modification to do the duplication, even if it is x
whose value is being modified after y <- x
.
The case NAMED(foo) == 1
allows some optimization, but it can be ignored (and duplication done whenever NAMED(foo) > 0
). (This optimization is not currently usable in user code.) It is intended for use within replacement functions. Suppose we used
x <- 1:10
foo(x) <- 3
which is computed as
x <- 1:10
x <- "foo<-"(x, 3)
Then inside "foo<-"
the object pointing to the current value of x
will have NAMED(foo)
as one, and it would be safe to modify it as the only symbol bound to it is x
and that will be rebound immediately. (Provided the remaining code in "foo<-"
makes no reference to x
, and no one is going to attempt a direct call such as y <- "foo<-"(x)
.)
This mechanism was replaced in R 4.0.0. To support future changes, package code should use NO_REFERENCES
, MAYBE_REFERENCED
, NOT_SHARED
, MAYBE_SHARED
, and MARK_NOT_MUTABLE
.
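The rule "duplicate before modifying anything that may be shared" can be sketched with a toy reference count in plain C. This is an illustration under stated assumptions rather than R's mechanism: toy_vec, toy_alloc, toy_duplicate and toy_set are hypothetical, and the test refs > 1 plays the role that MAYBE_SHARED plays in package code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy vector with a count of the variables bound to it. */
typedef struct {
    int refs;        /* number of bindings that refer to this object */
    size_t n;        /* number of elements */
    double *data;
} toy_vec;

/* Allocate a zero-filled vector that is not yet bound to anything. */
toy_vec *toy_alloc(size_t n)
{
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refs = 0;
    v->n = n;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

/* Analogue of Rf_duplicate for this toy type: a fresh, unshared copy. */
toy_vec *toy_duplicate(const toy_vec *v)
{
    toy_vec *copy = toy_alloc(v->n);
    memcpy(copy->data, v->data, v->n * sizeof *v->data);
    return copy;
}

/* Set element i to x, copying first if the object may be shared.
   Returns the object that was modified, which the caller must rebind. */
toy_vec *toy_set(toy_vec *v, size_t i, double x)
{
    assert(i < v->n);
    if (v->refs > 1) {
        v->refs--;               /* this binding lets go of the shared object */
        v = toy_duplicate(v);
        v->refs = 1;
    }
    v->data[i] = x;
    return v;
}
```

After the equivalent of y <- x both names refer to one object with a count of two, so modifying through y copies and leaves x untouched, whereas a later modification through x (now with a count of one) happens in place.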
5.10 Interface functions .Call
and .External
¶
In this section we consider the details of the R/C interfaces.
These two interfaces have almost the same functionality. .Call
is based on the interface of the same name in S version 4, and .External
is based on R’s .Internal
. .External
is more complex but allows a variable number of arguments.
5.10.1 Calling .Call
¶
Let us convert our finite convolution example to use .Call
. The calling function in R is
conv <- function(a, b) .Call("convolve2", a, b)
which could hardly be simpler, but as we shall see all the type coercion is transferred to the C code, which is
#include <R.h>
#include <Rinternals.h>

SEXP convolve2(SEXP a, SEXP b)
{
    int na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;

    a = PROTECT(Rf_coerceVector(a, REALSXP));
    b = PROTECT(Rf_coerceVector(b, REALSXP));
    na = Rf_length(a); nb = Rf_length(b); nab = na + nb - 1;
    ab = PROTECT(Rf_allocVector(REALSXP, nab));
    xa = REAL(a); xb = REAL(b); xab = REAL(ab);
    for(int i = 0; i < nab; i++) xab[i] = 0.0;
    for(int i = 0; i < na; i++)
        for(int j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
    UNPROTECT(3);
    return ab;
}
5.10.2 Calling .External
¶
We can use the same example to illustrate .External
. The R code changes only by replacing .Call
by .External
conv <- function(a, b) .External("convolveE", a, b)
but the main change is how the arguments are passed to the C code, this time as a single SEXP. The only change to the C code is how we handle the arguments.
#include <R.h>
#include <Rinternals.h>

SEXP convolveE(SEXP args)
{
    int i, j, na, nb, nab;
    double *xa, *xb, *xab;
    SEXP a, b, ab;

    a = PROTECT(Rf_coerceVector(CADR(args), REALSXP));
    b = PROTECT(Rf_coerceVector(CADDR(args), REALSXP));
    ...
}
Once again we do not need to protect the arguments, as in the R side of the interface they are objects that are already in use. The macros
first = CADR(args);
second = CADDR(args);
third = CADDDR(args);
fourth = CAD4R(args);
fifth = CAD5R(args);
provide convenient ways to access the first five arguments. More generally we can use the CDR
and CAR
macros as in
args = CDR(args); a = CAR(args);
args = CDR(args); b = CAR(args);
which clearly allows us to extract an unlimited number of arguments (whereas .Call
has a limit, albeit at 65 not a small one).
More usefully, the .External
interface provides an easy way to handle calls with a variable number of arguments, as length(args)
will give the number of arguments supplied (of which the first is ignored). We may need to know the names (‘tags’) given to the actual arguments, which we can do by using the TAG
macro and using something like the following example, that prints the names and the first value of its arguments if they are vector types.
SEXP showArgs(SEXP args)
{
    void *vmax = vmaxget();
    args = CDR(args); /* skip 'name' */
    for(int i = 0; args != R_NilValue; i++, args = CDR(args)) {
        const char *name =
            Rf_isNull(TAG(args)) ? "" : Rf_translateChar(PRINTNAME(TAG(args)));
        SEXP el = CAR(args);
        if (Rf_length(el) == 0) {
            Rprintf("[%d] '%s' R type, length 0\n", i+1, name);
            continue;
        }
        switch(TYPEOF(el)) {
        case REALSXP:
            Rprintf("[%d] '%s' %f\n", i+1, name, REAL(el)[0]);
            break;
        case LGLSXP:
        case INTSXP:
            Rprintf("[%d] '%s' %d\n", i+1, name, INTEGER(el)[0]);
            break;
        case CPLXSXP:
        {
            Rcomplex cpl = COMPLEX(el)[0];
            Rprintf("[%d] '%s' %f + %fi\n", i+1, name, cpl.r, cpl.i);
        }
            break;
        case STRSXP:
            Rprintf("[%d] '%s' %s\n", i+1, name,
                    Rf_translateChar(STRING_ELT(el, 0)));
            break;
        default:
            Rprintf("[%d] '%s' R type\n", i+1, name);
        }
    }
    vmaxset(vmax);
    return R_NilValue;
}
This can be called by the wrapper function
showArgs <- function(...) invisible(.External("showArgs", ...))
Note that this style of programming is convenient but not necessary, as an alternative style is
showArgs1 <- function(...) invisible(.Call("showArgs1", list(...)))
The (very similar) C code is in the scripts.
Additional functions for accessing pairlist components are CAAR
, CDAR
, CDDR
, and CDDDR
.
These components can be modified with SETCAR
, SETCDR
, SETCADR
, SETCADDR
, SETCADDDR
, and SETCAD4R
.
5.10.3 Missing and special values ¶
One piece of error-checking the .C
call does (unless NAOK
is true) is to check for missing (NA
) and IEEE special values (Inf
, -Inf
and NaN
) and give an error if any are found. With the .Call
interface these will be passed to our code. In this example the special values are no problem, as IEC 60559 arithmetic will handle them correctly. In the current implementation this is also true of NA
as it is a type of NaN
, but it is unwise to rely on such details. Thus we will re-write the code to handle NA
s using macros defined in R_ext/Arith.h included by R.h.
The code changes are the same in any of the versions of convolve2
or convolveE
:
    ...
    for(int i = 0; i < na; i++)
        for(int j = 0; j < nb; j++)
            if(ISNA(xa[i]) || ISNA(xb[j]) || ISNA(xab[i + j]))
                xab[i + j] = NA_REAL;
            else
                xab[i + j] += xa[i] * xb[j];
    ...
Note that the ISNA
macro, and the similar macros ISNAN
(which checks for NaN
or NA
) and R_FINITE
(which is false for NA
and all the special values), only apply to numeric values of type double
. Missingness of integers, logicals and character strings can be tested by equality to the constants NA_INTEGER
, NA_LOGICAL
and NA_STRING
. These and NA_REAL
can be used to set elements of R vectors to NA
.
The constants R_NaN
, R_PosInf
and R_NegInf
can be used to set double
s to the special values.
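The propagation logic of the NA-aware convolution can be tried outside R. The following plain C sketch is an approximation under stated assumptions: toy_convolve is a hypothetical name, the standard isnan macro stands in for ISNA, and NAN stands in for NA_REAL. Unlike ISNA, isnan cannot tell NA from NaN, so this corresponds more closely to using ISNAN.

```c
#include <assert.h>
#include <math.h>

/* Convolve xa (length na) with xb (length nb) into xab, which must have
   room for na + nb - 1 values. A missing input marks every output it
   contributes to as missing. */
void toy_convolve(const double *xa, int na, const double *xb, int nb,
                  double *xab)
{
    assert(na > 0 && nb > 0);
    int nab = na + nb - 1;
    for (int i = 0; i < nab; i++) xab[i] = 0.0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (isnan(xa[i]) || isnan(xb[j]) || isnan(xab[i + j]))
                xab[i + j] = NAN;    /* stand-in for NA_REAL */
            else
                xab[i + j] += xa[i] * xb[j];
}
```

The third test, on xab[i + j], is what keeps an output missing once it has been marked: without it a later finite contribution would still leave a NaN through arithmetic, but the explicit check does not depend on that detail.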
5.11 Evaluating R expressions from C ¶
The main function we will use is
SEXP Rf_eval(SEXP expr, SEXP rho);
the equivalent of the interpreted R code eval(expr, envir = rho)
(so rho
must be an environment), although we can also make use of Rf_findVar
, Rf_defineVar
and Rf_findFun
(which restricts the search to functions).
To see how this might be applied, here is a simplified internal version of lapply
for expressions, used as
a <- list(a = 1:5, b = rnorm(10), test = runif(100))
.Call("lapply", a, quote(sum(x)), new.env())
with C code
SEXP lapply(SEXP list, SEXP expr, SEXP rho)
{
    int n = Rf_length(list);
    SEXP ans;

    if(!Rf_isNewList(list)) Rf_error("'list' must be a list");
    if(!Rf_isEnvironment(rho)) Rf_error("'rho' should be an environment");
    ans = PROTECT(Rf_allocVector(VECSXP, n));
    for(int i = 0; i < n; i++) {
        Rf_defineVar(Rf_install("x"), VECTOR_ELT(list, i), rho);
        SET_VECTOR_ELT(ans, i, Rf_eval(expr, rho));
    }
    Rf_setAttrib(ans, R_NamesSymbol, Rf_getAttrib(list, R_NamesSymbol));
    UNPROTECT(1);
    return ans;
}
It would be closer to lapply
if we could pass in a function rather than an expression. One way to do this is via interpreted R code as in the next example, but it is possible (if somewhat obscure) to do this in C code. The following is based on the code in src/main/optimize.c.
SEXP lapply2(SEXP list, SEXP fn, SEXP rho)
{
    int n = Rf_length(list);
    SEXP R_fcall, ans;

    if(!Rf_isNewList(list)) Rf_error("'list' must be a list");
    if(!Rf_isFunction(fn)) Rf_error("'fn' must be a function");
    if(!Rf_isEnvironment(rho)) Rf_error("'rho' should be an environment");
    R_fcall = PROTECT(Rf_lang2(fn, R_NilValue));
    ans = PROTECT(Rf_allocVector(VECSXP, n));
    for(int i = 0; i < n; i++) {
        SETCADR(R_fcall, VECTOR_ELT(list, i));
        SET_VECTOR_ELT(ans, i, Rf_eval(R_fcall, rho));
    }
    Rf_setAttrib(ans, R_NamesSymbol, Rf_getAttrib(list, R_NamesSymbol));
    UNPROTECT(2);
    return ans;
}
used by
.Call("lapply2", a, sum, new.env())
Function Rf_lang2
creates an executable pairlist of two elements, but this will only be clear to those with a knowledge of a LISP-like language.
As a more comprehensive example of constructing an R call in C code and evaluating, consider the following fragment. Similar code appears in the definition of do_docall
in src/main/coerce.c.
SEXP s, t;
t = s = PROTECT(Rf_allocLang(3));
SETCAR(t, Rf_install("print")); t = CDR(t);
SETCAR(t, CAR(a)); t = CDR(t);
SETCAR(t, Rf_ScalarInteger(digits));
SET_TAG(t, Rf_install("digits"));
Rf_eval(s, env);
UNPROTECT(1);
The function Rf_allocLang
is available as of R 4.4.1; for older versions replace Rf_allocLang(3)
with
LCONS(R_NilValue, Rf_allocList(2))
At this point CAR(a)
is the R object to be printed, the current attribute. There are three steps: the call is constructed as a pairlist of length 3, the list is filled in, and the expression represented by the pairlist is evaluated.
A pairlist is quite distinct from a generic vector list, the only user-visible form of list in R. A pairlist is a linked list (with CDR(t)
computing the next entry), with items (accessed by CAR(t)
) and names or tags (set by SET_TAG
). In this call there are to be three items, a symbol (pointing to the function to be called) and two argument values, the first unnamed and the second named. Setting the type to LANGSXP
makes this a call which can be evaluated.
Customarily, the evaluation environment is passed from the calling R code (see rho
above). In special cases the C code may need to obtain the current evaluation environment, which can be done via the R_GetCurrentEnv()
function.
5.11.1 Zero-finding ¶
In this section we re-work the example of Becker, Chambers & Wilks (1988, pp. 205–10) on finding a zero of a univariate function. The R code and an example are
zero <- function(f, guesses, tol = 1e-7) {
    f.check <- function(x) {
        x <- f(x)
        if(!is.numeric(x)) stop("Need a numeric result")
        as.double(x)
    }
    .Call("zero", body(f.check), as.double(guesses), as.double(tol),
          new.env())
}

cube1 <- function(x) (x^2 + 1) * (x - 1.5)
zero(cube1, c(0, 5))
where this time we do the coercion and error-checking in the R code. The C code is
#include <R.h>
#include <Rinternals.h>

SEXP mkans(double x)
{
    // no need for PROTECT() here, as REAL(.) does not allocate:
    SEXP ans = Rf_allocVector(REALSXP, 1);
    REAL(ans)[0] = x;
    return ans;
}

double feval(double x, SEXP f, SEXP rho)
{
    // a version with (too) much PROTECT()ion .. "better safe than sorry"
    SEXP symbol, value;
    PROTECT(symbol = Rf_install("x"));
    PROTECT(value = mkans(x));
    Rf_defineVar(symbol, value, rho);
    UNPROTECT(2);
    return(REAL(Rf_eval(f, rho))[0]);
}

SEXP zero(SEXP f, SEXP guesses, SEXP stol, SEXP rho)
{
    double x0 = REAL(guesses)[0], x1 = REAL(guesses)[1],
           tol = REAL(stol)[0];
    double f0, f1, fc, xc;

    if(tol <= 0.0) Rf_error("non-positive tol value");
    f0 = feval(x0, f, rho); f1 = feval(x1, f, rho);
    if(f0 == 0.0) return mkans(x0);
    if(f1 == 0.0) return mkans(x1);
    if(f0*f1 > 0.0) Rf_error("x[0] and x[1] have the same sign");

    for(;;) {
        xc = 0.5*(x0+x1);
        if(fabs(x0-x1) < tol) return mkans(xc);
        fc = feval(xc, f, rho);
        if(fc == 0) return mkans(xc);
        if(f0*fc > 0.0) {
            x0 = xc; f0 = fc;
        } else {
            x1 = xc; f1 = fc;
        }
    }
}
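The bisection loop itself does not depend on R. As a minimal sketch, assuming the function is available as an ordinary C function pointer rather than an R expression, the same iteration can be written and tested in plain C. The names toy_zero and cube1 are hypothetical, and assertions replace the calls to Rf_error.

```c
#include <assert.h>
#include <math.h>

/* Find a zero of f between x0 and x1 by bisection, to within tol.
   f(x0) and f(x1) must not have the same sign. */
double toy_zero(double (*f)(double), double x0, double x1, double tol)
{
    assert(tol > 0.0);
    double f0 = f(x0), f1 = f(x1);
    if (f0 == 0.0) return x0;
    if (f1 == 0.0) return x1;
    assert(f0 * f1 < 0.0);       /* the guesses must bracket a zero */

    for (;;) {
        double xc = 0.5 * (x0 + x1);
        if (fabs(x0 - x1) < tol) return xc;
        double fc = f(xc);
        if (fc == 0.0) return xc;
        if (f0 * fc > 0.0) {     /* same sign as f0: zero is beyond xc */
            x0 = xc; f0 = fc;
        } else {                 /* sign change between x0 and xc */
            x1 = xc;
        }
    }
}

/* C counterpart of the R example function cube1. */
double cube1(double x) { return (x * x + 1) * (x - 1.5); }
```

Calling toy_zero(cube1, 0.0, 5.0, 1e-7) corresponds to zero(cube1, c(0, 5)) in the R code. The interval halves on every pass, so the loop ends once its width drops below tol.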
5.11.2 Calculating numerical derivatives ¶
We will use a longer example (by Saikat DebRoy) to illustrate the use of evaluation and .External
. This calculates numerical derivatives, something that could be done as effectively in interpreted R code but may be needed as part of a larger C calculation.
An interpreted R version and an example are
numeric.deriv <- function(expr, theta, rho=sys.frame(sys.parent()))
{
    eps <- sqrt(.Machine$double.eps)
    ans <- eval(substitute(expr), rho)
    grad <- matrix(, length(ans), length(theta),
                   dimnames=list(NULL, theta))
    for (i in seq_along(theta)) {
        old <- get(theta[i], envir=rho)
        delta <- eps * max(1, abs(old))
        assign(theta[i], old+delta, envir=rho)
        ans1 <- eval(substitute(expr), rho)
        assign(theta[i], old, envir=rho)
        grad[, i] <- (ans1 - ans)/delta
    }
    attr(ans, "gradient") <- grad
    ans
}
omega <- 1:5; x <- 1; y <- 2
numeric.deriv(sin(omega*x*y), c("x", "y"))
where expr
is an expression, theta
a character vector of variable names and rho
the environment to be used.
For the compiled version the call from R will be
.External("numeric_deriv", expr, theta, rho)
with example usage
.External("numeric_deriv", quote(sin(omega*x*y)),
c("x", "y"), .GlobalEnv)
Note the need to quote the expression to stop it being evaluated in the caller.
Here is the complete C code which we will explain section by section.
#include <R.h>
#include <Rinternals.h>
#include <float.h> /* for DBL_EPSILON */

SEXP numeric_deriv(SEXP args)
{
    SEXP theta, expr, rho, ans, ans1, gradient, par, dimnames;
    double tt, xx, delta, eps = sqrt(DBL_EPSILON), *rgr, *rans;
    int i, start;

    expr = CADR(args);
    if(!Rf_isString(theta = CADDR(args)))
        Rf_error("theta should be of type character");
    if(!Rf_isEnvironment(rho = CADDDR(args)))
        Rf_error("rho should be an environment");

    ans = PROTECT(Rf_coerceVector(Rf_eval(expr, rho), REALSXP));
    gradient = PROTECT(Rf_allocMatrix(REALSXP, LENGTH(ans), LENGTH(theta)));
    rgr = REAL(gradient); rans = REAL(ans);

    for(i = 0, start = 0; i < LENGTH(theta); i++, start += LENGTH(ans)) {
        par = PROTECT(Rf_findVar(Rf_installChar(STRING_ELT(theta, i)), rho));
        tt = REAL(par)[0];
        xx = fabs(tt);
        delta = (xx < 1) ? eps : xx*eps;
        REAL(par)[0] += delta;
        ans1 = PROTECT(Rf_coerceVector(Rf_eval(expr, rho), REALSXP));
        for(int j = 0; j < LENGTH(ans); j++)
            rgr[j + start] = (REAL(ans1)[j] - rans[j])/delta;
        REAL(par)[0] = tt;
        UNPROTECT(2); /* par, ans1 */
    }

    dimnames = PROTECT(Rf_allocVector(VECSXP, 2));
    SET_VECTOR_ELT(dimnames, 1, theta);
    Rf_dimnamesgets(gradient, dimnames);
    Rf_setAttrib(ans, Rf_install("gradient"), gradient);
    UNPROTECT(3); /* ans gradient dimnames */
    return ans;
}
The code to handle the arguments is
expr = CADR(args);
if(!Rf_isString(theta = CADDR(args)))
    Rf_error("theta should be of type character");
if(!Rf_isEnvironment(rho = CADDDR(args)))
    Rf_error("rho should be an environment");
Note that we check for correct types of theta
and rho
but do not check the type of expr
. That is because eval
can handle many types of R objects other than EXPRSXP
. There is no useful coercion we can do, so we stop with an error message if the arguments are not of the correct mode.
The first step in the code is to evaluate the expression in the environment rho
, by
ans = PROTECT(Rf_coerceVector(Rf_eval(expr, rho), REALSXP));
We then allocate space for the calculated derivative by
gradient = PROTECT(Rf_allocMatrix(REALSXP, LENGTH(ans), LENGTH(theta)));
The first argument to Rf_allocMatrix
gives the SEXPTYPE
of the matrix: here we want it to be REALSXP
. The other two arguments are the numbers of rows and columns. (Note that LENGTH
is intended to be used for vectors: Rf_length
is more generally applicable.)
for(i = 0, start = 0; i < LENGTH(theta); i++, start += LENGTH(ans)) {
    par = PROTECT(Rf_findVar(Rf_installChar(STRING_ELT(theta, i)), rho));
Here, we are entering a for loop. We loop through each of the variables. In the for
loop, we first create a symbol corresponding to the i
-th element of the STRSXP
theta
. Here, STRING_ELT(theta, i)
accesses the i
-th element of the STRSXP
theta
. Rf_installChar
installs the element as a name and Rf_findVar
finds its value.
    tt = REAL(par)[0];
    xx = fabs(tt);
    delta = (xx < 1) ? eps : xx*eps;
    REAL(par)[0] += delta;
    ans1 = PROTECT(Rf_coerceVector(Rf_eval(expr, rho), REALSXP));
We first extract the real value of the parameter, then calculate delta
, the increment to be used for approximating the numerical derivative. Then we change the value stored in par
(in environment rho
) by delta
and evaluate expr
in environment rho
again. Because we are directly dealing with original R memory locations here, R does the evaluation for the changed parameter value.
    for(int j = 0; j < LENGTH(ans); j++)
        rgr[j + start] = (REAL(ans1)[j] - rans[j])/delta;
    REAL(par)[0] = tt;
    UNPROTECT(2);
}
Now, we compute the i
-th column of the gradient matrix. Note how it is accessed: R stores matrices by column (like Fortran).
dimnames = PROTECT(Rf_allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 1, theta);
Rf_dimnamesgets(gradient, dimnames);
Rf_setAttrib(ans, Rf_install("gradient"), gradient);
UNPROTECT(3);
return ans;
}
First we add column names to the gradient matrix. This is done by allocating a list (a VECSXP
) whose first element, the row names, is NULL
(the default) and the second element, the column names, is set as theta
. This list is then assigned as the attribute having the symbol R_DimNamesSymbol
. Finally we set the gradient matrix as the gradient attribute of ans
, unprotect the remaining protected locations and return the answer ans
.
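The numerical core of this example (the choice of step, the in-place perturbation and the column-by-column storage of the gradient) can be checked without R. The following plain C sketch makes several assumptions: model_fn, toy_numeric_deriv and toy_model are hypothetical names, the expression is replaced by a C callback, the parameters live in an ordinary array instead of an environment, and the step eps is passed in by the caller (the code above uses sqrt(DBL_EPSILON)).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model function: writes n values computed from theta to out. */
typedef void (*model_fn)(const double *theta, double *out, int n);

/* Fill grad (n rows, p columns, stored column by column as in R) with
   forward differences of f with respect to each element of theta. */
void toy_numeric_deriv(model_fn f, double *theta, int p, int n,
                       double eps, double *grad)
{
    assert(p > 0 && n > 0 && eps > 0.0);
    double *ans = malloc(2 * (size_t) n * sizeof *ans);
    assert(ans != NULL);
    double *ans1 = ans + n;                  /* second half of the buffer */

    f(theta, ans, n);                        /* value at the original point */
    for (int i = 0, start = 0; i < p; i++, start += n) {
        double tt = theta[i];
        double xx = tt < 0 ? -tt : tt;       /* absolute value of the parameter */
        double delta = (xx < 1) ? eps : xx * eps;
        theta[i] += delta;                   /* perturb in place */
        f(theta, ans1, n);
        for (int j = 0; j < n; j++)          /* row j of column i */
            grad[j + start] = (ans1[j] - ans[j]) / delta;
        theta[i] = tt;                       /* restore the original value */
    }
    free(ans);
}

/* Example model: out[k] = (k + 1) * x^2 * y, with theta = (x, y). */
void toy_model(const double *theta, double *out, int n)
{
    for (int k = 0; k < n; k++)
        out[k] = (k + 1) * theta[0] * theta[0] * theta[1];
}
```

With theta = (1, 2) the exact derivatives of the example model are 4(k + 1) with respect to x and (k + 1) with respect to y, so the first column of grad should be close to 4, 8, 12 and the second close to 1, 2, 3 when n is 3. Forward differences with a step near the square root of machine epsilon are typically accurate to about half the available digits.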
5.12 Parsing R code from C ¶
Suppose an R extension wants to accept an R expression from the user and evaluate it. The previous section covered evaluation, but the expression will be entered as text and needs to be parsed first. A small part of R’s parse interface is declared in header file R_ext/Parse.h164.
An example of the usage can be found in the (example) Windows package windlgs included in the R source tree. The essential part is
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Parse.h>

SEXP menu_ttest3()
{
    char cmd[256];
    SEXP cmdSexp, cmdexpr, ans = R_NilValue;
    ParseStatus status;
    ...
    if(done == 1) {
        cmdSexp = PROTECT(Rf_allocVector(STRSXP, 1));
        SET_STRING_ELT(cmdSexp, 0, Rf_mkChar(cmd));
        cmdexpr = PROTECT(R_ParseVector(cmdSexp, -1, &status, R_NilValue));
        if (status != PARSE_OK) {
            UNPROTECT(2);
            Rf_error("invalid call %s", cmd);
        }
        /* Loop is needed here as EXPRSXP will be of length > 1 */
        for(int i = 0; i < Rf_length(cmdexpr); i++)
            ans = Rf_eval(VECTOR_ELT(cmdexpr, i), R_GlobalEnv);
        UNPROTECT(2);
    }
    return ans;
}
Note that a single line of text may give rise to more than one R expression.
R_ParseVector
is essentially the code used to implement parse(text=)
at R level. The first argument is a character vector (corresponding to text
) and the second the maximal number of expressions to parse (corresponding to n
). The third argument is a pointer to a variable of an enumeration type, and it is normal (as parse
does) to regard all values other than PARSE_OK
as an error. Other values which might be returned are PARSE_INCOMPLETE
(an incomplete expression was found) and PARSE_ERROR
(a syntax error), in both cases the value returned being R_NilValue
. The fourth argument is a length one character vector to be used as a filename in error messages, a srcfile
object or the R NULL
object (as in the example above). If a srcfile
object was used, a srcref
attribute would be attached to the result, containing a list of srcref
objects of the same length as the expression, to allow it to be echoed with its original formatting.
Two higher-level alternatives are R_ParseString
and R_ParseEvalString
:
Function: SEXP R_ParseString (const char *str)
Function: SEXP R_ParseEvalString (const char *str, SEXP env)
R_ParseString
parses the code in str and returns the resulting expression. An error is signaled if parsing str produces more than one R expression. R_ParseEvalString
first parses str
, then evaluates the expression in the environment env, and returns the result.
An example from src/main/objects.c:
call = R_ParseString("base::nameOfClass(X)");
5.12.1 Accessing source references ¶
The source references added by the parser are recorded by R’s evaluator as it evaluates code. Two functions make these available to debuggers running C code:
SEXP R_GetCurrentSrcref(int skip);
This function checks R_Srcref
and the current evaluation stack for entries that contain source reference information. The skip
argument tells how many source references to skip before returning the SEXP
of the srcref
object, counting from the top of the stack. If skip < 0
, abs(skip)
locations are counted up from the bottom of the stack. If too few or no source references are found, NULL
is returned.
SEXP R_GetSrcFilename(SEXP srcref);
This function extracts the filename from the source reference for display, returning a length 1 character vector containing the filename. If no name is found, ""
is returned.
5.13 External pointers and weak references ¶
The SEXPTYPE
s EXTPTRSXP
and WEAKREFSXP
can be encountered at R level, but are created in C code.
External pointer SEXP
s are intended to handle references to C structures such as ‘handles’, and are used for this purpose in package RODBC for example. They are unusual in their copying semantics in that when an R object is copied, the external pointer object is not duplicated. (For this reason external pointers should only be used as part of an object with normal semantics, for example an attribute or an element of a list.)
An external pointer is created by
SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot);
where p
is the pointer (and hence this cannot portably be a function pointer), and tag
and prot
are references to ordinary R objects which will remain in existence (be protected from garbage collection) for the lifetime of the external pointer object. A useful convention is to use the tag
field for some form of type identification and the prot
field for protecting the memory that the external pointer represents, if that memory is allocated from the R heap. Both tag
and prot
can be R_NilValue
, and often are.
An alternative way to create an external pointer from a function pointer is
typedef void * (*R_DL_FUNC)();
SEXP R_MakeExternalPtrFn(R_DL_FUNC p, SEXP tag, SEXP prot);
The elements of an external pointer can be accessed and set via
void *R_ExternalPtrAddr(SEXP s);
DL_FUNC R_ExternalPtrAddrFn(SEXP s);
SEXP R_ExternalPtrTag(SEXP s);
SEXP R_ExternalPtrProtected(SEXP s);
void R_ClearExternalPtr(SEXP s);
void R_SetExternalPtrAddr(SEXP s, void *p);
void R_SetExternalPtrTag(SEXP s, SEXP tag);
void R_SetExternalPtrProtected(SEXP s, SEXP p);
Clearing a pointer sets its value to the C NULL
pointer.
An external pointer object can have a finalizer, a piece of code to be run when the object is garbage collected. This can be R code or C code, and the corresponding interfaces are, respectively:
void R_RegisterFinalizer(SEXP s, SEXP fun);
void R_RegisterFinalizerEx(SEXP s, SEXP fun, Rboolean onexit);
typedef void (*R_CFinalizer_t)(SEXP);
void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun);
void R_RegisterCFinalizerEx(SEXP s, R_CFinalizer_t fun, Rboolean onexit);
The R function indicated by fun
should be a function of a single argument, the object to be finalized. R does not perform a garbage collection when shutting down, and the onexit
argument of the extended forms can be used to ask that the finalizer be run during a normal shutdown of the R session. It is good practice to clear the pointer on finalization.
The only R level function for interacting with external pointers is reg.finalizer
which can be used to set a finalizer.
It is probably not a good idea to allow an external pointer to be save
d and then reloaded, but if this happens the pointer will be set to the C NULL
pointer.
Finalizers can be run at many places in the code base and much of it, including the R interpreter, is not re-entrant. So great care is needed in choosing the code to be run in a finalizer. Finalizers are marked to be run at garbage collection but only run at a somewhat safe point thereafter.
Weak references are used to allow the programmer to maintain information on entities without preventing the garbage collection of the entities once they become unreachable.
A weak reference contains a key and a value. The value is reachable if it is either reachable directly or via weak references with reachable keys. Once a value is determined to be unreachable during garbage collection, the key and value are set to R_NilValue
and the finalizer will be run later in the garbage collection.
Weak reference objects are created by one of
SEXP R_MakeWeakRef(SEXP key, SEXP val, SEXP fin, Rboolean onexit);
SEXP R_MakeWeakRefC(SEXP key, SEXP val, R_CFinalizer_t fin,
                    Rboolean onexit);
where the R or C finalizer is specified in exactly the same way as for an external pointer object (whose finalization interface is implemented via weak references).
The parts can be accessed via
SEXP R_WeakRefKey(SEXP w);
SEXP R_WeakRefValue(SEXP w);
void R_RunWeakRefFinalizer(SEXP w);
A toy example of the use of weak references can be found at https://homepage.stat.uiowa.edu/~luke/R/references/weakfinex.html, but that is used to add finalizers to external pointers which can now be done more directly. At the time of writing no CRAN or Bioconductor package used weak references.
5.13.1 An example ¶
Package RODBC uses external pointers to maintain its channels, connections to databases. There can be several connections open at once, and the status information for each is stored in a C structure (pointed to by thisHandle
in the code extract below) that is returned via an external pointer as part of the RODBC ‘channel’ (as the "handle_ptr"
attribute). The external pointer is created by
    SEXP ans, ptr;
    ans = PROTECT(Rf_allocVector(INTSXP, 1));
    ptr = R_MakeExternalPtr(thisHandle, Rf_install("RODBC_channel"), R_NilValue);
    PROTECT(ptr);
    R_RegisterCFinalizerEx(ptr, chanFinalizer, TRUE);
            ...
    /* return the channel no */
    INTEGER(ans)[0] = nChannels;
    /* and the connection string as an attribute */
    Rf_setAttrib(ans, Rf_install("connection.string"), constr);
    Rf_setAttrib(ans, Rf_install("handle_ptr"), ptr);
    UNPROTECT(3);
    return ans;
Note the symbol given to identify the usage of the external pointer, and the use of the finalizer. Since the final argument when registering the finalizer is TRUE
, the finalizer will be run at the end of the R session (unless it crashes). This is used to close and clean up the connection to the database. The finalizer code is simply
static void chanFinalizer(SEXP ptr)
{
    if(!R_ExternalPtrAddr(ptr)) return;
    inRODBCClose(R_ExternalPtrAddr(ptr));
    R_ClearExternalPtr(ptr); /* not really needed */
}
Clearing the pointer and checking for a NULL
pointer avoids any possibility of attempting to close an already-closed channel.
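The effect of this check-then-clear pattern can be seen in a small standalone model that needs no R headers. Everything in it is hypothetical and exists only for illustration: the struct ext_ptr stands in for an external pointer object, channel_close plays the role of inRODBCClose, and the counter is there just to make the behaviour observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for an external pointer: only the address field. */
typedef struct {
    void *addr;
} ext_ptr;

/* Counts how many times the underlying resource was really closed. */
static int close_count = 0;

/* Hypothetical resource cleanup, playing the role of inRODBCClose. */
static void channel_close(void *handle)
{
    (void) handle;
    close_count++;
}

/* Finalizer in the style of chanFinalizer: a no-op once the pointer
   has been cleared. */
static void chan_finalizer(ext_ptr *ptr)
{
    if (ptr->addr == NULL) return;   /* already closed: nothing to do */
    channel_close(ptr->addr);
    ptr->addr = NULL;                /* the analogue of R_ClearExternalPtr */
}
```

Calling chan_finalizer twice on the same object closes the resource only once, because the second call returns at the NULL check.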
R’s connections provide another example of using external pointers, in that case purely to be able to use a finalizer to close and destroy the connection if it is no longer in use.
5.14 Vector accessor functions ¶
The vector accessors like REAL
, INTEGER
, LOGICAL
, RAW
, COMPLEX
, and VECTOR_ELT
are functions when used in R extensions. (For efficiency they may be macros or inline functions when used in the R source code, apart from SET_STRING_ELT
and SET_VECTOR_ELT
which are always functions. When used outside the R source code all vector accessors are functions.) There are also read-only versions that return a const
data pointer. For example, the return type of REAL_RO
is const double *
.
These accessor functions check that they are being used on an appropriate type of SEXP
. For VECSXP
and STRSXP
objects only read-only pointers are available as modifying their data directly would violate assumptions the memory manager depends on. DATAPTR_RO
returns a generic read-only data pointer for any vector object.
Formerly it was possible for packages to obtain internal versions of some accessors by defining ‘USE_RINTERNALS’ before including Rinternals.h. This is no longer the case. Defining ‘USE_RINTERNALS’ now has no effect.
Atomic vector elements can also be accessed and set using element-wise operations like INTEGER_ELT
and SET_INTEGER_ELT
. For objects with a compact representation using these may avoid fully materializing the object. In contrast, obtaining a data pointer will have to fully materialize the object.
5.15 Character encoding issues ¶
CHARSXP
s can be marked as coming from a known encoding (Latin-1 or UTF-8). This is mainly intended for human-readable output, and most packages can just treat such CHARSXP
s as a whole. However, if they need to be interpreted as characters or output at C level then it would normally be correct to ensure that they are converted to the encoding of the current locale: this can be done by accessing the data in the CHARSXP
by Rf_translateChar
rather than by CHAR
. If re-encoding is needed this allocates memory with R_alloc
which thus persists to the end of the .Call
/.External
call unless vmaxset
is used (see Transient storage allocation).
There is a similar function Rf_translateCharUTF8
which converts to UTF-8: this has the advantage that a faithful translation is almost always possible (whereas only a few languages can be represented in the encoding of the current locale unless that is UTF-8).
Both Rf_translateChar
and Rf_translateCharUTF8
will translate any input, using escapes such as ‘<A9>’ and ‘<U+0093>’ to represent untranslatable parts of the input.
There is a public interface to the encoding marked on CHARSXPs
via
typedef enum {CE_NATIVE, CE_UTF8, CE_LATIN1, CE_BYTES, CE_SYMBOL, CE_ANY} cetype_t;
cetype_t Rf_getCharCE(SEXP);
SEXP Rf_mkCharCE(const char *, cetype_t);
Only CE_UTF8
and CE_LATIN1
are marked on CHARSXPs
(and so Rf_getCharCE
will only return one of the first three), and these should only be used on non-ASCII strings. Value CE_BYTES
is used to make CHARSXP
s which should be regarded as a set of bytes and not translated. Value CE_SYMBOL
is used internally to indicate Adobe Symbol encoding. Value CE_ANY
is used to indicate a character string that will not need re-encoding – this is used for character strings known to be in ASCII, and can also be used as an input parameter where the intention is that the string is treated as a series of bytes. (See the comments under Rf_mkChar
about the length of input allowed.)
Function
Rboolean Rf_charIsASCII(SEXP);
can be used to detect whether a given CHARSXP
represents an ASCII string. The implementation is equivalent to checking individual characters, but may be faster.
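As an illustration of what checking individual characters amounts to, a byte-by-byte test on a plain C string might look like the sketch below. The helper name is_ascii_string is hypothetical and not part of the R API; it is a standalone model that needs no R headers.

```c
/* Hypothetical helper: returns 1 if every byte of s lies in the ASCII
   range 0..127, and 0 otherwise.  An empty string counts as ASCII. */
static int is_ascii_string(const char *s)
{
    for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
        if (*p > 127) return 0;
    return 1;
}
```

The bytes are read as unsigned char so that the comparison does not depend on whether plain char is signed on the platform.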
Function
Rboolean Rf_charIsUTF8(SEXP);
can be used to detect whether the internal representation of a given CHARSXP
accessed via CHAR
is UTF-8 (including ASCII). This function is rarely needed and specifically is not needed with Rf_translateCharUTF8
, because such a check is already included. However, when needed, it is better to use it in preference to Rf_getCharCE
, as it is safer against future changes in the semantics of encoding marks and covers strings internally represented in the native encoding. Note that Rf_charIsUTF8()
is not equivalent to Rf_getCharCE() == CE_UTF8
.
Similarly, function
Rboolean Rf_charIsLatin1(SEXP);
can be used to detect whether the internal representation of a given CHARSXP
accessed via CHAR
is Latin-1 (including ASCII). It is not equivalent to Rf_getCharCE() == CE_LATIN1
.
Function
const char *Rf_reEnc(const char *x, cetype_t ce_in, cetype_t ce_out,
int subst);
can be used to re-encode character strings: like Rf_translateChar
it returns a string allocated by R_alloc
. This can translate from CE_SYMBOL
to CE_UTF8
, but not conversely. Argument subst
controls what to do with untranslatable characters or invalid input: this is done byte-by-byte with 1
indicating that hex of the form <a0> is output
, and 2
indicating replacement by .
, with any other value causing the byte to produce no output.
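The three substitution modes can be illustrated with a standalone model that needs no R headers. This is only a sketch of the rule as described above, not the code R uses: the function name subst_model is hypothetical, and as a simplifying assumption every byte at or above 0x80 is treated as untranslatable.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical model of the substitution modes described above.
   Bytes >= 0x80 are treated as untranslatable (a simplifying
   assumption); all other bytes are copied unchanged.  Writes at most
   outsize - 1 characters plus a terminating NUL and returns the
   number of characters written. */
static size_t subst_model(const char *in, int subst, char *out, size_t outsize)
{
    size_t used = 0;
    if (outsize == 0) return 0;
    for (const unsigned char *p = (const unsigned char *) in; *p != '\0'; p++) {
        char piece[16];
        int len;
        if (*p < 0x80) {
            piece[0] = (char) *p;
            len = 1;
        } else if (subst == 1) {
            /* hex escape of the form <a0> */
            len = snprintf(piece, sizeof piece, "<%02x>", (unsigned) *p);
        } else if (subst == 2) {
            piece[0] = '.';
            len = 1;
        } else {
            len = 0;             /* any other value: the byte is dropped */
        }
        if (used + (size_t) len >= outsize) break;   /* output buffer full */
        for (int k = 0; k < len; k++) out[used++] = piece[k];
    }
    out[used] = '\0';
    return used;
}
```

For the three-byte input consisting of a, the byte 0xa0 and b, the model yields a<a0>b with subst equal to 1, a.b with subst equal to 2, and ab with subst equal to 0.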
There is also
SEXP Rf_mkCharLenCE(const char *, int, cetype_t);
to create marked character strings of a given length.
5.16 Writing compact-representation-friendly code ¶
A simple way to iterate in C over the elements of an atomic vector is to obtain a data pointer and index into that pointer with standard C indexing. However, if the object has a compact representation, then obtaining the data pointer will force the object to be fully materialized. An alternative is to use one of the following functions to query whether a data pointer is available.
Function: const int * LOGICAL_OR_NULL (SEXP x)
Function: const int * INTEGER_OR_NULL (SEXP x)
Function: const double * REAL_OR_NULL (SEXP x)
Function: const Rcomplex * COMPLEX_OR_NULL (SEXP x)
Function: const Rbyte * RAW_OR_NULL (SEXP x)
Function: const void * DATAPTR_OR_NULL (SEXP x)
These functions will return a data pointer if one is available. For vectors with a compact representation these functions will return NULL
.
If a data pointer is not available, then code can access elements one at a time with functions like REAL_ELT
. This is often sufficient, but in some cases can be inefficient. An alternative is to request data for contiguous blocks of elements. For a good choice of block size this can be nearly as efficient as direct pointer access.
Function: R_xlen_t INTEGER_GET_REGION (SEXP sx, R_xlen_t i, R_xlen_t n, int * buf)
Function: R_xlen_t LOGICAL_GET_REGION (SEXP sx, R_xlen_t i, R_xlen_t n, int * buf)
Function: R_xlen_t REAL_GET_REGION (SEXP sx, R_xlen_t i, R_xlen_t n, double * buf)
Function: R_xlen_t COMPLEX_GET_REGION (SEXP sx, R_xlen_t i, R_xlen_t n, Rcomplex * buf)
Function: R_xlen_t RAW_GET_REGION (SEXP sx, R_xlen_t i, R_xlen_t n, Rbyte * buf)
These functions copy a contiguous set of up to n
elements starting with element i
into a buffer buf
. The return value is the actual number of elements copied, which may be less than n
.
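The shape of a block-wise loop can be sketched with a standalone model that needs no R headers. The names compact_seq, seq_get_region and seq_sum are hypothetical, as is the block size of 64; seq_get_region only imitates the calling convention of INTEGER_GET_REGION on a sequence that is never materialized.

```c
#include <stddef.h>

/* Hypothetical compact sequence first, first + 1, ..., stored without
   materializing its elements. */
typedef struct {
    int first;
    size_t length;
} compact_seq;

/* Hypothetical analogue of INTEGER_GET_REGION: copies up to n elements
   starting at index i into buf and returns the number actually copied. */
static size_t seq_get_region(const compact_seq *s, size_t i, size_t n, int *buf)
{
    if (i >= s->length) return 0;
    size_t avail = s->length - i;
    size_t ncopy = n < avail ? n : avail;
    for (size_t k = 0; k < ncopy; k++)
        buf[k] = s->first + (int) (i + k);
    return ncopy;
}

#define BLOCK_SIZE 64   /* arbitrary block size chosen for this sketch */

/* Sums all elements one block at a time, never holding more than
   BLOCK_SIZE elements in memory. */
static long long seq_sum(const compact_seq *s)
{
    int buf[BLOCK_SIZE];
    long long total = 0;
    size_t i = 0;
    while (i < s->length) {
        size_t got = seq_get_region(s, i, BLOCK_SIZE, buf);
        if (got == 0) break;     /* defensive: avoid looping forever */
        for (size_t k = 0; k < got; k++)
            total += buf[k];
        i += got;
    }
    return total;
}
```

The loop advances by the number of elements actually returned rather than by the number requested, since the final block is usually shorter. Code using the R API would follow the same pattern, with R_xlen_t indices and a call such as INTEGER_GET_REGION(sx, i, n, buf) in place of the model function.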
Macros in R_ext/Itermacros.h may help in implementing an iteration strategy.
Some functions useful in implementing new alternate representation classes, beyond those defined in R_ext/Altrep.h, include ALTREP
, ALTREP_CLASS
, R_altrep_data1
, R_set_altrep_data1
, R_altrep_data2
, and R_set_altrep_data2
.
For some objects it may be possible to very efficiently determine whether the object is sorted or contains no NA
values. These functions can be used to query this information:
Function: int LOGICAL_NO_NA (SEXP x)
Function: int INTEGER_NO_NA (SEXP x)
Function: int REAL_NO_NA (SEXP x)
Function: int STRING_NO_NA (SEXP x)
A TRUE
result means it is known that there are no NA
values. A FALSE
result means it is not known whether there are any NA
values.
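Because a FALSE result decides nothing, code that uses such a hint still needs a fallback scan. The standalone model below, which needs no R headers, sketches that logic. The struct hinted_ints and the function contains_na are hypothetical; the no_na field plays the role of the answer from INTEGER_NO_NA, and MODEL_NA_INTEGER is a local stand-in for NA_INTEGER, which R represents as INT_MIN.

```c
#include <limits.h>
#include <stddef.h>

/* Local stand-in for NA_INTEGER, which R represents as INT_MIN. */
#define MODEL_NA_INTEGER INT_MIN

/* Hypothetical vector carrying a "known to have no NA" hint. */
typedef struct {
    const int *data;
    size_t length;
    int no_na;      /* 1: known to contain no NA; 0: not known */
} hinted_ints;

/* Returns 1 if the vector contains an NA.  The scan is skipped only
   when the hint guarantees there is none; a hint of 0 decides nothing. */
static int contains_na(const hinted_ints *v)
{
    if (v->no_na) return 0;
    for (size_t i = 0; i < v->length; i++)
        if (v->data[i] == MODEL_NA_INTEGER) return 1;
    return 0;
}
```

The hint can only short-circuit the negative answer: a vector whose hint is 0 may or may not contain NA values, so it is always scanned.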
Function: int INTEGER_IS_SORTED (SEXP x)
Function: int REAL_IS_SORTED (SEXP x)
Function: int STRING_IS_SORTED (SEXP x)
These functions return one of SORTED_DECR
, SORTED_INCR
, or UNKNOWN_SORTEDNESS
.