6 The R API: entry points for C code
There are a large number of entry points in the R executable/DLL that can be called from C code (and some that can be called from Fortran code). Only those documented here are stable enough that they will only be changed with considerable notice.
The recommended procedure to use these is to include the header file R.h
in your C code by
#include <R.h>
This will include several other header files from the directory R_INCLUDE_DIR/R_ext
, and there are other header files there that can be included too, but many of the features they contain should be regarded as undocumented and unstable.
Most of these header files, including all those included by R.h
, can be used from C++ code. (However, they cannot safely be included in a extern "C" { }
block as they may include C++ headers when included from C++ code—and whether this succeeds is system-specific).
Note: Because R re-maps many of its external names to avoid clashes with system or user code, it is essential to include the appropriate header files when using these entry points.
This remapping can cause problems1, and can be eliminated by defining R_NO_REMAP
(before including any R headers) and prepending Rf_
to all the function names used from Rinternals.h
and R_ext/Error.h
. These problems can usually be avoided by including other headers (such as system headers and those for external software used by the package) before any R headers. (Headers from other packages may include R headers directly or via inclusion from further packages, and may define R_NO_REMAP
with or without including Rinternals.h
.)
1 Known problems have been defining LENGTH
, error
, length
, match
, vector
and warning
: whether these matter depends on the OS and toolchain, with many problem reports involving clang++
.
As from R 4.5.0, R_NO_REMAP
is always defined when the R headers are included from C++ code.
Some of these entry points are declared in header Rmath.h
, most of which are remapped there. That remapping can be eliminated by defining R_NO_REMAP_RMATH
(before including any R headers) and prepending Rf_
to the function names used from that header except
exp_rand norm_rand unif_rand signrank_free wilcox_free
We can classify the entry points as
- API
-
Entry points which are documented in this manual and declared in an installed header file. These can be used in distributed packages and ideally will only be changed after deprecation. See API index.
- public
-
Entry points declared in an installed header file that are exported on all R platforms but are not documented and subject to change without notice. Do not use these in distributed code. Their declarations will eventually be moved out of installed header files.
- private
-
Entry points that are used when building R and exported on all R platforms but are not declared in the installed header files. Do not use these in distributed code.
- hidden
-
Entry points that are where possible (Windows and some modern Unix-alike compilers/loaders when using R as a shared library) not exported.
- experimental
-
Entry points declared in an installed header file that are part of an experimental API, such as
R_ext/Altrep.h
. These are subject to change, so package authors wishing to use these should be prepared to adapt. See Experimental API index. - embedding
-
Entry points intended primarily for embedding and creating new front-ends. It is not clear that this needs to be a separate category but it may be useful to keep it separate for now. See Embedding API index.
If you would like to use an entry point or variable that is not identified as part of the API in this document, or is currently hidden, you can make a request for it to be made available. Entry points or variables not identified as in the API may be changed or removed with no notice as part of efforts to improve aspects of R.
Work in progress: Currently Entry points in the API are identified in the source for this document with @apifun
, @eapifun
, and @embfun
entries. Similarly, @apivar
, @eapivar
, and @embvar
identify variables, and @apihdr
, @eapihdr
, and @embhdr
identify headers in the API. This could be used for programmatic extraction, but the specific format is work in progress and even the way this document is produced is subject to change.
6.1 Memory allocation
There are two types of memory allocation available to the C programmer, one in which R manages the clean-up and the other in which users have full control (and responsibility).
These functions are declared in header R_ext/RS.h
which is included by R.h
.
6.1.1 Transient storage allocation
Here R will reclaim the memory at the end of the call to .C
, .Call
or .External
. Use
char *R_alloc(size_t n, int size)
which allocates n
units of size
bytes each. A typical usage (from package stats) is
= (int *) R_alloc(nrows(merge)+2, sizeof(int)); x
(size_t
is defined in stddef.h
which the header defining R_alloc
includes.)
There is a similar call, S_alloc
(named for compatibility with older versions of S) which zeroes the memory allocated,
char *S_alloc(long n, int size)
and
char *S_realloc(char *p, long new, long old, int size)
which (for new > old
) changes the allocation size from old
to new
units, and zeroes the additional units. NB: these calls are best avoided as long
is insufficient for large memory allocations on 64-bit Windows (where it is limited to 2^31-1 bytes).
This memory is taken from the heap, and released at the end of the .C
, .Call
or .External
call. Users can also manage it, by noting the current position with a call to vmaxget
and subsequently clearing memory allocated by a call to vmaxset
. An example might be
void *vmax = vmaxget()
// a loop involving the use of R_alloc at each iteration
(vmax) vmaxset
This is only recommended for experts.
Note that this memory will be freed on error or user interrupt (if allowed: see Allowing interrupts).
The memory returned is only guaranteed to be aligned as required for double
pointers: take precautions if casting to a pointer which needs more. There is also
long double *R_allocLD(size_t n)
which is guaranteed to have the 16-byte alignment needed for long double
pointers on some platforms.
These functions should only be used in code called by .C
etc, never from front-ends. They are not thread-safe.
6.1.2 User-controlled memory
The other form of memory allocation is an interface to malloc
, the interface providing R error signaling. This memory lasts until freed by the user and is additional to the memory allocated for the R workspace.
The interface macros are
* R_Calloc(size_t n, type)
type* R_Realloc(any *p, size_t n, type)
typevoid R_Free(any *p)
providing analogues of calloc
, realloc
and free
. If there is an error during allocation it is handled by R, so if these return the memory has been successfully allocated or freed. R_Free
will set the pointer p
to NULL
. (Some but not all versions of S did so.)
Users should arrange to R_Free
this memory when no longer needed, including on error or user interrupt. This can often be done most conveniently from an on.exit
action in the calling R function – see pwilcox
for an example.
Do not assume that memory allocated by R_Calloc
/R_Realloc
comes from the same pool as used by malloc
:2 in particular do not use free
or strdup
with it.
2 That was not the case on Windows prior to R 4.2.0.
Memory obtained by these macros should be aligned in the same way as malloc
, that is ‘suitably aligned for any kind of variable’.
Historically the macros Calloc
, Free
and Realloc
were used but have been removed in \R 4.5.0.
R_Calloc
, R_Realloc
, and R_Free
are currently implemented as macros expanding to calls to R_chk_calloc
, R_chk_realloc
, and R_chk_free
, respectively. These should not be called directly as they may be removed in the future.
char * CallocCharBuf(size_t n)
void * Memcpy(q, p, n)
void * Memzero(p, n)
CallocCharBuf(n)
is shorthand for R_Calloc(n+1, char)
to allow for the nul
terminator. Memcpy
and Memzero
take n
items from array p
and copy them to array q
or zero them respectively.
6.2 Error signaling
The basic error signaling routines are the equivalents of stop
and warning
in R code, and use the same interface.
void error(const char * format, ...);
void warning(const char * format, ...);
void errorcall(SEXP call, const char * format, ...);
void warningcall(SEXP call, const char * format, ...);
void warningcall_immediate(SEXP call, const char * format, ...);
These have the same call sequences as calls to printf
, but in the simplest case can be called with a single character string argument giving the error message. (Don’t do this if the string contains %
or might otherwise be interpreted as a format.)
These are defined in header R_ext/Error.h
included by R.h
. NB: when R_NO_REMAP
is defined (as is recommended for C++ code), Rf_error
etc must be used.
6.2.1 Error signaling from Fortran
There are two interface function provided to call Rf_error
and Rf_warning
from Fortran code, in each case with a simple character string argument. They are defined as
rexit(message)
subroutine rwarn(message) subroutine
Messages of more than 255 characters are truncated, with a warning.
6.3 Random number generation
The interface to R’s internal random number generation routines is
double unif_rand();
double norm_rand();
double exp_rand();
double R_unif_index(double);
giving one uniform, normal or exponential pseudo-random variate. However, before these are used, the user must call
(); GetRNGstate
and after all the required variates have been generated, call
(); PutRNGstate
These essentially read in (or create) .Random.seed
and write it out after use.
These are defined in header R_ext/Random.h
.
The random number generator is private to R; there is no way to select the kind of RNG nor set the seed except by evaluating calls to the R functions.
The C code behind R’s rxxx
functions can be accessed by including the header file Rmath.h
; See Distribution functions. Those calls should also be preceded and followed by calls to GetRNGstate
and PutRNGstate
.
6.3.1 Random-number generation from Fortran
It was explained earlier that Fortran random-number generators should not be used in R packages, not least as packages cannot safely initialize them. Rather a package should call R’s built in generators: one way to do so is to use C wrappers like
#include <R_ext/RS.h>
#include <R_ext/Random.h>
void F77_SUB(getRNGseed)(void) {
();
GetRNGstate}
void F77_SUB(putRNGseed)(void) {
();
PutRNGstate}
double F77_SUB(unifRand)(void) {
return(unif_rand());
}
called from Fortran code like
...
double precision X
()
call getRNGseed= unifRand()
X ...
() call putRNGseed
Alternatively one could use Fortran 2003’s iso_c_binding
module by something like (fixed-form Fortran 90 code):
module rngfuncs
use iso_c_binding
interfacedouble precision
* function unifRand() bind(C, name = "unif_rand")
end function unifRand
() bind(C, name = "GetRNGstate")
subroutine getRNGseed
end subroutine getRNGseed
() bind(C, name = "PutRNGstate")
subroutine putRNGseed
end subroutine putRNGseed
end interface
end module rngfuncs
subroutine testit
use rngfuncsdouble precision X
()
call getRNGseed= unifRand()
X *, X
print ()
call putRNGSeed end subroutine testit
6.4 Missing and IEEE special values
A set of functions is provided to test for NA
, Inf
, -Inf
and NaN
. These functions are accessed via macros:
(x) True for R’s NA only
ISNA(x) True for R’s NA and IEEE NaN
ISNAN(x) False for Inf, -Inf, NA, NaN R_FINITE
and via function R_IsNaN
which is true for NaN
but not NA
.
Do use R_FINITE
rather than isfinite
or finite
; the latter is often mendacious and isfinite
is only available on a some platforms, on which R_FINITE
is a macro expanding to isfinite
.
Currently in C code ISNAN
is a macro calling isnan
. (Since this gives problems on some C++ systems, if the R headers is called from C++ code a function call is used.)
You can check for Inf
or -Inf
by testing equality to R_PosInf
or R_NegInf
, and set (but not test) an NA
as NA_REAL
.
All of the above apply to double variables only. For integer variables there is a variable accessed by the macro NA_INTEGER
which can used to set or test for missingness.
These are defined in header R_ext/Arith.h
included by R.h
.
6.5 Printing
The most useful function for printing from a C routine compiled into R is Rprintf
. This is used in exactly the same way as printf
, but is guaranteed to write to R’s output (which might be a GUI console rather than a file, and can be re-directed by sink
). It is wise to write complete lines (including the "\n"
) before returning to R. It is defined in R_ext/Print.h
.
The function REprintf
is similar but writes on the error stream (stderr
) which may or may not be different from the standard output stream.
Functions Rvprintf
and REvprintf
are analogues using the vprintf
interface. Because that is a C993 interface, they are only defined by R_ext/Print.h
in C++ code if the macro R_USE_C99_IN_CXX
is defined before it is included or (as from R 4.0.0) a C++11 compiler is used.
3 also part of C++11.
Another circumstance when it may be important to use these functions is when using parallel computation on a cluster of computational nodes, as their output will be re-directed/logged appropriately.
6.5.1 Printing from Fortran
On many systems Fortran write
and print
statements can be used, but the output may not interleave well with that of C, and may be invisible on GUI interfaces. They are not portable and best avoided.
Some subroutines are provided to ease the output of information from Fortran code.
(label, nchar, data, ndata)
subroutine dblepr(label, nchar, data, ndata)
subroutine realpr(label, nchar, data, ndata) subroutine intpr
and from R 4.0.0,
(label, nchar)
subroutine labelpr(label, nchar, var)
subroutine dblepr1(label, nchar, var)
subroutine realpr1(label, nchar, var) subroutine intpr1
Here label
is a character label of up to 255 characters, nchar
is its length (which can be -1
if the whole label is to be used), data
is an array of length at least ndata
of the appropriate type (double precision
, real
and integer
respectively) and var
is a (scalar) variable. These routines print the label on one line and then print data
or var
as if it were an R vector on subsequent line(s). Note that some compilers will give an error or warning unless data
is an array: others will accept a scalar when ndata
has value one or zero. NB: There is no check on the type of data
or var
, so using real
(including a real constant) instead of double precision
will give incorrect answers.
intpr
works with zero ndata
so can be used to print a label in earlier versions of R.
6.6 Calling C from Fortran and vice versa
Naming conventions for symbols generated by Fortran differ by platform: it is not safe to assume that Fortran names appear to C with a trailing underscore. To help cover up the platform-specific differences there is a set of macros4 that should be used.
4 The F77_
in the names is historical and dates back to usage in S.
F77_SUB(name)
-
to define a function in C to be called from Fortran
F77_NAME(name)
-
to declare a Fortran routine in C before use
F77_CALL(name)
-
to call a Fortran routine from C
On current platforms these are the same, but it is unwise to rely on this. Note that names containing underscores were not legal in Fortran 77, and are not portably handled by the above macros. (Also, all Fortran names for use by R are lower case, but this is not enforced by the macros.)
For example, suppose we want to call R’s normal random numbers from Fortran. We need a C wrapper along the lines of
#include <R.h>
void F77_SUB(rndstart)(void) { GetRNGstate(); }
void F77_SUB(rndend)(void) { PutRNGstate(); }
double F77_SUB(normrnd)(void) { return norm_rand(); }
to be called from Fortran as in
()
subroutine testitdouble precision normrnd, x
()
call rndstart= normrnd()
x ("X was", 5, x, 1)
call dblepr()
call rndend end
Note that this is not guaranteed to be portable, for the return conventions might not be compatible between the C and Fortran compilers used. (Passing values via arguments is safer.)
The standard packages, for example stats, are a rich source of further examples.
Where supported, link time optimization provides a reliable way to check the consistency of calls to C from Fortran or vice versa. See Using Link-time Optimization. One place where this occurs is the registration of .Fortran
calls in C code (see Registering native routines). For example
.c:10:13: warning: type of 'vsom_' does not match original
init[-Wlto-type-mismatch]
declaration extern void F77_NAME(vsom)(void *, void *, void *, void *,
void *, void *, void *, void *, void *);
.f90:20:33: note: type mismatch in parameter 9
vsom(neurons,dt,dtrows,dtcols,xdim,ydim,alpha,train)
subroutine vsom.f90:20:33: note: 'vsom' was previously declared here vsom
shows that a subroutine has been registered with 9 arguments (as that is what the .Fortran
call used) but only has 8.
6.6.1 Fortran character strings
Passing character strings from C to Fortran or vice versa is not portable, but can be done with care. The internal representations are different: a character array in C (or C++) is NUL-terminated so its length can be computed by strlen
. Fortran character arrays are typically stored as an array of bytes and a length. This matters when passing strings from C to Fortran or vice versa: in many cases one has been able to get away with passing the string but not the length. However, in 2019 this changed for gfortran
, starting with version 9 but backported to versions 7 and 8. Several months later, gfortran
9.2 introduced an option
-ftail-call-workaround
and made it the current default but said it might be withdrawn in future.
Suppose we want a function to report a message from Fortran to R’s console (one could use labelpr
, or intpr
with dummy data, but this might be the basis of a custom reporting function). Suppose the equivalent in Fortran would be
(msg)
subroutine rmsg*(*) msg
character*.msg
print end
in file rmsg.f
. Using gfortran
9.2 and later we can extract the C view by
-c -fc-prototypes-external rmsg.f gfortran
which gives
void rmsg_ (char *msg, size_t msg_len);
(where size_t
applies to version 8 and later). We could re-write that portably in C as
#ifndef USE_FC_LEN_T
# define USE_FC_LEN_T
#endif
#include <Rconfig.h> // included by R.h, so define USE_FC_LEN_T early
void F77_NAME(rmsg)(char *msg, FC_LEN_T msg_len)
{
char cmsg[msg_len+1];
(cmsg, msg, msg_len);
strncpy[msg_len] = '\0'; // nul-terminate the string, to be sure
cmsg// do something with 'cmsg'
}
in code depending on R(>= 3.6.2)
. For earlier versions of R we could just assume that msg
is NUL-terminated (not guaranteed, but people have been getting away with it for many years), so the complete C side might be
#ifndef USE_FC_LEN_T
# define USE_FC_LEN_T
#endif
#include <Rconfig.h>
#ifdef FC_LEN_T
void F77_NAME(rmsg)(char *msg, FC_LEN_T msg_len)
{
char cmsg[msg_len+1];
(cmsg, msg, msg_len);
strncpy[msg_len] = '\0';
cmsg// do something with 'cmsg'
}
#else
void F77_NAME(rmsg)(char *msg)
{
// do something with 'msg'
}
#endif
(USE_FC_LEN_T
is the default as from R 4.3.0.)
An alternative is to use Fortran 2003 features to set up the Fortran routine to pass a C-compatible character string. We could use something like
module cfuncs, only: c_char, c_null_char
use iso_c_binding
interface(msg) bind(C, name = 'cmsg')
subroutine cmsg, only: c_char
use iso_c_binding(kind = c_char):: msg(*)
character
end subroutine cmsg
end interface
end module
(msg)
subroutine rmsg
use cfuncs(*) msg
character(msg//c_null_char) ! need to concatenate a nul terminator
call cmsg end subroutine rmsg
where the C side is simply
void cmsg(const char *msg)
{
// do something with nul-terminated string 'msg'
}
If you use bind
to a C function as here, the only way to check that the bound definition is correct is to compile the package with LTO (which requires compatible C and Fortran compilers, usually gcc
and gfortran
).
Passing a variable-length string from C to Fortran is trickier, but https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-0/bind-c.html provides a recipe. However, all the uses in BLAS and LAPACK are of a single character, and for these we can write a wrapper in Fortran along the lines of
(transa, transb, m, n, k, alpha,
subroutine c_dgemm+ a, lda, b, ldb, beta, c, ldc)
+ bind(C, name = 'Cdgemm')
, only : c_char, c_int, c_double
use iso_c_binding(c_char), intent(in) :: transa, transb
character(c_int), intent(in) :: m, n, k, lda, ldb, ldc
integer(c_double), intent(in) :: alpha, beta, a(lda, *), b(ldb, *)
real(c_double), intent(out) :: c(ldc, *)
real(transa, transb, m, n, k, alpha,
call dgemm+ a, lda, b, ldb, beta, c, ldc)
end subroutine c_dgemm
which is then called from C with declaration
void
(const char *transa, const char *transb, const int *m,
Cdgemmconst int *n, const int *k, const double *alpha,
const double *a, const int *lda, const double *b, const int *ldb,
const double *beta, double *c, const int *ldc);
Alternatively, do as R does as from version 3.6.2 and pass the character length(s) from C to Fortran. A portable way to do this is
// before any R headers, or define in PKG_CPPFLAGS
#ifndef USE_FC_LEN_T
# define USE_FC_LEN_T
#endif
#include <Rconfig.h>
#include <R_ext/BLAS.h>
#ifndef FCONE
# define FCONE
#endif
...
(dgemm)("N", "T", &nrx, &ncy, &ncx, &one, x,
F77_CALL&nrx, y, &nry, &zero, z, &nrx FCONE FCONE);
(Note there is no comma before or between the FCONE
invocations.) It is strongly recommended that packages which call from C/C++ BLAS/LAPACK routines with character arguments adopt this approach: packages not using will fail to install as from R 4.3.0.
6.6.2 Fortran LOGICAL
Passing Fortran LOGICAL variables to/from C/C++ is potentially compiler-dependent. Fortran compilers have long used a 32-bit integer type so it is pretty portable to use int *
on the C/C++ side. However, recent versions of gfortran
via the option -fc-prototypes-external
say the C equivalent is int_least32_t *
: ‘Link-Time Optimization’ will report int *
as a mismatch. It is possible to use iso_c_binding
in Fortran 2003 to map LOGICAL variables to the C99 type _Bool
, but it is usually simpler to pass integers.
6.6.3 Passing functions
A number of packages call C functions passed as arguments to Fortran code along the lines of
(m,n,x,fvec,iflag)
c subroutine fcn,n,iflag
c integer mdouble precision x(n),fvec(m)
c ...
(fcn, ... subroutine lmdif
where the C declaration and call are
void fcn_lmdif(int *m, int *n, double *par, double *fvec, int *iflag);
void F77_NAME(lmdif)(void (*fcn_lmdif)(int *m, int *n, double *par,
double *fvec, int *iflag), ...
(lmdif)(&fcn_lmdif, ... F77_CALL
This works on most platforms but depends on the C and Fortran compilers agreeing on calling conventions: this have been seen to fail. The most portable solution seems to be to convert the Fortran code to C, perhaps using f2c
.
6.7 Numerical analysis subroutines
R contains a large number of mathematical functions for its own use, for example numerical linear algebra computations and special functions.
The header files R_ext/BLAS.h
, R_ext/Lapack.h
and R_ext/Linpack.h
contain declarations of the BLAS, LAPACK and LINPACK linear algebra functions included in R. These are expressed as calls to Fortran subroutines, and they will also be usable from users’ Fortran code. Although not part of the official API, this set of subroutines is unlikely to change (but might be supplemented).
The header file Rmath.h
lists many other functions that are available and documented in the following subsections. Many of these are C interfaces to the code behind R functions, so the R function documentation may give further details.
6.7.1 Distribution functions
The routines used to calculate densities, cumulative distribution functions and quantile functions for the standard statistical distributions are available as entry points.
The arguments for the entry points follow the pattern of those for the normal distribution:
double dnorm(double x, double mu, double sigma, int give_log);
double pnorm(double x, double mu, double sigma, int lower_tail,
int give_log);
double qnorm(double p, double mu, double sigma, int lower_tail,
int log_p);
double rnorm(double mu, double sigma);
That is, the first argument gives the position for the density and CDF and probability for the quantile function, followed by the distribution’s parameters. Argument lower_tail
should be TRUE
(or 1
) for normal use, but can be FALSE
(or 0
) if the probability of the upper tail is desired or specified.
Finally, give_log
should be non-zero if the result is required on log scale, and log_p
should be non-zero if p
has been specified on log scale.
Note that you directly get the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by using
- pdist(t, ..., /*lower_tail = */ FALSE, /* give_log = */ TRUE)
or shorter (and more cryptic) - pdist(t, ..., 0, 1)
.
The random-variate generation routine rnorm
returns one normal variate. See Random number generation, for the protocol in using the random-variate routines.
Note that these argument sequences are (apart from the names and that rnorm
has no n
) mainly the same as the corresponding R functions of the same name, so the documentation of the R functions can be used. Note that the exponential and gamma distributions are parametrized by scale
rather than rate
.
For reference, the following table gives the basic name (to be prefixed by d
, p
, q
or r
apart from the exceptions noted) and distribution-specific arguments for the complete set of distributions.
beta beta
a
,b
non-central beta nbeta
a
,b
,ncp
binomial binom
n
,p
Cauchy cauchy
location
,scale
chi-squared chisq
df
non-central chi-squared nchisq
df
,ncp
exponential exp
scale
(and notrate
)F f
n1
,n2
non-central F nf
n1
,n2
,ncp
gamma gamma
shape
,scale
geometric geom
p
hypergeometric hyper
NR
,NB
,n
logistic logis
location
,scale
lognormal lnorm
logmean
,logsd
negative binomial nbinom
size
,prob
normal norm
mu
,sigma
Poisson pois
lambda
Student’s t t
n
non-central t nt
df
,delta
Studentized range tukey
(*)rr
,cc
,df
uniform unif
a
,b
Weibull weibull
shape
,scale
Wilcoxon rank sum wilcox
m
,n
Wilcoxon signed rank signrank
n
Entries marked with an asterisk only have p
and q
functions available, and none of the non-central distributions have r
functions.
(If remapping is suppressed, the Normal distribution names are Rf_dnorm4
, Rf_pnorm5
and Rf_qnorm5
.)
Additionally, a multivariate RNG for the multinomial distribution is
void rmultinom(int n, double* prob, int K, int* rN)
where K = length(prob)
, sum(prob[.]) == 1 and rN
must point to a length-K
integer vector n1 n2 .. nK where each entry nj=rN[j]
is “filled” by a random binomial from Bin(n; prob[j]), constrained to sum(rN[.]) == n.
After calls to dwilcox
, pwilcox
or qwilcox
the function wilcox_free()
should be called, and similarly signrank_free()
for the signed rank functions. Since wilcox_free()
and signrank_free()
were only added to Rmath.h
in R 4.2.0, their use requires something like
#include "Rmath.h"
#include "Rversion.h"
#if R_VERSION < R_Version(4, 2, 0)
extern void wilcox_free(void);
extern void signrank_free(void);
#endif
For the negative binomial distribution (nbinom
), in addition to the (size, prob)
parametrization, the alternative (size, mu)
parametrization is provided as well by functions [dpqr]nbinom_mu()
, see ?NegBinomial in R.
Functions dpois_raw(x, *)
and dbinom_raw(x, *)
are versions of the Poisson and binomial probability mass functions which work continuously in x
, whereas dbinom(x,*)
and dpois(x,*)
only return non zero values for integer x
.
double dbinom_raw(double x, double n, double p, double q, int give_log)
double dpois_raw (double x, double lambda, int give_log)
Note that dbinom_raw()
returns both p and q = 1-p which may be advantageous when one of them is close to 1.
6.7.2 Mathematical functions
Function:double gammafn (double x
) ¶
Function:double lgammafn (double x
) ¶
Function:double digamma (double x
) ¶
Function:double trigamma (double x
) ¶
Function:double tetragamma (double x
) ¶
Function:double pentagamma (double x
) ¶
Function:double psigamma (double x
, double deriv
) ¶
Function:void dpsifn (double x
, int n
, int kode
, int m
, double* ans
, int* nz
, int* ierr
) ¶
: The Gamma function, the natural logarithm of its absolute value and first four derivatives and the n-th derivative of Psi, the digamma function, which is the derivative of lgammafn
. In other words, digamma(x)
is the same as psigamma(x,0)
, trigamma(x) == psigamma(x,1)
, etc. The underlying workhorse, dpsifn()
, is useful, e.g., when several derivatives of log Gamma=lgammafn
are desired. It computes and returns in ans[]
the length-m
sequence (-1)^(k+1) / gamma(k+1) * psi(k;x) for k = n ... n+m-1, where psi(k;x) is the k-th derivative of Psi(x), i.e., psigamma(x,k)
. For more details, see the comments in src/nmath/polygamma.c
.
Function:double beta (double a
, double b
) ¶
Function:double lbeta (double a
, double b
) ¶
: The (complete) Beta function and its natural logarithm.
Function:double choose (double n
, double k
) ¶
Function:double lchoose (double n
, double k
) ¶
: The number of combinations of k
items chosen from n
and the natural logarithm of its absolute value, generalized to arbitrary real n
. k
is rounded to the nearest integer (with a warning if needed).
Function:double bessel_i (double x
, double nu
, double expo
) ¶
Function:double bessel_j (double x
, double nu
) ¶
Function:double bessel_k (double x
, double nu
, double expo
) ¶
Function:double bessel_y (double x
, double nu
) ¶
: Bessel functions of types I, J, K and Y with index nu
. For bessel_i
and bessel_k
there is the option to return exp(-x
) I(x
; nu
) or exp(x
) K(x
; nu
) if expo
is 2. (Use expo == 1
for unscaled values.)
6.7.3 Numerical Utilities
There are a few other numerical utility functions available as entry points.
Function:double R_pow (double x
, double y
) ¶
Function:double R_pow_di (double x
, int i
) ¶
Function:double pow1p (double x
, double y
) ¶
: R_pow(x, y)
and R_pow_di(x, i)
compute x^y
and x^i
, respectively using R_FINITE
checks and returning the proper result (the same as R) for the cases where x
, y
or i
are 0 or missing or infinite or NaN
.
`pow1p(x, y)` computes
`(1 + x)^y`, accurately even for small
`x`, i.e., \|x\| \<\< 1.
- Function:double log1p (double
x
) ¶ -
Computes
log(1 + x)
(log 1 plus x), accurately even for smallx
, i.e., |x| << 1.This should be provided by your platform, in which case it is not included in
Rmath.h
, but is (probably) inmath.h
whichRmath.h
includes (except under C++, so it may not be declared for C++98).
- Function:double log1pmx (double
x
) ¶ -
Computes
log(1 + x) - x
(log 1 plus x minus x), accurately even for smallx
, i.e., |x| << 1.
- Function:double log1pexp (double
x
) ¶ -
Computes
log(1 + exp(x))
(log 1 plus exp), accurately, notably for largex
, e.g., x > 720.
- Function:double log1mexp (double
x
) ¶ -
Computes
log(1 - exp(-x))
(log 1 minus exp), accurately, carefully for two regions ofx
, optimally cutting off at log 2 (= 0.693147..), using((-x) > -M_LN2 ? log(-expm1(-x)) : log1p(-exp(-x)))
.
- Function:double expm1 (double
x
) ¶ -
Computes
exp(x) - 1
(exp x minus 1), accurately even for smallx
, i.e., |x| << 1.This should be provided by your platform, in which case it is not included in
Rmath.h
, but is (probably) inmath.h
whichRmath.h
includes (except under C++, so it may not be declared for C++98).
- Function:double lgamma1p (double
x
) ¶ -
Computes
log(gamma(x + 1))
(log(gamma(1 plus x))), accurately even for smallx
, i.e., 0 < x < 0.5.
- Function:double cospi (double
x
) ¶ -
Computes
cos(pi * x)
(wherepi
is 3.14159...), accurately, notably for half integerx
.This might be provided by your platform5, in which case it is not included in
Rmath.h
, but is inmath.h
whichRmath.h
includes. (Ensure that neithermath.h
norcmath
is included beforeRmath.h
or define5 It is an optional C11 extension.
#define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1
before the first inclusion.)
- Function:double sinpi (double
x
) ¶ -
Computes
sin(pi * x)
accurately, notably for (half) integerx
.This might be provided by your platform, in which case it is not included in
Rmath.h
, but is inmath.h
whichRmath.h
includes (but see the comments forcospi
).
- Function:double Rtanpi (double
x
) ¶ -
Computes
tan(pi * x)
accurately, notably for integerx
, givingNaN
for half integerx
and exactly +1 or -1 for (non half) quarter integers.
- Function:double tanpi (double
x
) ¶ -
Computes
tan(pi * x)
accurately for integerx
with possibly platform dependent behavior for half (and quarter) integers. This might be provided by your platform, in which case it is not included inRmath.h
, but is inmath.h
whichRmath.h
includes (but see the comments forcospi
).
Function:double logspace_add (double logx
, double logy
) ¶
Function:double logspace_sub (double logx
, double logy
) ¶
Function:double logspace_sum (const double* logx
, int n
) ¶
: Compute the log of a sum or difference from logs of terms, i.e., “x + y” as log (exp(logx) + exp(logy))
and “x - y” as log (exp(logx) - exp(logy))
, and “sum_i x[i]” as log (sum[i = 1:n exp(logx[i])] )
without causing unnecessary overflows or throwing away too much accuracy.
Function:int imax2 (int x
, int y
) ¶
Function:int imin2 (int x
, int y
) ¶
Function:double fmax2 (double x
, double y
) ¶
Function:double fmin2 (double x
, double y
) ¶
: Return the larger (max
) or smaller (min
) of two integer or double numbers, respectively. Note that fmax2
and fmin2
differ from C99/C++11’s fmax
and fmin
when one of the arguments is a NaN
: these versions return NaN
.
- Function:double sign (double
x
) ¶ -
Compute the signum function, where sign(
x
) is 1, 0, or -1, whenx
is positive, 0, or negative, respectively, andNaN
ifx
is aNaN
.
- Function:double fsign (double
x
, doubley
) ¶ -
Performs “transfer of sign” and is defined as |x| * sign(y).
- Function:double fprec (double
x
, doubledigits
) ¶ -
Returns the value of
x
rounded todigits
significant decimal digits.This is the function used by R’s
signif()
.
- Function:double fround (double
x
, doubledigits
) ¶ -
Returns the value of
x
rounded todigits
decimal digits (after the decimal point).This is the function used by R’s
round()
. (Note that C99/C++11 provide around
function but C++98 need not.)
- Function:double ftrunc (double
x
) ¶ -
Returns the value of
x
truncated (to an integer value) towards zero.
6.7.4 Mathematical constants
R has a set of commonly used mathematical constants encompassing constants defined by POSIX and usually found in headers math.h
and cmath
, as well as further ones that are used in statistical computations. These are defined to (at least) 30 digits accuracy in Rmath.h
. The following definitions use ln(x)
for the natural logarithm (log(x)
in R).
Name Definition ( ln = log
)round(value, 7) M_E
e 2.7182818 M_LOG2E
log2(e) 1.4426950 M_LOG10E
log10(e) 0.4342945 M_LN2
ln(2) 0.6931472 M_LN10
ln(10) 2.3025851 M_PI
pi 3.1415927 M_PI_2
pi/2 1.5707963 M_PI_4
pi/4 0.7853982 M_1_PI
1/pi 0.3183099 M_2_PI
2/pi 0.6366198 M_2_SQRTPI
2/sqrt(pi) 1.1283792 M_SQRT2
sqrt(2) 1.4142136 M_SQRT1_2
1/sqrt(2) 0.7071068 M_SQRT_3
sqrt(3) 1.7320508 M_SQRT_32
sqrt(32) 5.6568542 M_LOG10_2
log10(2) 0.3010300 M_2PI
2*pi 6.2831853 M_SQRT_PI
sqrt(pi) 1.7724539 M_1_SQRT_2PI
1/sqrt(2*pi) 0.3989423 M_SQRT_2dPI
sqrt(2/pi) 0.7978846 M_LN_SQRT_PI
ln(sqrt(pi)) 0.5723649 M_LN_SQRT_2PI
ln(sqrt(2*pi)) 0.9189385 M_LN_SQRT_PId2
ln(sqrt(pi/2)) 0.2257914
For compatibility with S this file used to define the constant PI
this is defunct and should be replaced by M_PI
. Header Constants.h
includes either C header float.h
or C++ header cfloat
, which provide constants such as DBL_MAX
.
Further, the included header R_ext/Boolean.h
has enumeration constants TRUE
and FALSE
of type Rboolean
in order to provide a way of using “logical” variables in C consistently. This can conflict with other software: for example it conflicts with the headers in IJG’s jpeg-9
(but not earlier versions).
6.8 Optimization
The C code underlying optim
can be accessed directly. The user needs to supply a function to compute the function to be minimized, of the type
typedef double optimfn(int n, double *par, void *ex);
where the first argument is the number of parameters in the second argument. The third argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
Some of the methods also require a gradient function
typedef void optimgr(int n, double *par, double *gr, void *ex);
which passes back the gradient in the gr
argument. No function is provided for finite-differencing, nor for approximating the Hessian at the result.
The interfaces (defined in header R_ext/Applic.h
) are
Nelder Mead:
void nmmin(int n, double *xin, double *x, double *Fmin, optimfn fn, int *fail, double abstol, double intol, void *ex, double alpha, double beta, double gamma, int trace, int *fncount, int maxit);
BFGS:
void vmmin(int n, double *x, double *Fmin, , optimgr gr, int maxit, int trace, optimfn fnint *mask, double abstol, double reltol, int nREPORT, void *ex, int *fncount, int *grcount, int *fail);
Conjugate gradients:
void cgmin(int n, double *xin, double *x, double *Fmin, , optimgr gr, int *fail, double abstol, optimfn fndouble intol, void *ex, int type, int trace, int *fncount, int *grcount, int maxit);
Limited-memory BFGS with bounds:
void lbfgsb(int n, int lmm, double *x, double *lower, double *upper, int *nbd, double *Fmin, optimfn fn, , int *fail, void *ex, double factr, optimgr grdouble pgtol, int *fncount, int *grcount, int maxit, char *msg, int trace, int nREPORT);
Simulated annealing:
void samin(int n, double *x, double *Fmin, optimfn fn, int maxit, int tmax, double temp, int trace, void *ex);
Many of the arguments are common to the various methods. n
is the number of parameters, x
or xin
is the starting parameters on entry and x
the final parameters on exit, with final value returned in Fmin
. Most of the other parameters can be found from the help page for optim
: see the source code src/appl/lbfgsb.c
for the values of nbd
, which specifies which bounds are to be used.
6.9 Integration
The C code underlying integrate
can be accessed directly. The user needs to supply a vectorizing C function to compute the function to be integrated, of the type
typedef void integr_fn(double *x, int n, void *ex);
where x[]
is both input and output and has length n
, i.e., a C function, say fn
, of type integr_fn
must basically do for(i in 1:n) x[i] := f(x[i], ex)
. The vectorization requirement can be used to speed up the integrand instead of calling it n
times. Note that in the current implementation built on QUADPACK, n
will be either 15 or 21. The ex
argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
There are interfaces (defined in header R_ext/Applic.h
) for integrals over finite and infinite intervals (or “ranges” or “integration boundaries”).
Finite:
void Rdqags(integr_fn f, void *ex, double *a, double *b, double *epsabs, double *epsrel, double *result, double *abserr, int *neval, int *ier, int *limit, int *lenw, int *last, int *iwork, double *work);
Infinite:
void Rdqagi(integr_fn f, void *ex, double *bound, int *inf, double *epsabs, double *epsrel, double *result, double *abserr, int *neval, int *ier, int *limit, int *lenw, int *last, int *iwork, double *work);
Only the 3rd and 4th argument differ for the two integrators; for the finite range integral using Rdqags
, a
and b
are the integration interval bounds, whereas for an infinite range integral using Rdqagi
, bound
is the finite bound of the integration (if the integral is not doubly-infinite) and inf
is a code indicating the kind of integration range,
inf = 1
-
corresponds to (bound, +Inf),
inf = -1
-
corresponds to (-Inf, bound),
inf = 2
-
corresponds to (-Inf, +Inf),
f
and ex
define the integrand function, see above; epsabs
and epsrel
specify the absolute and relative accuracy requested, result
, abserr
and last
are the output components value
, abs.err
and subdivisions
of the R function integrate, where neval
gives the number of integrand function evaluations, and the error code ier
is translated to R’s integrate() $ message
, look at that function definition. limit
corresponds to integrate(..., subdivisions = *)
. It seems you should always define the two work arrays and the length of the second one as
= 4 * limit;
lenw = (int *) R_alloc(limit, sizeof(int));
iwork = (double *) R_alloc(lenw, sizeof(double)); work
The comments in the source code in src/appl/integrate.c
give more details, particularly about reasons for failure (ier >= 1
).
6.10 Utility functions
R has a fairly comprehensive set of sort routines which are made available to users’ C code. The following is declared in header file Rinternals.h
.
Function:void R_orderVector (int* indx
, int n
, SEXP arglist
, Rboolean nalast
, Rboolean decreasing
) ¶
Function:void R_orderVector1 (int* indx
, int n
, SEXP x
, Rboolean nalast
, Rboolean decreasing
) ¶
: R_orderVector()
corresponds to R’s order(..., na.last, decreasing)
. More specifically, indx <- order(x, y, na.last, decreasing)
corresponds to R_orderVector(indx, n, Rf_lang2(x, y), nalast, decreasing)
and for three vectors, Rf_lang3(x,y,z)
is used as arglist
.
Both `R_orderVector` and `R_orderVector1` assume the vector `indx`
to be allocated to length \>= n. On return, `indx[]` contains a
permutation of `0:(n-1)`, i.e., 0-based C indices (and not 1-based R
indices, as R's `order()`).
When ordering only one vector, `R_orderVector1` is faster and
corresponds (but is 0-based) to R's
`indx <- order(x, na.last, decreasing)`. It was added in R 3.3.0.
All other sort routines are declared in header file R_ext/Utils.h
(included by R.h
) and include the following.
Function:void R_isort (int* x
, int n
) ¶
Function:void R_rsort (double* x
, int n
) ¶
Function:void R_csort (Rcomplex* x
, int n
) ¶
Function:void rsort_with_index (double* x
, int* index
, int n
) ¶
: The first three sort integer, real (double) and complex data respectively. (Complex numbers are sorted by the real part first then the imaginary part.) NA
s are sorted last.
`rsort_with_index` sorts on `x`, and applies the same
permutation to `index`. `NA`s are sorted last.
- Function:void revsort (double*
x
, int*index
, intn
) ¶ -
Is similar to
rsort_with_index
but sorts into decreasing order, andNA
s are not handled.
Function:void iPsort (int* x
, int n
, int k
) ¶
Function:void rPsort (double* x
, int n
, int k
) ¶
Function:void cPsort (Rcomplex* x
, int n
, int k
) ¶
: These all provide (very) partial sorting: they permute x
so that x[k]
is in the correct place with smaller values to the left, larger ones to the right.
Function:void R_qsort (double *v
, size_t i
, size_t j
) ¶
Function:void R_qsort_I (double *v
, int *I
, int i
, int j
) ¶
Function:void R_qsort_int (int *iv
, size_t i
, size_t j
) ¶
Function:void R_qsort_int_I (int *iv
, int *I
, int i
, int j
) ¶
: These routines sort v[i:j]
or iv[i:j]
(using 1-indexing, i.e., v[1]
is the first element) calling the quicksort algorithm as used by R’s sort(v, method = "quick")
and documented on the help page for the R function sort
. The ..._I()
versions also return the sort.index()
vector in I
. Note that the ordering is not stable, so tied values may be permuted.
Note that `NA`s are not handled (explicitly) and you should use
different sorting functions if `NA`s can be present.
Function:subroutine qsort4 (double precision v
, integer indx
, integer ii
, integer jj
) ¶
Function:subroutine qsort3 (double precision v
, integer ii
, integer jj
) ¶
: The Fortran interface routines for sorting double precision vectors are qsort3
and qsort4
, equivalent to R_qsort
and R_qsort_I
, respectively.
- Function:void R_max_col (double*
matrix
, int*nr
, int*nc
, int*maxes
, int*ties_meth
) ¶ -
Given the
nr
bync
matrixmatrix
in column-major (“Fortran”) order,R_max_col()
returns inmaxes[i-1]
the column number of the maximal element in thei
-th row (the same as R’smax.col()
function). In the case of ties (multiple maxima),*ties_meth
is an integer code in1:3
determining the method: 1 = “random”, 2 = “first” and 3 = “last”. See R’s help page?max.col
.
Function:int findInterval (double* xt
, int n
, double x
, Rboolean rightmost_closed
, Rboolean all_inside
, int ilo
, int* mflag
) ¶
Function:int findInterval2 (double* xt
, int n
, double x
, Rboolean rightmost_closed
, Rboolean all_inside
, Rboolean left_open
, int ilo
, int* mflag
) ¶
: Given the ordered vector xt
of length n
, return the interval or index of x
in xt[]
, typically max(i; 1 <= i <= n
& xt
[i] <= x
) where we use 1-indexing as in R and Fortran (but not C). If rightmost_closed
is true, also returns n
-1 if x
equals xt
[n
]. If all_inside
is not 0, the result is coerced to lie in 1:(n-1)
even when x
is outside the xt
[] range. On return, *mflag
equals -1 if x
< xt
[1], +1 if x
>= xt
[n
], and 0 otherwise.
The algorithm is particularly fast when `ilo` is set to
the last result of `findInterval()` and `x` is a value of
a sequence which is increasing or decreasing for subsequent calls.
`findInterval2()` is a generalization of `findInterval()`, with an
extra `Rboolean` argument `left_open`. Setting
`left_open = TRUE` basically replaces all left-closed right-open
intervals t) by left-open ones t\], see the help page of R function
`findInterval` for details.
There is also an `F77_CALL(interv)()` version of `findInterval()`
with the same arguments, but all pointers.
A system-independent interface to produce the name of a temporary file is provided as
Function:char * R_tmpnam (const char *prefix
, const char *tmpdir
) ¶
Function:char * R_tmpnam2 (const char *prefix
, const char *tmpdir
, const char *fileext
) ¶
Function:void R_free_tmpnam (char *name
) ¶
: Return a pathname for a temporary file with name beginning with prefix
and ending with fileext
in directory tmpdir
. A NULL
prefix or extension is replaced by ""
. Note that the return value is dynamically allocated and should be freed using R_free_tmpnam
when no longer needed (unlike the system call tmpnam
). Freeing the result using free
is no longer recommended.
Function:double R_atof (const char* str
) ¶
Function:double R_strtod (const char* str
, char ** end
) ¶
: Implementations of the C99/POSIX functions atof
and strtod
which guarantee platform- and locale-independent behaviour, including always using the period as the decimal point aka ‘radix character’ and returning R’s NA_REAL_
for all unconverted strings, including "NA"
.
There is also the internal function used to expand file names in several R functions, and called directly by path.expand
.
- Function:const char * R_ExpandFileName (const char *
fn
) ¶ -
Expand a path name
fn
by replacing a leading tilde by the user’s home directory (if defined). The precise meaning is platform-specific; it will usually be taken from the environment variableHOME
if this is defined.
For historical reasons there are Fortran interfaces to functions D1MACH
and I1MACH
. These can be called from C code as e.g. F77_CALL(d1mach)(4)
. Note that these are emulations of the original functions by Fox, Hall and Schryer on Netlib at https://netlib.org/slatec/src/ for IEC 60559 arithmetic (required by R).
6.11 Re-encoding
R has its own C-level interface to the encoding conversion capabilities provided by iconv
because there are incompatibilities between the declarations in different implementations of iconv
.
These are declared in header file R_ext/Riconv.h
.
- Function:void * Riconv_open (const char *
to
, const char *from
) ¶ -
Set up a pointer to an encoding object to be used to convert between two encodings:
""
indicates the current locale.
- Function:size_t Riconv (void *
cd
, const char **inbuf
, size_t *inbytesleft
, char **outbuf
, size_t *outbytesleft
) ¶ -
Convert as much as possible of
inbuf
tooutbuf
. Initially thesize_t
variables indicate the number of bytes available in the buffers, and they are updated (and thechar
pointers are updated to point to the next free byte in the buffer). The return value is the number of characters converted, or(size_t)-1
(beware:size_t
is usually an unsigned type). It should be safe to assume that an error condition setserrno
to one ofE2BIG
(the output buffer is full),EILSEQ
(the input cannot be converted, and might be invalid in the encoding specified) orEINVAL
(the input does not end with a complete multi-byte character).
- Function:int Riconv_close (void *
cd
) ¶ -
Free the resources of an encoding object.
6.12 Condition handling and cleanup code
Three functions are available for establishing condition handlers from within C code:
#include <Rinternals.h>
(SEXP (*fun)(void *data), void *data,
SEXP R_tryCatchError(*hndlr)(SEXP cond, void *hdata), void *hdata);
SEXP
(SEXP (*fun)(void *data), void *data,
SEXP R_tryCatch,
SEXP(*hndlr)(SEXP cond, void *hdata), void *hdata,
SEXP void (*clean)(void *cdata), void *cdata);
(SEXP (*fun)(void *data), void *data,
SEXP R_withCallingErrorHandler(*hndlr)(SEXP cond, void *hdata), void *hdata) SEXP
R_tryCatchError
establishes an exiting handler for conditions inheriting form class error
.
R_tryCatch
can be used to establish a handler for other conditions and to register a cleanup action. The conditions to be handled are specified as a character vector (STRSXP
). A NULL
pointer can be passed as fun
or clean
if condition handling or cleanup are not needed.
These are currently implemented using the R-level tryCatch
mechanism so are subject to some overhead.
R_withCallingErrorHandler
establishes a calling handler for conditions inheriting from class error
. It establishes the handler without calling back into R and will therefore be more efficient.
The function R_UnwindProtect
can be used to ensure that a cleanup action takes place on ordinary return as well as on a non-local transfer of control, which R implements as a longjmp
.
(SEXP (*fun)(void *data), void *data,
SEXP R_UnwindProtectvoid (*clean)(void *data, Rboolean jump), void *cdata,
); SEXP cont
R_UnwindProtect
can be used in two ways. The simper usage, suitable for use in C code, passes NULL
for the cont
argument. R_UnwindProtect
will call fun(data)
. If fun
returns a value, then R_UnwindProtect
calls clean(cleandata, FALSE)
before returning the value returned by fun
. If fun
executes a non-local transfer of control, then clean(cleandata, TRUE)
is called, and the non-local transfer of control is resumed.
The second use pattern, suitable to support C++ stack unwinding, uses two additional functions:
();
SEXP R_MakeUnwindContvoid R_ContinueUnwind(SEXP cont); NORET
R_MakeUnwindCont
allocates a continuation token cont
to pass to R_UnwindProtect
. This token should be protected with PROTECT
before calling R_UnwindProtect
. When the clean
function is called with jump == TRUE
, indicating that R is executing a non-local transfer of control, it can throw a C++ exception to a C++ catch
outside the C++ code to be unwound, and then use the continuation token in the a call R_ContinueUnwind(cont)
to resume the non-local transfer of control within R.
An older interface for the simpler R_MakeUnwindCont
usage remains available:
(SEXP (*fun)(void *), void *data,
SEXP R_ExecWithCleanupvoid (*cleanfun)(void *), void *cleandata);
cleanfun
is called on both regular returns and non-local transfers of control, but without an indication of which form of exit is occurring.
The function R_ToplevelExec
can be used to execute code without allowing any non-local transfers of control, including by user interrupts or invoking abort
restarts.
(void (*fun)(void *), void *data); Rboolean R_ToplevelExec
The return value is TRUE
if fun
returns normally and FALSE
if fun
exits with a jump to top level. fun
is called with a new top-level context. Condition handlers and other features of the current top level context when R_ToplevelExec
is called will not be seen by the code in fun
. Two convenience functions built on R_ToplevelExec
are R_tryEval
and R_tryEvalSilent
.
(SEXP e, SEXP env, int *ErrorOccurred);
SEXP R_tryEval(SEXP e, SEXP env, int *ErrorOccurred) SEXP R_tryEvalSilent
These return a NULL
pointer if evaluating the expression results in a jump to top level.
Using R_ToplevelExec
is usually only appropriate in situations where one might want to run code in a separate thread if that was an option. For example, finalizers are run in a separate top level context. The other functions mentioned in this section will usually be more appropriate choices.
6.13 Allowing interrupts
No part of R can be interrupted whilst running long computations in compiled code, so programmers should make provision for the code to be interrupted at suitable points by calling from C
#include <R_ext/Utils.h>
void R_CheckUserInterrupt(void);
and from Fortran
rchkusr() subroutine
These check if the user has requested an interrupt, and if so branch to R’s error signaling functions.
Note that it is possible that the code behind one of the entry points defined here if called from your C or Fortran code could be interruptible or generate an error and so not return to your code.
6.14 C stack checking
R provides a framework for detecting when the amount of C stack is too low. Two functions are available:
R_CheckStack(void)
void R_CheckStack2(size_t extra) void
These functions signal an error when a low stack condition is detected.
This mechanism is not always available (See Threading issues) and it is best to avoid deep recursions in C and to track recursion depth when using recursion is not avoidable. C compilers will often optimize tail recursions to avoid consuming C stack, so it is best to write code in a tail-recursive form when possible.
6.15 Custom serialization input and output
The internal serialization code uses a framework for serializing from and to different output media. This framework has been in use internally for some time, but its use in packages is highly experimental and may need to be changed or dropped once some experience is gained. Package authors considering using this framework should keep this in mind.
Client code will define a persistent stream structure with declarations like
struct R_outpstream_st out;in; struct R_inpstream_st
These are filled in by calling these functions with appropriate arguments:
void R_InitInPStream(R_inpstream_t stream, R_pstream_data_t data,
,
R_pstream_format_t typeint (*inchar)(R_inpstream_t),
void (*inbytes)(R_inpstream_t, void *, int),
(*phook)(SEXP, SEXP), SEXP pdata);
SEXP void R_InitOutPStream(R_outpstream_t stream, R_pstream_data_t data,
, int version,
R_pstream_format_t typevoid (*outchar)(R_outpstream_t, int),
void (*outbytes)(R_outpstream_t, void *, int),
(*phook)(SEXP, SEXP), SEXP pdata); SEXP
Code should not depend on the fields of the stream structures. Simpler initializers are available for serializing to or from a file pointer:
void R_InitFileOutPStream(R_outpstream_t stream, FILE *fp,
, int version,
R_pstream_format_t type(*phook)(SEXP, SEXP), SEXP pdata);
SEXP void R_InitFileInPStream(R_inpstream_t stream, FILE *fp,
,
R_pstream_format_t type(*phook)(SEXP, SEXP), SEXP pdata); SEXP
Once the stream structures are set up they can be used by calling
void R_Serialize(SEXP s, R_outpstream_t stream)
(R_inpstream_t stream) SEXP R_Unserialize
Examples can be found in the R sources in src/main/serialize.c
.
6.16 Platform and version information
The header files define USING_R
, which can be used to test if the code is indeed being used with R.
Header file Rconfig.h
(included by R.h
) is used to define platform-specific macros that are mainly for use in other header files. The macro WORDS_BIGENDIAN
is defined on big-endian6 systems (e.g. most OSes on Sparc and PowerPC hardware) and not on little-endian systems (nowadays all the commoner R platforms). It can be useful when manipulating binary files. NB: these macros apply only to the C compiler used to build R, not necessarily to another C or C++ compiler.
Header file Rversion.h
(not included by R.h
) defines a macro R_VERSION
giving the version number encoded as an integer, plus a macro R_Version
to do the encoding. This can be used to test if the version of R is late enough, or to include back-compatibility features. For protection against very old versions of R which did not have this macro, use a construction such as
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
...
#endif
More detailed information is available in the macros R_MAJOR
, R_MINOR
, R_YEAR
, R_MONTH
and R_DAY
: see the header file Rversion.h
for their format. Note that the minor version includes the patch level (as in 2.2
).
Packages which use alloca
need to ensure it is defined: as it is part of neither C nor POSIX there is no standard way to do so. One can use
#include <Rconfig.h> // for HAVE_ALLOCA_H
#ifdef __GNUC__
// this covers gcc, clang, icc
# undef alloca
# define alloca(x) __builtin_alloca((x))
#elif defined(HAVE_ALLOCA_H)
// needed for native compilers on Solaris and AIX
# include <alloca.h>
#endif
(and this should be included before standard C headers such as stdlib.h
, since on some platforms these include malloc.h
which may have a conflicting definition), which suffices for known R platforms.
6.17 Inlining C functions
The C99 keyword inline
should be recognized by all compilers nowadays used to build R. Portable code which might be used with earlier versions of R can be written using the macro R_INLINE
(defined in file Rconfig.h
included by R.h
), as for example from package cluster
#include <R.h>
static R_INLINE int ind_2(int l, int j)
{
...
}
Be aware that using inlining with functions in more than one compilation unit is almost impossible to do portably, see https://www.greenend.org.uk/rjk/tech/inline.html, so this usage is for static
functions as in the example. All the R configure code has checked is that R_INLINE
can be used in a single C file with the compiler used to build R. We recommend that packages making extensive use of inlining include their own configure code.
6.18 Controlling visibility
Header R_ext/Visibility.h
has some definitions for controlling the visibility of entry points. These are only effective when HAVE_VISIBILITY_ATTRIBUTE
is defined – this is checked when R is configured and recorded in header Rconfig.h
(included by R_ext/Visibility.h
). It is often defined on modern Unix-alikes with a recent compiler7 but not supported on Windows. Minimizing the visibility of symbols in a shared library will both speed up its loading (unlikely to be significant) and reduce the possibility of linking to other entry points of the same name.
7 Not pre-2023 Intel nor AIX nor Solaris compilers.
8 This applies to the compiler for the default C++ dialect and not necessarily to other dialects.
9 In many cases Fortran compilers accept the flag but do not actually hide their symbols: at the time of writing that was true of gfortran
, flang-new
and Intel’s ifx
.
C/C++ entry points prefixed by attribute_hidden
will not be visible in the shared object. There is no comparable mechanism for Fortran entry points, but there is a more comprehensive scheme used by, for example package stats. Most compilers which allow control of visibility will allow control of visibility for all symbols via a flag, and where known the flag is encapsulated in the macros C_VISIBILITY
, CXX_VISIBILITY
8 and F_VISIBILITY
for C, C++ and Fortran compilers.9 These are defined in etc/Makeconf
and so available for normal compilation of package code. For example, src/Makevars
could include some of
=$(C_VISIBILITY)
PKG_CFLAGS=$(CXX_VISIBILITY)
PKG_CXXFLAGS=$(F_VISIBILITY) PKG_FFLAGS
This would end up with no visible entry points, which would be pointless. However, the effect of the flags can be overridden by using the attribute_visible
prefix. A shared object which registers its entry points needs only for have one visible entry point, its initializer, so for example package stats has
void attribute_visible R_init_stats(DllInfo *dll)
{
(dll, CEntries, CallEntries, FortEntries, NULL);
R_registerRoutines(dll, FALSE);
R_useDynamicSymbols...
}
Because the C_VISIBILITY
mechanism is only useful in conjunction with attribute_visible
, it is not enabled unless HAVE_VISIBILITY_ATTRIBUTE
is defined. The usual visibility flag is -fvisibility=hidden
: some compilers also support -fvisibility-inlines-hidden
which can be used by overriding C_VISIBILITY
and CXX_VISIBILITY
in config.site
when building R, or editing etc/Makeconf
in the R installation.
Note that configure
only checks that visibility attributes and flags are accepted, not that they actually hide symbols.
The visibility mechanism is not available on Windows, but there is an equally effective way to control which entry points are visible, by supplying a definitions file pkgnme/src/pkgname-win.def
: only entry points listed in that file will be visible. Again using stats as an example, it has
.dll
LIBRARY stats
EXPORTS R_init_stats
6.19 Using these functions in your own C code
It is possible to build Mathlib
, the R set of mathematical functions documented in Rmath.h
, as a standalone library libRmath
under both Unix-alikes and Windows. (This includes the functions documented in Numerical analysis subroutines as from that header file.)
The library is not built automatically when R is installed, but can be built in the directory src/nmath/standalone
in the R sources: see the file README
there. To use the code in your own C program include
#define MATHLIB_STANDALONE
#include <Rmath.h>
and link against -lRmath
(and perhaps -lm
). There is an example file test.c
.
A little care is needed to use the random-number routines. You will need to supply the uniform random number generator
double unif_rand(void)
or use the one supplied (and with a dynamic library or DLL you will have to use the one supplied, which is the Marsaglia-multicarry with an entry points
(unsigned int, unsigned int) set_seed
to set its seeds and
(unsigned int *, unsigned int *) get_seed
to read the seeds).
6.20 Organization of header files
The header files which R installs are in directory R_INCLUDE_DIR
(default R_HOME/include
). This currently includes
R.h
includesmany other files Rinternals.h
definitions for using R’s internal structures Rdefines.h
macros for an S-like interface to the above (no longer maintained) Rmath.h
standalone math library Rversion.h
R versioninformation Rinterface.h
for add-on front-ends (Unix-alikes only) Rembedded.h
for add-on front-ends R_ext/Applic.h
optimization, integration and some LAPACK ones) R_ext/BLAS.h
C definitions for BLAS routines R_ext/Callbacks.h
C (and Rfunction) top-level task handlers R_ext/GetX11Image.h
X11Imageinterface used by package trkplot R_ext/Lapack.h
C definitions for some LAPACK routines R_ext/Linpack.h
C definitions for some LINPACK routines, not all of which are included in R R_ext/Parse.h
a small part of R’s parse interface: not part of the stable API. R_ext/RStartup.h
for add-on front-ends R_ext/Rdynload.h
needed toregister compiled code in packages R_ext/Riconv.h
interfaceto iconv
R_ext/Visibility.h
definitions controlling visibility R_ext/eventloop.h
for add-on front-ends and for packages that need to share in the R event loops (not Windows)
The following headers are included by R.h
:
Rconfig.h
configuration info that is made available R_ext/Arith.h
handlingfor NA
s,NaN
s,Inf
/-Inf
R_ext/Boolean.h
TRUE
/`FALSE` type R_ext/Complex.h
C typedefs for R’s complex
R_ext/Constants.h
constantsR_ext/Error.h
error signaling R_ext/Memory.h
memory allocation R_ext/Print.h
Rprintf
and variations. R_ext/RS.h
definitions common to R.h
and the formerS.h
, includingF77_CALL
etc.R_ext/Random.h
random number generation R_ext/Utils.h
sorting and other utilities R_ext/libextern.h
definitions for exports from R.dll
on Windows.
The graphics systems are exposed in headers R_ext/GraphicsEngine.h
, R_ext/GraphicsDevice.h
(which it includes) and R_ext/QuartzDevice.h
. Facilities for defining custom connection implementations are provided in R_ext/Connections.h
, but make sure you consult the file before use.
Let us re-iterate the advice to include in C++ code system headers before the R header files, especially Rinternals.h
(included by Rdefines.h
) and Rmath.h
, which redefine names which may be used in system headers, or (preferably and now the default) to define R_NO_REMAP
.
6.21 Moving into C API compliance
Work is in progress to clarify and tighten the C API for extending R code. This will help make package C code more robust, and will facilitate maintaining and improving the R source code without impacting package space. In the process a number of entry points intended for internal use will be removed from installed header files or hidden, and others will be replaced by more robust versions better suited for use in package C code. This section describes how packages can move from using non-API entry points to using ones available and supported in the API.
Work in progress: This section is a work in progress and will be adjusted as changes are made to the API.
6.21.1 Some API replacements for non-API entry points
Some non-API entry points intended for internal use have long had entry points in the API that can be used instead. In other cases new entry point have been added that are more appropriate for use in packages; typically these include more extensive error checking on arguments.
This table lists some non-API functions used in packages and the API functions that should be used instead:
EXTPTR_PROT
EXTPTR_TAG
EXTPTR_PTR
-
Use
R_ExternalPtrProtected
,R_ExternalPtrTag
, andR_ExternalPtrAddr
. OBJECT
IS_S4_OBJECT
-
Use
isObject
andisS4
. GetOption
-
Use
GetOption1
. R_lsInternal
-
Use
R_lsInternal3
. REAL0
COMPLEX0
-
Use
REAL
andCOMPLEX
. STRING_PTR
DATAPTR
STDVEC_DATAPTR
-
Use
STRING_PTR_RO
andDATAPTR_RO
. Obtaining writable pointers to these data can violate the memory manager’s integrity assumptions and is not supported. isFrame
-
Use
isDataFrame
, added in R 4.5.0. BODY
FORMALS
CLOENV
-
Use
R_ClosureBody
,R_ClosureFormals
, andR_ClosureEnv
; these were added in R 4.5.0. ENCLOS
-
Use
R_ParentEnv
, added in R 4.5.0. IS_ASCII
-
Use
charIsASCII
, added in R 4.5.0. IS_UTF8
-
Use
charIsUTF8
, added in R 4.5.0, or avoid completely. allocSExp
-
Use an appropriate constructor.
findVarInFrame3
-
Use
R_existsVarInFrame
to test for existence. findVar
findVarInFrame
-
Use
R_getVar
orR_getVarEx
, added in R 4.5.0. In some cases usingeval
may suffice. ATTRIB
-
Use
getAttrib
for individual attributes. To test whether there are any attributes useANY_ATTRIB
, added in R 4.5.0. SET_ATTRIB
SET_OBJECT
-
Use
setAttrib
for individual attributes,DUPLICATE_ATTRIB
orSHALLOW_DUPLICATE_ATTRIB
for copying attributes from one object to another. UseCLEAR_ATTRIB
for removing all attributes, added in R 4.5.0. R_GetCurrentEnv
-
Use
environment()
at the R level and pass the result as an argument to your C function.
For recently added entry points packages that need to be compiled under older versions that do not yet contain these entry points can use back-ported versions defined conditionally. See Some backports.
6.21.2 Creating environments
An idiom appearing in a number of packages is to create an environment as
= allocSExp(ENVSXP);
SEXP env (env, parent); SET_ENCLOS
The function allocSExp
and mutation functions like SET_ENCLOS
, SET_FRAME
, and SET_HASHTAB
are not part of the API as they expose internal structure that might need to change in the future. A proper constructor function should be used instead. The constructor function for environments is R_NewEnv
, so the new environment should be created as
= R_NewEnv(parent, FALSE, 0); SEXP env
6.21.3 Creating call expressions
Another idiom used in some packages is to create a call expression with space for two arguments as
= allocList(3);
SEXP expr (expr, "LANGSXP"); SET_TYPEOF
and then fill in the function and argument expressions. SET_TYPEOF
will also not be available to packages in the future. An alternative way to construct the expression that will work in any R version is
= LCONS(R_NilValue, allocList(2)); SEXP expr
R 4.4.1 adds the constructor allocLang
, so the expression can be created as
= allocLang(3); SEXP env
6.21.4 Creating closures
Yet another common idiom is to create a new closure as
= Rf_allocSExp(CLOSXP);
SEXP fun (fun, formals);
SET_FORMALS(fun, body);
SET_BODY(fun, env); SET_CLOENV
R 4.5.0 adds the constructor R_mkClosure
; this can be used as
= R_mkClosure(formals, body, env); SEXP fun
6.21.5 Querying CHARSXP
encoding
A number of packages query encoding bits set on CHARSXP
objects via macros IS_ASCII
and IS_UTF8
, some packages also via IS_BYTES
and IS_LATIN1
. These macros are not part of the API and packages have been copying their definition and directly accessing the bits in memory. The structure of the object header is, however, internal to R and may have to change in the future.
IS_ASCII
can be replaced by charIsASCII
, added in R 4.5.0. It can also be replaced by code that checks individual characters (bytes).
Information provided by the other macros is available via function getCharCE
, which has been part of the API since R 2.7.0. Before switching to getCharCE
, packages are, however, advised to check whether the encoding information is really needed and whether it is used correctly.
Most code should be able to work with complete CHARSXP
s and never look at the individual bytes. When access to characters and bytes (of strings other than CE_BYTES
) is needed, one would use translateChar
or translateCharUTF8
. These functions internally already check the encoding and whether the string is ASCII and only translate when needed, which should be rarely since R >= 4.2.0 (UTF-8 is used as native encoding on most systems running R).
Several packages use the encoding information to find out whether an internal string representation visible via CHAR
is UTF-8 or latin1. R 4.5.0 provides functions charIsUTF8
and charIsLatin1
for this purpose, which are safer against future changes and handle also native strings when running in the corresponding locale. Note that both will be true for ASCII strings.
A pattern used in several packages is
char *asutf8(SEXP c)
{
if (!IS_UTF8(s) && !IS_ASCII(s)) // not compliant
return Rf_translateCharUTF8(s);
else
return CHAR(s);
}
to make this code compliant, simply call
char *asutf8(SEXP c)
{
return Rf_translateCharUTF8(s); // compliant
}
as the encoding flags are already checked in translateCharUTF8
. Also note the non-compliant check doesn’t handle native encoding.
6.21.6 Working with attributes
The current implementation (R 4.5.0) represents attributes internally as a linked list. It may be useful to change this at some point, so external code should not rely on this representation. The low-level functions ATTRIB
and SET_ATTRIB
reveal this representation and are therefore not part of the API. Individual attributes can be accessed and set with getAttrib
and setAttrib
. Attributes can be copied from one object to another with DUPLICATE_ATTRIB
and SHALLOW_DUPLICATE_ATTRIB
. The CLEAR_ATTRIB
function added in R 4.5.0 can be used to remove all attributes. These functions ensure can that certain consistency requirements are maintained, such as setting the object bit according to whether a class attribute is present.
Some additional functions may be added for working with attributes.
6.21.7 Working variable bindings
The functions findVar
and findVarInFrame
have been used in a number of packages but are too low level to be part of the API. For most uses the functions R_getVar
and R_getVarEx
added in R 4.5.0 will be sufficient. These are analogous to the R functions get
and get0
.
In rare cases package R or C code may want to obtain more detailed information on a binding, such as whether the binding is delayed or not. This is currently not possible within the API, but is under consideration.
6.21.8 Some backports
This section lists backports of recently added definitions that can be used in packages that need to be compiled under older versions of R that do not yet contain these entry points.
#if R_VERSION < R_Version(4, 4, 1)
(int n)
SEXP allocLang{
if (n > 0)
return LCONS(R_NilValue, allocList(n - 1));
else
return R_NilValue;
}
#endif
#if R_VERSION < R_Version(4, 5, 0)
# define isDataFrame(x) isFrame(x)
# define R_ClosureFormals(x) FORMALS(x)
# define R_ClosureEnv(x) CLOENV(x)
# define R_ParentEnv(x) ENCLOS(x)
(SEXP formals, SEXP body, SEXP env)
SEXP R_mkClosure{
= Rf_allocSExp(CLOSXP);
SEXP fun (fun, formals);
SET_FORMALS(fun, body);
SET_BODY(fun, env);
SET_CLOENVreturn fun;
}
void CLEAR_ATTRIB(SEXP x)
{
(x, R_NilValue);
SET_ATTRIB(x, 0);
SET_OBJECT(x);
UNSET_S4_OBJECT}
#endif
Footnotes