TensorFlow Probability is a library for statistical computation and probabilistic modeling built on top of TensorFlow.

Its building blocks include a vast range of distributions and invertible transformations (*bijectors*), probabilistic layers that may be used in `keras` models, and tools for probabilistic reasoning including variational inference and Markov Chain Monte Carlo.

To install `tfprobability` from GitHub, do

```
devtools::install_github("rstudio/tfprobability")
```

TensorFlow Probability depends on TensorFlow, and in the same way, `tfprobability` depends on a working installation of the R packages `tensorflow` and `keras`. To get the most up-to-date versions of these packages, install them from GitHub as well:

```
devtools::install_github("rstudio/tensorflow")
devtools::install_github("rstudio/keras")
```

As to the Python backend, if you do

```
library(tensorflow)
install_tensorflow()
```

you will automatically get the current stable version of TensorFlow Probability together with TensorFlow. Correspondingly, if you need nightly builds, run

```
install_tensorflow(version = "nightly")
```

to get the nightly builds of both TensorFlow and TensorFlow Probability.

High-level applications of `tfprobability` to tasks like

- probabilistic (multi-level) modeling with MCMC and/or variational inference,
- uncertainty estimation for neural networks,
- time series modeling with state space models, or
- density estimation with autoregressive flows

are described in the vignettes/articles and/or featured on the TensorFlow for R blog.

This introductory text illustrates the lower-level building blocks: distributions, bijectors, and probabilistic `keras` layers.

```
library(tfprobability)
library(tensorflow)
# use TensorFlow 2.x behavior (eager execution) even when running on TensorFlow 1.x
tf$compat$v2$enable_v2_behavior()
```

Distributions are objects with methods to compute summary statistics, (log) probability, and (optionally) quantities like entropy and KL divergence.

```
# create a binomial distribution with n = 7 and p = 0.3
d <- tfd_binomial(total_count = 7, probs = 0.3)
# compute mean
d %>% tfd_mean()
#> tf.Tensor(2.1000001, shape=(), dtype=float32)
# compute variance
d %>% tfd_variance()
#> tf.Tensor(1.47, shape=(), dtype=float32)
# compute probability
d %>% tfd_prob(2.3)
#> tf.Tensor(0.30379143, shape=(), dtype=float32)
```
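The printed values can be cross-checked in base R. Mean and variance follow the closed forms n * p and n * p * (1 - p). Since 2.3 is not an integer, the probability above is not an ordinary binomial mass; the sketch below assumes it comes from extending the binomial coefficient to real arguments via the gamma function, which reproduces the printed value up to float32 precision. The helper `binom_prob_continuous` is hypothetical and not part of `tfprobability`.

```r
# closed-form moments of a binomial distribution with n = 7, p = 0.3
n <- 7
p <- 0.3
binom_mean <- n * p             # 2.1
binom_var  <- n * p * (1 - p)   # 1.47

# hypothetical helper (assumption): binomial probability extended to
# non-integer counts k by writing choose(n, k) in terms of lgamma()
binom_prob_continuous <- function(k, n, p) {
  exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) +
        k * log(p) + (n - k) * log(1 - p))
}

binom_prob_continuous(2.3, n, p)   # approx. 0.30379
# at integer counts it agrees with the ordinary probability mass function
all.equal(binom_prob_continuous(2, n, p), dbinom(2, n, p))
```

In other words, `tfd_prob()` does not reject non-integer counts here; if you need a strict probability mass, evaluate it at integer values.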

Bijectors are invertible transformations that make it possible to compute the likelihood of data under the transformed distribution from its likelihood under the base distribution. For an in-depth explanation, see *Getting into the flow: Bijectors in TensorFlow Probability* on the TensorFlow for R blog.

```
# create an affine transformation that shifts by 3.33 and scales by 0.5
b <- tfb_affine_scalar(shift = 3.33, scale = 0.5)
# apply the transformation
x <- c(100, 1000, 10000)
b %>% tfb_forward(x)
#> tf.Tensor([ 53.33 503.33 5003.33], shape=(3,), dtype=float32)
```
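To make the likelihood argument concrete, here is a base-R sketch of what an affine bijector computes. It assumes the forward map y = scale * x + shift (consistent with the output above) and applies the change-of-variables formula: the log density of y is the base log density at the inverse image plus the log of |dx/dy| = 1 / scale. With a standard normal base distribution, the result must equal a normal density with mean 3.33 and standard deviation 0.5. The function names are hypothetical stand-ins, not `tfprobability` functions.

```r
shift <- 3.33
scale <- 0.5

# hypothetical stand-ins for forward, inverse and inverse log-det-Jacobian
affine_forward <- function(x) scale * x + shift
affine_inverse <- function(y) (y - shift) / scale
affine_inverse_log_det_jacobian <- function(y) rep(-log(abs(scale)), length(y))

affine_forward(c(100, 1000, 10000))   # 53.33 503.33 5003.33

# change of variables: log p_Y(y) = log p_X(g^{-1}(y)) + log |d g^{-1}(y) / dy|
y <- c(2.5, 3.33, 4.1)
log_p_y <- dnorm(affine_inverse(y), log = TRUE) + affine_inverse_log_det_jacobian(y)

# transforming N(0, 1) this way yields N(3.33, 0.5)
all.equal(log_p_y, dnorm(y, mean = shift, sd = scale, log = TRUE))
```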

`tfprobability` wraps distributions in Keras layers so we can use them seamlessly in a neural network and work with tensors as targets as usual. For example, we can use `layer_kl_divergence_add_loss` to have the network take care of the KL loss automatically, and train a variational autoencoder with just the negative log likelihood as the loss, like this:

```
library(keras)

encoded_size <- 2
input_shape <- c(2L, 2L, 1L)
train_size <- 100
# random training data with values in [0, 1]
x_train <- array(runif(train_size * Reduce(`*`, input_shape)),
                 dim = c(train_size, input_shape))

# encoder is a keras sequential model
encoder_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = input_shape) %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = params_size_multivariate_normal_tri_l(encoded_size)) %>%
  layer_multivariate_normal_tri_l(event_size = encoded_size) %>%
  # last layer adds KL divergence loss
  layer_kl_divergence_add_loss(
    distribution = tfd_independent(
      tfd_normal(loc = c(0, 0), scale = 1),
      reinterpreted_batch_ndims = 1
    ),
    weight = train_size
  )

# decoder is a keras sequential model
decoder_model <- keras_model_sequential() %>%
  layer_dense(units = 10,
              activation = "relu",
              input_shape = encoded_size) %>%
  layer_dense(units = params_size_independent_bernoulli(input_shape)) %>%
  layer_independent_bernoulli(event_shape = input_shape,
                              convert_to_tensor_fn = tfp$distributions$Bernoulli$logits)

# keras functional model uniting them both
vae_model <- keras_model(inputs = encoder_model$inputs,
                         outputs = decoder_model(encoder_model$outputs[[1]]))

# VAE loss now is just the negative log probability of the data
vae_loss <- function(x, rv_x) {
  -(rv_x %>% tfd_log_prob(x))
}

vae_model %>% compile(
  optimizer = "adam",
  loss = vae_loss
)
vae_model %>% fit(x_train, x_train, batch_size = 25, epochs = 1)
```
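For intuition about the two loss terms, the base-R sketch below computes them for a single example. `vae_loss` returns the negative log likelihood of the data under independent Bernoulli distributions parameterized by logits, while `layer_kl_divergence_add_loss` contributes a KL divergence between the encoder's distribution and the prior. For the KL term, the sketch assumes a diagonal Gaussian posterior (a simplification of the full-covariance `layer_multivariate_normal_tri_l` used above) because it has a short closed form. All function names and input values are hypothetical.

```r
# negative log likelihood of x under independent Bernoullis with logits l:
# log p(x) = x * l - log(1 + exp(l))
bernoulli_nll <- function(x, logits) {
  -sum(x * logits - log1p(exp(logits)))
}

# closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions
kl_diag_normal_std_normal <- function(mu, sigma) {
  0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2))
}

x <- c(1, 0, 1, 1)                # one flattened 2 x 2 binary example
logits <- c(2.0, -1.5, 0.3, 0.8)  # hypothetical decoder output for it

nll <- bernoulli_nll(x, logits)
# agrees with base R's Bernoulli log density
all.equal(nll, -sum(dbinom(x, size = 1, prob = plogis(logits), log = TRUE)))

# the KL term vanishes when the posterior equals the prior
kl_diag_normal_std_normal(mu = c(0, 0), sigma = c(1, 1))   # 0
```

Because the KL part is added inside the network, the loss passed to `compile()` only has to supply the reconstruction term.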