TensorFlow Probability is a library for statistical computation and probabilistic modeling built on top of TensorFlow.
Its building blocks include a vast range of distributions and invertible transformations (bijectors), probabilistic layers that may be used in
keras models, and tools for probabilistic reasoning including variational inference and Markov Chain Monte Carlo.
To install tfprobability from GitHub, do:
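A minimal sketch of that call, assuming the development sources live in the rstudio/tfprobability repository:

```r
# install the development version from GitHub (repository name assumed)
devtools::install_github("rstudio/tfprobability")
```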
TensorFlow Probability depends on TensorFlow, and in the same way, tfprobability depends on a working installation of the R packages tensorflow and keras. To get the most up-to-date versions of these packages, install them from GitHub as well:
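For example (assuming both packages are hosted under the rstudio organization on GitHub):

```r
# install the development versions of the R-side dependencies (repository names assumed)
devtools::install_github("rstudio/tensorflow")
devtools::install_github("rstudio/keras")
```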
As to the Python backend, running the package's installation helper will automatically get you the current stable version of TensorFlow Probability together with TensorFlow. Correspondingly, requesting a nightly version will get you the nightly build of TensorFlow as well as TensorFlow Probability.
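A sketch of both calls, assuming the exported install_tfprobability() helper and its version argument:

```r
# install the current stable versions of TensorFlow and TensorFlow Probability
install_tfprobability()

# assuming the helper accepts a version argument, install the nightly builds instead
install_tfprobability(version = "nightly")
```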
High-level applications of tfprobability to concrete modeling tasks are described in the vignettes/articles and featured on the TensorFlow for R blog.
This introductory text illustrates the lower-level building blocks: distributions, bijectors, and probabilistic Keras layers.
Distributions are objects with methods to compute summary statistics, (log) probability, and (optionally) quantities like entropy and KL divergence.
```r
library(tfprobability)

# create a binomial distribution with n = 7 and p = 0.3
d <- tfd_binomial(total_count = 7, probs = 0.3)

# compute mean
d %>% tfd_mean()
#> tf.Tensor(2.1000001, shape=(), dtype=float32)

# compute variance
d %>% tfd_variance()
#> tf.Tensor(1.47, shape=(), dtype=float32)

# compute probability
d %>% tfd_prob(2.3)
#> tf.Tensor(0.30379143, shape=(), dtype=float32)
```
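Sampling, entropy, and KL divergence follow the same pattern; a small sketch using two normal distributions:

```r
n1 <- tfd_normal(loc = 0, scale = 1)
n2 <- tfd_normal(loc = 1, scale = 2)

# draw 5 samples from the standard normal
n1 %>% tfd_sample(5)

# compute its entropy
n1 %>% tfd_entropy()

# compute the KL divergence from n1 to n2
n1 %>% tfd_kl_divergence(n2)
```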
Bijectors are invertible transformations that make it possible to derive the likelihood of data under a transformed distribution from the likelihood under the base distribution. For a detailed explanation, see Getting into the flow: Bijectors in TensorFlow Probability on the TensorFlow for R blog.
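To illustrate, here is a minimal sketch: an exponential bijector turns a standard normal base distribution into a log-normal distribution, under which the (log) likelihood of data can then be evaluated directly.

```r
library(tfprobability)

# an exponential bijector
b <- tfb_exp()

# forward transform of a value
b %>% tfb_forward(0.5)

# transforming a standard normal base distribution with the exponential
# bijector yields a log-normal distribution ...
lognormal <- tfd_transformed_distribution(
  distribution = tfd_normal(loc = 0, scale = 1),
  bijector = b
)

# ... whose log likelihood can be evaluated like that of any other distribution
lognormal %>% tfd_log_prob(1.5)
```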
tfprobability wraps distributions in Keras layers so we can use them seamlessly in a neural network, and work with tensors as targets as usual. For example, we can use layer_kl_divergence_add_loss to have the network take care of the KL loss automatically, and train a variational autoencoder with just the negative log likelihood, like this:
```r
library(keras)
library(tfprobability)

encoded_size <- 2
input_shape <- c(2L, 2L, 1L)
train_size <- 100
x_train <- array(runif(train_size * Reduce(`*`, input_shape)),
                 dim = c(train_size, input_shape))

# encoder is a keras sequential model
encoder_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = input_shape) %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = params_size_multivariate_normal_tri_l(encoded_size)) %>%
  layer_multivariate_normal_tri_l(event_size = encoded_size) %>%
  # last layer adds KL divergence loss
  layer_kl_divergence_add_loss(
    distribution = tfd_independent(
      tfd_normal(loc = c(0, 0), scale = 1),
      reinterpreted_batch_ndims = 1
    ),
    weight = train_size
  )

# decoder is a keras sequential model
decoder_model <- keras_model_sequential() %>%
  layer_dense(units = 10, activation = "relu", input_shape = encoded_size) %>%
  layer_dense(params_size_independent_bernoulli(input_shape)) %>%
  layer_independent_bernoulli(
    event_shape = input_shape,
    convert_to_tensor_fn = tfp$distributions$Bernoulli$logits
  )

# keras functional model uniting them both
vae_model <- keras_model(inputs = encoder_model$inputs,
                         outputs = decoder_model(encoder_model$outputs))

# the VAE loss now is just the negative log probability of the data
vae_loss <- function(x, rv_x) -(rv_x %>% tfd_log_prob(x))

vae_model %>% compile(
  optimizer = "adam",
  loss = vae_loss
)

vae_model %>% fit(x_train, x_train, batch_size = 25, epochs = 1)
```