Variational inference searches for the distribution within some family of approximate posteriors that minimizes a divergence between the approximate posterior q(z) and true posterior p(z|observed_time_series). By converting inference to optimization, it's generally much faster than sampling-based inference algorithms such as HMC. The tradeoff is that the approximating family rarely contains the true posterior, so it may miss important aspects of posterior structure (in particular, dependence between variables) and should not be blindly trusted. Results may vary; it's generally wise to compare to HMC to evaluate whether inference quality is sufficient for your task at hand.

  init_batch_shape = list(),
  seed = NULL,
  name = NULL



Float tensor of shape concat([sample_shape, model.batch_shape, [num_timesteps, 1]]), where sample_shape corresponds to i.i.d. observations and the trailing [1] dimension may optionally be omitted if num_timesteps > 1. May instead be an instance of sts_masked_time_series, which includes a mask tensor specifying the timesteps with missing observations.


An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape [b1, ..., bN].


Batch shape (list) of initial states to optimize in parallel. Default value: list() (i.e., just run a single optimization).


integer to seed the random number generator.


name prefixed to ops created by this function. Default value: NULL (i.e., 'build_factored_variational_loss').


list of:

  • variational_loss: float Tensor of shape tf$concat([init_batch_shape, model$batch_shape]), encoding a stochastic estimate of an upper bound on the negative model evidence -log p(y). Minimizing this loss performs variational inference; the gap between the variational bound and the true (generally unknown) model evidence corresponds to the divergence KL[q||p] between the approximate and true posterior.

  • variational_distributions: a named list giving the approximate posterior for each model parameter. The keys are character parameter names, in the same order as the model's parameters. The values are tfd$Distribution instances with batch shape tf$concat([init_batch_shape, model$batch_shape]); these will typically be of the form tfd$TransformedDistribution(tfd$Normal(...), bijector = param$bijector).
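
The relationship between the loss, the model evidence, and the KL gap described for variational_loss can be written out explicitly. The sketch below uses y as shorthand for observed_time_series and z for the model parameters; these symbols are introduced here only for brevity.

```latex
\log p(y)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(y, z) - \log q(z)\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left[\, q(z) \,\|\, p(z \mid y) \,\right]
\qquad\Longrightarrow\qquad
-\mathrm{ELBO}(q)
  = -\log p(y) + \mathrm{KL}\!\left[\, q(z) \,\|\, p(z \mid y) \,\right]
  \;\ge\; -\log p(y)
```

Because the KL term is non-negative, the negative ELBO is an upper bound on -log p(y), and the bound is tight exactly when q(z) equals the true posterior.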


This method constructs a loss function for variational inference using the Kullback-Leibler divergence KL[q(z) || p(z|observed_time_series)], with an approximating family given by independent Normal distributions transformed to the appropriate parameter space for each parameter. Minimizing this loss (the negative ELBO) maximizes a lower bound on the log model evidence log p(observed_time_series); equivalently, it minimizes an upper bound on -log p(observed_time_series). This is equivalent to the 'mean-field' method implemented in Kucukelbir et al. (2017) and is a standard approach. The resulting posterior approximations are unimodal; they will tend to underestimate posterior uncertainty when the true posterior contains multiple modes (the KL[q||p] divergence encourages choosing a single mode) or dependence between variables.
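
The tendency to underestimate uncertainty under dependence can be illustrated with a toy case that does not involve this function or TensorFlow at all. The base R sketch below assumes a hypothetical "posterior" that is a zero-mean bivariate Gaussian with correlation rho (all variable names are illustrative), and fits the best independent Gaussian under KL[q||p], both in closed form and numerically.

```r
# Toy target: zero-mean bivariate Gaussian with unit variances and correlation rho.
# This stands in for a posterior with dependence between two parameters.
rho <- 0.9
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
Lambda <- solve(Sigma)  # precision matrix

# For a Gaussian target, the factored Gaussian minimizing KL[q || p] has
# variances equal to the reciprocal of the diagonal of the precision matrix.
mean_field_var <- 1 / diag(Lambda)    # equals 1 - rho^2 for this target
true_marginal_var <- diag(Sigma)

# KL[q || p] for q = N(0, diag(v)) and p = N(0, Sigma), with k = 2 dimensions.
kl_q_p <- function(v) {
  0.5 * (sum(diag(Lambda) * v) - 2 + log(det(Sigma)) - sum(log(v)))
}

# Numerical check: optimize over log-variances so the variances stay positive.
fit <- optim(c(0, 0), function(log_v) kl_q_p(exp(log_v)), method = "BFGS")
numeric_var <- exp(fit$par)

print(rbind(true_marginal_var, mean_field_var, numeric_var))
```

With rho = 0.9 the closed-form mean-field variance is 1 - rho^2 = 0.19 for each coordinate, against a true marginal variance of 1, so the factored approximation is markedly overconfident even though the target is unimodal.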


See also