R/sts-functions.R
sts_build_factored_variational_loss.Rd
Variational inference searches for the distribution within some family of
approximate posteriors that minimizes a divergence between the approximate
posterior q(z) and the true posterior p(z|observed_time_series). By converting
inference to optimization, it's generally much faster than sampling-based
inference algorithms such as HMC. The tradeoff is that the approximating
family rarely contains the true posterior, so it may miss important aspects of
posterior structure (in particular, dependence between variables) and should
not be blindly trusted. Results may vary; it's generally wise to compare to
HMC to evaluate whether inference quality is sufficient for your task at hand.
sts_build_factored_variational_loss( observed_time_series, model, init_batch_shape = list(), seed = NULL, name = NULL )
observed_time_series 


model  An instance of 
init_batch_shape  Batch shape ( 
seed  integer to seed the random number generator. 
name  name prefixed to ops created by this function. Default value: 
list of:
variational_loss: float Tensor of shape
tf$concat([init_batch_shape, model$batch_shape]), encoding a stochastic
estimate of an upper bound on the negative model evidence -log p(y).
Minimizing this loss performs variational inference; the gap between the
variational bound and the true (generally unknown) model evidence
corresponds to the divergence KL[q||p] between the approximate and true
posterior.
variational_distributions: a named list giving
the approximate posterior for each model parameter. The keys are
character parameter names in order, corresponding to
[param.name for param in model.parameters]. The values are
tfd$Distribution instances with batch shape
tf$concat([init_batch_shape, model$batch_shape]); these will typically be
of the form tfd$TransformedDistribution(tfd.Normal(...), bijector=param.bijector).
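The relationship between the variational bound, the model evidence, and the divergence described for variational_loss can be written out explicitly. The following is the standard identity, not something specific to this function; here y stands for observed_time_series, z for the model parameters, and ELBO(q) is shorthand introduced for this note.

```latex
\begin{aligned}
\mathrm{ELBO}(q) &= \mathbb{E}_{q(z)}\left[\log p(y, z) - \log q(z)\right] \\
\log p(y) &= \mathrm{ELBO}(q) + \mathrm{KL}\left[q(z) \,\|\, p(z \mid y)\right] \\
-\mathrm{ELBO}(q) &= -\log p(y) + \mathrm{KL}\left[q(z) \,\|\, p(z \mid y)\right] \;\ge\; -\log p(y)
\end{aligned}
```

Because the KL term is non-negative, the negative ELBO is an upper bound on -log p(y), and the bound is tight exactly when q(z) matches the true posterior.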
This method constructs a loss function for variational inference using the
Kullback-Leibler divergence KL[q(z) || p(z|observed_time_series)], with an
approximating family given by independent Normal distributions transformed to
the appropriate parameter space for each parameter. Minimizing this loss (the
negative ELBO) maximizes a lower bound on the log model evidence
log p(observed_time_series). This is equivalent to the 'mean-field' method
implemented in Kucukelbir et al. (2017) and is a standard approach.
The resulting posterior approximations are unimodal; they will tend to
underestimate posterior uncertainty when the true posterior contains multiple
modes (the KL[q||p] divergence encourages choosing a single mode) or
dependence between variables.
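The effect of dependence between variables can be illustrated with a small base R sketch that does not use this function at all. It assumes a hypothetical two-parameter Gaussian "true posterior" with correlation rho (an illustrative choice, not taken from any model above) and relies on the standard result that, for a Gaussian target, the independent-Normal q minimizing KL[q||p] has variances equal to the reciprocals of the diagonal of the target's precision matrix.

```r
# Hypothetical correlated Gaussian standing in for a true posterior.
rho <- 0.9
Sigma <- matrix(c(1, rho,
                  rho, 1), nrow = 2, byrow = TRUE)

# Precision matrix of the target.
Lambda <- solve(Sigma)

# True marginal variances of each parameter.
true_var <- diag(Sigma)

# Variances of the KL[q||p]-optimal factored (mean-field) Normal
# approximation: reciprocals of the precision diagonal.
meanfield_var <- 1 / diag(Lambda)

# Ratio of approximate to true marginal variance; equals 1 - rho^2.
variance_ratio <- meanfield_var / true_var
print(variance_ratio)
```

With rho = 0.9 the ratio is 1 - rho^2 = 0.19 for both parameters, so the factored approximation reports well under a quarter of the true marginal variance. This is the kind of discrepancy a comparison against HMC would be expected to reveal.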
Other sts-functions:
sts_build_factored_surrogate_posterior(),
sts_decompose_by_component(),
sts_decompose_forecast_by_component(),
sts_fit_with_hmc(),
sts_forecast(),
sts_one_step_predictive(),
sts_sample_uniform_initial_state()