This model defines a time series given by a sparse linear combination of covariate time series provided in a design matrix:
sts_sparse_linear_regression( design_matrix, weights_prior_scale = 0.1, weights_batch_shape = NULL, name = NULL )
the name of this model component. Default value: 'LinearRegression'.
an instance of
observed_time_series <- tf$matmul(design_matrix, weights)
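The linear combination above can be sketched in plain Python. This is only an illustration of the arithmetic, not the tfprobability API: the helper `matmul_vec`, the toy design matrix, and the weight values are all assumptions made up for the example.

```python
def matmul_vec(design_matrix, weights):
    """Return the matrix-vector product for a list-of-rows matrix."""
    return [sum(x * w for x, w in zip(row, weights)) for row in design_matrix]


# Hypothetical data: 4 time steps (rows) and 3 covariate series (columns).
design_matrix = [
    [1.0, 0.5, 2.0],
    [2.0, 0.1, 1.0],
    [3.0, 0.9, 0.0],
    [4.0, 0.3, 1.0],
]

# Sparse weights: only the first covariate contributes to the series.
weights = [0.5, 0.0, 0.0]

observed_time_series = matmul_vec(design_matrix, weights)
print(observed_time_series)
```

With weights this sparse, the second and third covariates have no effect on the result, which is the situation the sparsity prior is meant to favor.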
This is identical to sts_linear_regression, except that
sts_sparse_linear_regression uses a parameterization of a Horseshoe prior to
encode the assumption that many of the weights are zero, i.e., many of the
covariate time series are irrelevant. See the mathematical details section
below for further discussion. The prior parameterization used by
sts_sparse_linear_regression is more suitable for inference than that obtained
by simply passing the equivalent tfd_horseshoe prior to sts_linear_regression;
when sparsity is desired, sts_sparse_linear_regression will likely yield
better results.
This component does not itself include observation noise; it defines a
deterministic distribution with mass at the point
tf$matmul(design_matrix, weights). In practice, it should be combined with
observation noise from another component such as
The basic horseshoe prior (Carvalho et al., 2009) is defined as a Cauchy-normal scale mixture:
scales[i] ~ HalfCauchy(loc=0, scale=1)
weights[i] ~ Normal(loc=0., scale=scales[i] * global_scale)
The Cauchy scale parameters put substantial mass near zero, encouraging
weights to be sparse, but their heavy tails allow weights far from zero to be
estimated without excessive shrinkage. The horseshoe can be thought of as a
continuous relaxation of a traditional 'spike-and-slab' discrete sparsity
prior, in which the latent Cauchy scale mixes between 'spike'
(scales[i] ~= 0) and 'slab' (scales[i] >> 0) regimes.
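To make the 'spike' and 'slab' behavior concrete, the following sketch draws weights from the basic horseshoe prior using only the Python standard library. The function name `sample_horseshoe_weights`, the fixed `global_scale` of 0.1, and the sample size are assumptions chosen for illustration; this is not how the library samples.

```python
import math
import random


def sample_horseshoe_weights(num_weights, global_scale, rng):
    """Draw weights from the basic (centered) horseshoe prior."""
    weights = []
    for _ in range(num_weights):
        # HalfCauchy(0, 1) via its inverse CDF: tan(pi * u / 2), u ~ Uniform[0, 1).
        local_scale = math.tan(math.pi * rng.random() / 2.0)
        weights.append(rng.gauss(0.0, local_scale * global_scale))
    return weights


rng = random.Random(0)
weights = sample_horseshoe_weights(10000, global_scale=0.1, rng=rng)

abs_weights = sorted(abs(w) for w in weights)
median_abs = abs_weights[len(abs_weights) // 2]
max_abs = abs_weights[-1]
print("median |weight|:", median_abs)
print("max |weight|:", max_abs)
```

Most draws sit close to zero while a few are orders of magnitude larger, which is the mix of strong shrinkage and heavy tails described above.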
Following the recommendations in Piironen et al. (2017),
sts_sparse_linear_regression implements a horseshoe with the following adaptations:
The Cauchy prior on scales[i] is represented as an InverseGamma-Normal compound.
The global_scale parameter is integrated out following a Cauchy(0., scale=weights_prior_scale) hyperprior, which is also represented as an InverseGamma-Normal compound.
All compound distributions are implemented using a non-centered parameterization.
The compound, non-centered representation defines the same marginal prior as the original horseshoe (up to integrating out the global scale), but allows samplers to mix more efficiently through the heavy tails; for variational inference, the compound representation implicitly expands the representational power of the variational model.
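The claim that the compound form leaves the marginal prior unchanged can be checked numerically. The sketch below, using only the Python standard library, compares draws from the InverseGamma-Normal compound with direct HalfCauchy(0, 1) draws. The helper names and sample size are assumptions for illustration. A HalfCauchy(0, 1) variable has median 1, so both empirical medians should land near 1.

```python
import math
import random


def sample_half_cauchy_compound(scale, rng):
    """Draw a HalfCauchy(0, scale) variate via the InverseGamma-Normal compound."""
    # InverseGamma(alpha=0.5, beta=0.5) is the reciprocal of a
    # Gamma(shape=0.5, rate=0.5) draw; gammavariate takes scale = 1 / rate.
    variance = 1.0 / rng.gammavariate(0.5, 2.0)
    noncentered = abs(rng.gauss(0.0, 1.0))  # HalfNormal(0, 1)
    return noncentered * math.sqrt(variance) * scale


def median(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


rng = random.Random(42)
n = 20000
compound_draws = [sample_half_cauchy_compound(1.0, rng) for _ in range(n)]
# Direct HalfCauchy(0, 1) draws via the inverse CDF, for comparison.
direct_draws = [math.tan(math.pi * rng.random() / 2.0) for _ in range(n)]

compound_median = median(compound_draws)
direct_median = median(direct_draws)
print("compound median:", compound_median)
print("direct median:", direct_median)
```

Comparing medians rather than means is deliberate: the half-Cauchy distribution has no finite mean, so sample averages would not settle down.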
Note that we do not yet implement the regularized ('Finnish') horseshoe, proposed in Piironen et al. (2017) for models with weak likelihoods, because the likelihood in STS models is typically Gaussian, where it's not clear that additional regularization is appropriate. If you need this functionality, please email email@example.com.
The full prior parameterization implemented in sts_sparse_linear_regression is as follows:
# Sample global_scale from Cauchy(0, scale=weights_prior_scale).
global_scale_variance ~ InverseGamma(alpha=0.5, beta=0.5)
global_scale_noncentered ~ HalfNormal(loc=0, scale=1)
global_scale = (global_scale_noncentered *
                sqrt(global_scale_variance) *
                weights_prior_scale)

# Sample local_scales from Cauchy(0, 1).
local_scale_variances[i] ~ InverseGamma(alpha=0.5, beta=0.5)
local_scales_noncentered[i] ~ HalfNormal(loc=0, scale=1)
local_scales[i] = local_scales_noncentered[i] * sqrt(local_scale_variances[i])

weights[i] ~ Normal(loc=0., scale=local_scales[i] * global_scale)
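The sampling steps listed above can be followed line by line in a small standard-library Python sketch. It is a direct transcription of that listing under stated assumptions, not the library's implementation: the function names `sample_inverse_gamma` and `sample_sparse_weights` are hypothetical, and the number of features and the seed are arbitrary.

```python
import math
import random


def sample_inverse_gamma(alpha, beta, rng):
    """InverseGamma(alpha, beta) as the reciprocal of a Gamma(alpha, rate=beta) draw."""
    # random.gammavariate is parameterized by shape and scale (scale = 1 / rate).
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)


def sample_sparse_weights(num_features, weights_prior_scale, rng):
    """Draw (global_scale, weights) following the listed prior parameterization."""
    # Global scale: HalfCauchy(0, weights_prior_scale) in compound form.
    global_scale_variance = sample_inverse_gamma(0.5, 0.5, rng)
    global_scale_noncentered = abs(rng.gauss(0.0, 1.0))
    global_scale = (global_scale_noncentered
                    * math.sqrt(global_scale_variance)
                    * weights_prior_scale)

    weights = []
    for _ in range(num_features):
        # Local scale: HalfCauchy(0, 1) in compound form.
        local_scale_variance = sample_inverse_gamma(0.5, 0.5, rng)
        local_scale_noncentered = abs(rng.gauss(0.0, 1.0))
        local_scale = local_scale_noncentered * math.sqrt(local_scale_variance)
        weights.append(rng.gauss(0.0, local_scale * global_scale))
    return global_scale, weights


rng = random.Random(7)
global_scale, weights = sample_sparse_weights(5, weights_prior_scale=0.1, rng=rng)
print("global_scale:", global_scale)
print("weights:", weights)
```

Because the global scale multiplies every local scale, a small `weights_prior_scale` pulls all weights toward zero together, while the per-weight local scales still let individual weights escape.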