Marginal distribution of a Student's T process at finitely many points

A Student's T process (TP) is an indexed collection of random variables, any finite collection of which are jointly Multivariate Student's T. While this definition applies to finite index sets, it is typically implicit that the index set is infinite; in applications, it is often some finite dimensional real or complex vector space. In such cases, the TP may be thought of as a distribution over (real- or complex-valued) functions defined over the index set.

tfd_student_t_process(
  df,
  kernel,
  index_points,
  mean_fn = NULL,
  jitter = 1e-06,
  validate_args = FALSE,
  allow_nan_stats = FALSE,
  name = "StudentTProcess"
)

Arguments

df	Positive Floating-point `Tensor` representing the degrees of freedom. Must be greater than 2.
kernel	`PositiveSemidefiniteKernel`-like instance representing the TP's covariance function.
index_points	`float` `Tensor` representing finite (batch of) vector(s) of points in the index set over which the TP is defined. Shape has the form `[b1, ..., bB, e, f1, ..., fF]` where `F` is the number of feature dimensions and must equal `kernel.feature_ndims` and `e` is the number (size) of index points in each batch. Ultimately this distribution corresponds to a `e`-dimensional multivariate Student's T. The batch shape must be broadcastable with `kernel.batch_shape` and any batch dims yielded by `mean_fn`.
mean_fn	Function that acts on `index_points` to produce a (batch of) vector(s) of mean values at `index_points`. Takes a `Tensor` of shape `[b1, ..., bB, f1, ..., fF]` and returns a `Tensor` whose shape is broadcastable with `[b1, ..., bB]`. Default value: `NULL` implies constant zero function.
jitter	`float` scalar `Tensor` added to the diagonal of the covariance matrix to ensure positive definiteness of the covariance matrix. Default value: `1e-6`.
validate_args	Logical, default FALSE. When TRUE distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE invalid inputs may silently render incorrect outputs. Default value: FALSE.
allow_nan_stats	Logical, default TRUE. When TRUE, statistics (e.g., mean, mode, variance) use the value NaN to indicate the result is undefined. When FALSE, an exception is raised if one or more of the statistic's batch members are undefined.
name	name prefixed to Ops created by this class.

Value

a distribution instance.

Details

Just as Student's T distributions are fully specified by their degrees of freedom, location and scale, a Student's T process can be completely specified by a degrees of freedom parameter, mean function and covariance function.

Let S denote the index set and K the space in which each indexed random variable takes its values (again, often R or C). The mean function is then a map m: S -> K, and the covariance function, or kernel, is a positive-definite function k: (S x S) -> K. The properties of functions drawn from a TP are entirely dictated (up to translation) by the form of the kernel function.

This Distribution represents the marginal joint distribution over function values at a given finite collection of points [x[1], ..., x[N]] from the index set S. By definition, this marginal distribution is just a multivariate Student's T distribution, whose mean is given by the vector [ m(x[1]), ..., m(x[N]) ] and whose covariance matrix is constructed from pairwise applications of the kernel function to the given inputs:

| k(x[1], x[1])    k(x[1], x[2])  ...  k(x[1], x[N]) |
| k(x[2], x[1])    k(x[2], x[2])  ...  k(x[2], x[N]) |
|      ...              ...                 ...      |
| k(x[N], x[1])    k(x[N], x[2])  ...  k(x[N], x[N]) |

For this to be a valid covariance matrix, it must be symmetric and positive definite; hence the requirement that k be a positive definite function (which, by definition, says that the above procedure will yield PD matrices). Note also we use a parameterization as suggested in Shat et al. (2014), which requires df to be greater than 2. This allows for the covariance for any finite dimensional marginal of the TP (a multivariate Student's T distribution) to just be the PD matrix generated by the kernel.

Mathematical Details

The probability density function (pdf) is a multivariate Student's T whose parameters are derived from the TP's properties:

pdf(x; df, index_points, mean_fn, kernel) = MultivariateStudentT(df, loc, K)
K = (df - 2) / df  * (kernel.matrix(index_points, index_points) + jitter * eye(N))
loc = (x - mean_fn(index_points))^T @ K @ (x - mean_fn(index_points))

where:

df is the degrees of freedom parameter for the TP.
index_points are points in the index set over which the TP is defined,
mean_fn is a callable mapping the index set to the TP's mean values,
kernel is PositiveSemidefiniteKernel-like and represents the covariance function of the TP,
jitter is added to the diagonal to ensure positive definiteness up to machine precision (otherwise Cholesky-decomposition is prone to failure),
eye(N) is an N-by-N identity matrix.

References

Amar Shah, Andrew Gordon Wilson, and Zoubin Ghahramani. Student-t Processes as Alternatives to Gaussian Processes. In Artificial Intelligence and Statistics, 2014.

Other distributions: tfd_autoregressive(), tfd_batch_reshape(), tfd_bates(), tfd_bernoulli(), tfd_beta_binomial(), tfd_beta(), tfd_binomial(), tfd_categorical(), tfd_cauchy(), tfd_chi2(), tfd_chi(), tfd_cholesky_lkj(), tfd_continuous_bernoulli(), tfd_deterministic(), tfd_dirichlet_multinomial(), tfd_dirichlet(), tfd_empirical(), tfd_exp_gamma(), tfd_exp_inverse_gamma(), tfd_exponential(), tfd_gamma_gamma(), tfd_gamma(), tfd_gaussian_process_regression_model(), tfd_gaussian_process(), tfd_generalized_normal(), tfd_geometric(), tfd_gumbel(), tfd_half_cauchy(), tfd_half_normal(), tfd_hidden_markov_model(), tfd_horseshoe(), tfd_independent(), tfd_inverse_gamma(), tfd_inverse_gaussian(), tfd_johnson_s_u(), tfd_joint_distribution_named_auto_batched(), tfd_joint_distribution_named(), tfd_joint_distribution_sequential_auto_batched(), tfd_joint_distribution_sequential(), tfd_kumaraswamy(), tfd_laplace(), tfd_linear_gaussian_state_space_model(), tfd_lkj(), tfd_log_logistic(), tfd_log_normal(), tfd_logistic(), tfd_mixture_same_family(), tfd_mixture(), tfd_multinomial(), tfd_multivariate_normal_diag_plus_low_rank(), tfd_multivariate_normal_diag(), tfd_multivariate_normal_full_covariance(), tfd_multivariate_normal_linear_operator(), tfd_multivariate_normal_tri_l(), tfd_multivariate_student_t_linear_operator(), tfd_negative_binomial(), tfd_normal(), tfd_one_hot_categorical(), tfd_pareto(), tfd_pixel_cnn(), tfd_poisson_log_normal_quadrature_compound(), tfd_poisson(), tfd_power_spherical(), tfd_probit_bernoulli(), tfd_quantized(), tfd_relaxed_bernoulli(), tfd_relaxed_one_hot_categorical(), tfd_sample_distribution(), tfd_sinh_arcsinh(), tfd_skellam(), tfd_spherical_uniform(), tfd_student_t(), tfd_transformed_distribution(), tfd_triangular(), tfd_truncated_cauchy(), tfd_truncated_normal(), tfd_uniform(), tfd_variational_gaussian_process(), tfd_vector_diffeomixture(), tfd_vector_exponential_diag(), tfd_vector_exponential_linear_operator(), tfd_vector_laplace_diag(), tfd_vector_laplace_linear_operator(), tfd_vector_sinh_arcsinh_diag(), tfd_von_mises_fisher(), tfd_von_mises(), tfd_weibull(), tfd_wishart_linear_operator(), tfd_wishart_tri_l(), tfd_wishart(), tfd_zipf()

Marginal distribution of a Student's T process at finitely many points

Arguments

Value

Details

References

See also