A Gaussian process (GP) is an indexed collection of random variables, any finite collection of which are jointly Gaussian. While this definition applies to finite index sets, it is typically implicit that the index set is infinite; in applications, it is often some finite-dimensional real or complex vector space. In such cases, the GP may be thought of as a distribution over (real- or complex-valued) functions defined over the index set.

tfd_gaussian_process(
  kernel,
  index_points,
  mean_fn = NULL,
  observation_noise_variance = 0,
  jitter = 1e-06,
  validate_args = FALSE,
  allow_nan_stats = FALSE,
  name = "GaussianProcess"
)

Arguments

kernel

PositiveSemidefiniteKernel-like instance representing the GP's covariance function.

index_points

float Tensor representing finite (batch of) vector(s) of points in the index set over which the GP is defined. Shape has the form [b1, ..., bB, e1, f1, ..., fF] where F is the number of feature dimensions and must equal kernel$feature_ndims and e1 is the number (size) of index points in each batch (we denote it e1 to distinguish it from the number of inducing index points, denoted e2 below). Ultimately the GaussianProcess distribution corresponds to an e1-dimensional multivariate normal. The batch shape must be broadcastable with kernel$batch_shape, the batch shape of inducing_index_points, and any batch dims yielded by mean_fn.

mean_fn

function that acts on index points to produce a (batch of) vector(s) of mean values at those index points. Takes a Tensor of shape [b1, ..., bB, f1, ..., fF] and returns a Tensor whose shape is (broadcastable with) [b1, ..., bB]. Default value: NULL implies constant zero function.

observation_noise_variance

float Tensor representing the variance of the noise in the Normal likelihood distribution of the model. May be batched, in which case the batch shape must be broadcastable with the shapes of all other batched parameters (kernel$batch_shape, index_points, etc.). Default value: 0.

jitter

float scalar Tensor added to the diagonal of the covariance matrix to ensure positive definiteness of the covariance matrix. Default value: 1e-6.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity, at a possible cost in runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

allow_nan_stats

Logical, default FALSE. When TRUE, statistics (e.g., mean, mode, variance) use the value NaN to indicate the result is undefined. When FALSE, an exception is raised if one or more of the statistic's batch members are undefined.

name

name prefixed to Ops created by this class.

Value

a distribution instance.

Details

Just as Gaussian distributions are fully specified by their first and second moments, a Gaussian process can be completely specified by a mean and covariance function. Let S denote the index set and K the space in which each indexed random variable takes its values (again, often R or C). The mean function is then a map m: S -> K, and the covariance function, or kernel, is a positive-definite function k: (S x S) -> K. The properties of functions drawn from a GP are entirely dictated (up to translation) by the form of the kernel function.

This Distribution represents the marginal joint distribution over function values at a given finite collection of points [x[1], ..., x[N]] from the index set S. By definition, this marginal distribution is just a multivariate normal distribution, whose mean is given by the vector [ m(x[1]), ..., m(x[N]) ] and whose covariance matrix is constructed from pairwise applications of the kernel function to the given inputs:

| k(x[1], x[1])    k(x[1], x[2])  ...  k(x[1], x[N]) |
| k(x[2], x[1])    k(x[2], x[2])  ...  k(x[2], x[N]) |
|      ...              ...                 ...      |
| k(x[N], x[1])    k(x[N], x[2])  ...  k(x[N], x[N]) |

For this to be a valid covariance matrix, it must be symmetric and positive definite; hence the requirement that k be a positive definite function (which, by definition, says that the above procedure will yield PD matrices).
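The pairwise construction can be sketched in base R, without TensorFlow. The exponentiated quadratic kernel and the helper names eq_kernel and kernel_matrix are assumptions made for illustration; they are not part of the tfprobability API. The sketch also shows why a small diagonal term is useful: with repeated index points the matrix is only positive semidefinite, and a Cholesky factorization fails.

```r
# Hypothetical kernel for illustration: exponentiated quadratic on scalar inputs.
eq_kernel <- function(x1, x2, amplitude = 1, length_scale = 1) {
  amplitude^2 * exp(-(x1 - x2)^2 / (2 * length_scale^2))
}

# Pairwise application of k to the index points gives the N-by-N covariance matrix.
kernel_matrix <- function(x, k = eq_kernel) {
  outer(x, x, k)
}

x <- seq(-1, 1, length.out = 5)
K <- kernel_matrix(x)
isSymmetric(K)

# Repeated index points make the matrix singular, so chol() signals an error ...
K_dup <- kernel_matrix(c(0, 0))
plain <- tryCatch(chol(K_dup), error = function(e) NULL)
is.null(plain)

# ... while a small jitter on the diagonal makes the factorization succeed.
U <- chol(K_dup + 1e-6 * diag(2))
```

This is the role played by the jitter argument described under Arguments.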

We also support the inclusion of zero-mean Gaussian noise in the model, via the observation_noise_variance parameter. This augments the generative model to

f ~ GP(m, k)
(y[i] | f, x[i]) ~ Normal(f(x[i]), s)

where

  • m is the mean function

  • k is the covariance kernel function

  • f is the function drawn from the GP

  • x[i] are the index points at which the function is observed

  • y[i] are the observed values at the index points

  • s is the scale (standard deviation) of the observation noise, so that observation_noise_variance corresponds to s^2.
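The generative model above can be simulated at a finite set of index points with base R alone. This is a minimal sketch: the zero mean function m, the kernel k, the noise scale and the jitter value are all assumptions chosen for illustration, and nothing here calls tfprobability.

```r
set.seed(1)

# Hypothetical mean function and kernel, chosen only for illustration.
m <- function(x) rep(0, length(x))
k <- function(x1, x2) exp(-(x1 - x2)^2 / 2)

x <- seq(-2, 2, length.out = 20)   # index points x[i]
s <- 0.1                           # scale (standard deviation) of the noise
jitter <- 1e-6

# f ~ GP(m, k), restricted to x, is a draw from a multivariate normal.
K <- outer(x, x, k) + jitter * diag(length(x))
U <- chol(K)                       # upper triangular, t(U) %*% U equals K
f <- m(x) + drop(t(U) %*% rnorm(length(x)))

# (y[i] | f, x[i]) ~ Normal(f(x[i]), s)
y <- rnorm(length(x), mean = f, sd = s)
```

Marginally, y is multivariate normal with mean m(x) and covariance equal to the kernel matrix plus s^2 on the diagonal, which is how the noise enters through observation_noise_variance.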

Note that this class represents an unconditional Gaussian process; it does not implement posterior inference conditional on observed function evaluations. This class is useful, for example, if one wishes to combine a GP prior with a non-conjugate likelihood using MCMC to sample from the posterior.

Mathematical Details

The probability density function (pdf) is a multivariate normal whose parameters are derived from the GP's properties:

pdf(x; index_points, mean_fn, kernel) = exp(-0.5 * y) / Z
K = (kernel.matrix(index_points, index_points) +
    (observation_noise_variance + jitter) * eye(N))
y = (x - mean_fn(index_points))^T @ K^{-1} @ (x - mean_fn(index_points))
Z = (2 * pi)**(.5 * N) |det(K)|**(.5)

where:

  • index_points are points in the index set over which the GP is defined,

  • mean_fn is a callable mapping the index set to the GP's mean values,

  • kernel is PositiveSemidefiniteKernel-like and represents the covariance function of the GP,

  • observation_noise_variance represents (optional) observation noise,

  • jitter is added to the diagonal to ensure positive definiteness up to machine precision (otherwise Cholesky-decomposition is prone to failure),

  • eye(N) is an N-by-N identity matrix.
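As a rough check of these formulas, the log-density can be evaluated in base R through a Cholesky factor of K, using the quadratic form in the inverse of K as in the standard multivariate normal density. The function name gp_log_prob and the example inputs are hypothetical; this is a sketch of the computation, not the code tfprobability runs.

```r
# Log-density of a multivariate normal with covariance
# K = kernel matrix + (observation_noise_variance + jitter) * I.
gp_log_prob <- function(x, mean, kernel_mat,
                        observation_noise_variance = 0, jitter = 1e-6) {
  n <- length(x)
  K <- kernel_mat + (observation_noise_variance + jitter) * diag(n)
  U <- chol(K)                                   # t(U) %*% U equals K
  z <- backsolve(U, x - mean, transpose = TRUE)  # solves t(U) z = x - mean
  y <- sum(z^2)                                  # (x - mean)^T K^{-1} (x - mean)
  log_Z <- 0.5 * n * log(2 * pi) + sum(log(diag(U)))  # log of Z; det(K)^0.5 = prod(diag(U))
  -0.5 * y - log_Z
}

# Hypothetical index points, kernel and observed values for illustration.
pts <- c(-1, 0, 1)
kernel_mat <- outer(pts, pts, function(a, b) exp(-(a - b)^2 / 2))
lp <- gp_log_prob(c(0.3, -0.1, 0.2), mean = rep(0, 3), kernel_mat = kernel_mat,
                  observation_noise_variance = 0.01)
```

With a single index point the result reduces to a univariate normal log-density, which gives a simple sanity check against dnorm(..., log = TRUE).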

See also

For usage examples see e.g. tfd_sample(), tfd_log_prob(), tfd_mean().

Other distributions: tfd_autoregressive(), tfd_batch_reshape(), tfd_bates(), tfd_bernoulli(), tfd_beta_binomial(), tfd_beta(), tfd_binomial(), tfd_categorical(), tfd_cauchy(), tfd_chi2(), tfd_chi(), tfd_cholesky_lkj(), tfd_continuous_bernoulli(), tfd_deterministic(), tfd_dirichlet_multinomial(), tfd_dirichlet(), tfd_empirical(), tfd_exp_gamma(), tfd_exp_inverse_gamma(), tfd_exponential(), tfd_gamma_gamma(), tfd_gamma(), tfd_gaussian_process_regression_model(), tfd_generalized_normal(), tfd_geometric(), tfd_gumbel(), tfd_half_cauchy(), tfd_half_normal(), tfd_hidden_markov_model(), tfd_horseshoe(), tfd_independent(), tfd_inverse_gamma(), tfd_inverse_gaussian(), tfd_johnson_s_u(), tfd_joint_distribution_named_auto_batched(), tfd_joint_distribution_named(), tfd_joint_distribution_sequential_auto_batched(), tfd_joint_distribution_sequential(), tfd_kumaraswamy(), tfd_laplace(), tfd_linear_gaussian_state_space_model(), tfd_lkj(), tfd_log_logistic(), tfd_log_normal(), tfd_logistic(), tfd_mixture_same_family(), tfd_mixture(), tfd_multinomial(), tfd_multivariate_normal_diag_plus_low_rank(), tfd_multivariate_normal_diag(), tfd_multivariate_normal_full_covariance(), tfd_multivariate_normal_linear_operator(), tfd_multivariate_normal_tri_l(), tfd_multivariate_student_t_linear_operator(), tfd_negative_binomial(), tfd_normal(), tfd_one_hot_categorical(), tfd_pareto(), tfd_pixel_cnn(), tfd_poisson_log_normal_quadrature_compound(), tfd_poisson(), tfd_power_spherical(), tfd_probit_bernoulli(), tfd_quantized(), tfd_relaxed_bernoulli(), tfd_relaxed_one_hot_categorical(), tfd_sample_distribution(), tfd_sinh_arcsinh(), tfd_skellam(), tfd_spherical_uniform(), tfd_student_t_process(), tfd_student_t(), tfd_transformed_distribution(), tfd_triangular(), tfd_truncated_cauchy(), tfd_truncated_normal(), tfd_uniform(), tfd_variational_gaussian_process(), tfd_vector_diffeomixture(), tfd_vector_exponential_diag(), tfd_vector_exponential_linear_operator(), tfd_vector_laplace_diag(), 
tfd_vector_laplace_linear_operator(), tfd_vector_sinh_arcsinh_diag(), tfd_von_mises_fisher(), tfd_von_mises(), tfd_weibull(), tfd_wishart_linear_operator(), tfd_wishart_tri_l(), tfd_wishart(), tfd_zipf()