This layer implements the Bayesian variational inference analogue to a dense layer by assuming the `kernel` and/or the `bias` are drawn from distributions.

```r
layer_dense_flipout(
  object,
  units,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  seed = NULL,
  ...
)
```

Argument | Description
---|---
object | Model or layer object
units | Integer dimensionality of the output space
activation | Activation function. Set it to `NULL` to maintain a linear activation.
activity_regularizer | Regularizer function for the output.
trainable | Whether the layer weights will be updated during training.
kernel_posterior_fn | Function which creates a `tfd$Distribution` instance representing the surrogate posterior of the `kernel` parameter. Default: `default_mean_field_normal_fn()`.
kernel_posterior_tensor_fn | Function which takes a `tfd$Distribution` instance and returns a representative value. Default: `function(d) d %>% tfd_sample()`.
kernel_prior_fn | Function which creates a `tfd$Distribution` instance representing the prior of the `kernel` parameter. Default: a standard normal prior (`default_multivariate_normal_fn`).
kernel_divergence_fn | Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are `tfd$Distribution`-like instances and the sample is a Tensor.
bias_posterior_fn | Function which creates a `tfd$Distribution` instance representing the surrogate posterior of the `bias` parameter. Default: `default_mean_field_normal_fn(is_singular = TRUE)`, which yields a point estimate.
bias_posterior_tensor_fn | Function which takes a `tfd$Distribution` instance and returns a representative value. Default: `function(d) d %>% tfd_sample()`.
bias_prior_fn | Function which creates a `tfd$Distribution` instance representing the prior of the `bias` parameter. Default: `NULL` (no prior, hence no divergence penalty for the bias).
bias_divergence_fn | Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are `tfd$Distribution`-like instances and the sample is a Tensor.
seed | Scalar integer which initializes the random number generator. Default: `NULL` (i.e., use the global seed).
... | Additional keyword arguments passed to the underlying Keras layer.

Returns a Keras layer.
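A minimal construction sketch, assuming the keras and tfprobability packages are attached; the 10-dimensional input and the unit counts are arbitrary placeholders:

```r
library(keras)
library(tfprobability)

# Two Flipout dense layers with the default mean-field normal posteriors on the
# kernels and point-estimate biases.
inputs  <- layer_input(shape = 10)
outputs <- inputs %>%
  layer_dense_flipout(units = 64, activation = "relu") %>%
  layer_dense_flipout(units = 1)

model <- keras_model(inputs, outputs)
```

Because the kernel is sampled on every call, two forward passes through such a model generally produce different outputs.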

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors:

```
kernel, bias ~ posterior
outputs = activation(matmul(inputs, kernel) + bias)
```

It uses the Flipout estimator (Wen et al., 2018), which performs a Monte Carlo approximation of the distribution integrating over the `kernel` and `bias`. Flipout uses roughly twice as many floating point operations as the reparameterization estimator but has the advantage of significantly lower variance.
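As an illustration of the stochastic forward pass, the sketch below (a toy model with random test inputs, all sizes being arbitrary assumptions) averages repeated predictions as a Monte Carlo estimate of the posterior predictive mean:

```r
library(keras)
library(tfprobability)

inputs  <- layer_input(shape = 4)
outputs <- layer_dense_flipout(inputs, units = 1)
model   <- keras_model(inputs, outputs)

x_test <- matrix(rnorm(3 * 4), nrow = 3)  # 3 toy examples with 4 features each

# Every predict() call draws fresh weight perturbations, so predictions differ
# from call to call; averaging many draws approximates the posterior
# predictive mean.
mc_preds <- lapply(1:100, function(i) predict(model, x_test))
posterior_predictive_mean <- Reduce(`+`, mc_preds) / length(mc_preds)
```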

The arguments permit separate specification of the surrogate posterior (`q(W|x)`), prior (`p(W)`), and divergence for both the `kernel` and `bias` distributions.
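For instance, a sketch of treating the `bias` fully Bayesian as well, by reusing the package's default posterior and prior constructors (the input shape and unit count are placeholders; by default the bias is a point estimate with no prior):

```r
library(keras)
library(tfprobability)

inputs  <- layer_input(shape = 10)
outputs <- layer_dense_flipout(
  inputs,
  units = 16,
  # mean-field normal surrogate posterior and standard normal prior for the
  # bias, mirroring the kernel's defaults
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  bias_prior_fn = tfp$layers$util$default_multivariate_normal_fn
)
```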

Upon being built, this layer adds losses (accessible via the `losses` property) representing the divergences of the `kernel` and/or `bias` surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if `kl` is the sum of `losses` for each element of the batch, you should pass `kl / num_examples_per_epoch` to your optimizer).
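One common way to apply this scaling, sketched below under the assumption of a training set of `n_train` examples (the dataset size, input shape, and unit count are placeholders), is to fold the division into the divergence function itself so that the summed penalty amounts to a once-per-epoch contribution:

```r
library(keras)
library(tfprobability)

n_train <- 60000  # assumed number of training examples per epoch

# Fold the scaling into the divergence function so that, summed over all
# minibatches of one epoch, the KL penalty is counted exactly once.
scaled_kl <- function(q, p, ignore) tfd_kl_divergence(q, p) / n_train

inputs  <- layer_input(shape = 784)
outputs <- inputs %>%
  layer_dense_flipout(
    units = 10,
    activation = "softmax",
    kernel_divergence_fn = scaled_kl
    # the default bias has no prior, hence no bias divergence to scale
  )

model <- keras_model(inputs, outputs)

model %>% compile(
  optimizer = "adam",
  loss = "categorical_crossentropy"  # the scaled KL losses are added automatically
)
```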

Other layers: `layer_autoregressive()`, `layer_conv_1d_flipout()`, `layer_conv_1d_reparameterization()`, `layer_conv_2d_flipout()`, `layer_conv_2d_reparameterization()`, `layer_conv_3d_flipout()`, `layer_conv_3d_reparameterization()`, `layer_dense_local_reparameterization()`, `layer_dense_reparameterization()`, `layer_dense_variational()`, `layer_variable()`