R/bijectors.R
tfb_masked_autoregressive_default_template.Rd

This will be wrapped in a make_template to ensure the variables are only created once. It takes the input and returns the loc ("mu" in Germain et al. (2015)) and log_scale ("alpha" in Germain et al. (2015)) from the MADE network.
tfb_masked_autoregressive_default_template(
  hidden_layers,
  shift_only = FALSE,
  activation = tf$nn$relu,
  log_scale_min_clip = -5,
  log_scale_max_clip = 3,
  log_scale_clip_gradient = FALSE,
  name = NULL,
  ...
)
| Argument | Description |
|---|---|
| hidden_layers | List-like of non-negative integer scalars indicating the number of units in each hidden layer. |
| shift_only | Logical indicating whether only the shift term shall be computed. Default: FALSE. |
| activation | Activation function (callable). Explicitly setting to NULL implies a linear activation. |
| log_scale_min_clip | Float-like scalar Tensor, or a Tensor with the same shape as log_scale. The minimum value to clip by. Default: -5. |
| log_scale_max_clip | Float-like scalar Tensor, or a Tensor with the same shape as log_scale. The maximum value to clip by. Default: 3. |
| log_scale_clip_gradient | Logical indicating whether the gradient of tf$clip_by_value should be preserved. Default: FALSE. |
| name | A name for ops managed by this function. Default: "tfb_masked_autoregressive_default_template". |
| ... | tf$layers$dense arguments. |
Value

A list of:
- shift: Float-like Tensor of shift terms
- log_scale: Float-like Tensor of log(scale) terms
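A minimal sketch of what the returned template does (hidden-layer sizes and variable names here are illustrative, and it assumes a TensorFlow / TFP combination in which this TF1-style template still runs):

```r
library(tensorflow)
library(tfprobability)

# Variables are created lazily, on the first call of the template.
shift_and_log_scale_fn <- tfb_masked_autoregressive_default_template(
  hidden_layers = list(16, 16)
)

# Calling the template on a batch of 3-dimensional events returns the
# shift and log_scale tensors, each of shape (8, 3).
x <- tf$random$normal(shape = shape(8, 3))
out <- shift_and_log_scale_fn(x)
shift     <- out[[1]]
log_scale <- out[[2]]
```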
Warning: This function uses masked_dense to create randomly initialized
tf$Variables. It is presumed that these will be fit, just as you would any
other neural architecture which uses tf$layers$dense.
About Hidden Layers
Each element of hidden_layers should be greater than the input_depth
(i.e., input_depth = tf$shape(input)[-1], the size of the last dimension of
input, where input is the input to the neural network). This is necessary to
ensure the autoregressivity property. A short illustration follows below.
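As a quick illustration of this constraint (dimensions here are hypothetical): for 4-dimensional events the input_depth is 4, so every entry of hidden_layers must exceed 4.

```r
library(tfprobability)

# Fine for 4-dimensional inputs: 8 > 4 in every hidden layer.
template <- tfb_masked_autoregressive_default_template(hidden_layers = list(8, 8))

# hidden_layers = list(2, 2) would be too narrow for 4-dimensional inputs and
# would break the autoregressivity property described above.
```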
About Clipping
This function also optionally clips the log_scale (but possibly not its
gradient). This is useful because if log_scale is too small/large it might
underflow/overflow, making it impossible for the MaskedAutoregressiveFlow
bijector to implement a bijection. Additionally, the log_scale_clip_gradient
bool indicates whether the gradient should also be clipped. The default does
not clip the gradient; this is useful because it still provides gradient
information (for fitting) yet solves the numerical stability problem. I.e.,
log_scale_clip_gradient = FALSE means grad[exp(clip(x))] = grad[x] exp(clip(x))
rather than the usual grad[clip(x)] exp(clip(x)).
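The two clipping modes described above can be sketched with plain TensorFlow ops; this is an illustration of the behaviour, not the package's internal implementation:

```r
library(tensorflow)

# Hard clip: the gradient is zero wherever x falls outside [lo, hi].
clip_hard <- function(x, lo = -5, hi = 3) {
  tf$clip_by_value(x, lo, hi)
}

# Clip the value but let the gradient pass through unchanged (the usual
# stop_gradient trick); per the text above, this is the default behaviour,
# i.e. log_scale_clip_gradient = FALSE.
clip_preserve_gradient <- function(x, lo = -5, hi = 3) {
  x + tf$stop_gradient(tf$clip_by_value(x, lo, hi) - x)
}
```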
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().
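For instance, a sketch of the typical wiring into a masked autoregressive flow (layer sizes and data are illustrative, not taken from the package's examples):

```r
library(tfprobability)

# Use the template as the conditioning network of a masked autoregressive flow.
maf <- tfb_masked_autoregressive_flow(
  shift_and_log_scale_fn = tfb_masked_autoregressive_default_template(
    hidden_layers = list(16, 16)
  )
)

# Forward-transform a batch of two 3-dimensional events (16 > 3, per
# "About Hidden Layers" above).
x <- matrix(rnorm(6), nrow = 2, ncol = 3)
y <- tfb_forward(maf, x)
```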
Other bijectors:
tfb_absolute_value(),
tfb_affine_linear_operator(),
tfb_affine_scalar(),
tfb_affine(),
tfb_ascending(),
tfb_batch_normalization(),
tfb_blockwise(),
tfb_chain(),
tfb_cholesky_outer_product(),
tfb_cholesky_to_inv_cholesky(),
tfb_correlation_cholesky(),
tfb_cumsum(),
tfb_discrete_cosine_transform(),
tfb_expm1(),
tfb_exp(),
tfb_ffjord(),
tfb_fill_scale_tri_l(),
tfb_fill_triangular(),
tfb_glow(),
tfb_gompertz_cdf(),
tfb_gumbel_cdf(),
tfb_gumbel(),
tfb_identity(),
tfb_inline(),
tfb_invert(),
tfb_iterated_sigmoid_centered(),
tfb_kumaraswamy_cdf(),
tfb_kumaraswamy(),
tfb_lambert_w_tail(),
tfb_masked_autoregressive_flow(),
tfb_masked_dense(),
tfb_matrix_inverse_tri_l(),
tfb_matvec_lu(),
tfb_normal_cdf(),
tfb_ordered(),
tfb_pad(),
tfb_permute(),
tfb_power_transform(),
tfb_rational_quadratic_spline(),
tfb_rayleigh_cdf(),
tfb_real_nvp_default_template(),
tfb_real_nvp(),
tfb_reciprocal(),
tfb_reshape(),
tfb_scale_matvec_diag(),
tfb_scale_matvec_linear_operator(),
tfb_scale_matvec_lu(),
tfb_scale_matvec_tri_l(),
tfb_scale_tri_l(),
tfb_scale(),
tfb_shifted_gompertz_cdf(),
tfb_shift(),
tfb_sigmoid(),
tfb_sinh_arcsinh(),
tfb_sinh(),
tfb_softmax_centered(),
tfb_softplus(),
tfb_softsign(),
tfb_split(),
tfb_square(),
tfb_tanh(),
tfb_transform_diagonal(),
tfb_transpose(),
tfb_weibull_cdf(),
tfb_weibull()