tfb_masked_autoregressive_default_template
Source: R/bijectors.R
This will be wrapped in a make_template to ensure the variables are only created once. It takes the input and returns the loc ("mu" in Germain et al. (2015)) and log_scale ("alpha" in Germain et al. (2015)) from the MADE network.
tfb_masked_autoregressive_default_template( hidden_layers, shift_only = FALSE, activation = tf$nn$relu, log_scale_min_clip = -5, log_scale_max_clip = 3, log_scale_clip_gradient = FALSE, name = NULL, ... )
Arguments

hidden_layers: list-like of non-negative integer scalars indicating the number of units in each hidden layer.
shift_only: logical indicating whether only the shift term should be computed. Default: FALSE.
activation: activation function (callable). Explicitly setting to NULL implies a linear activation. Default: tf$nn$relu.
log_scale_min_clip: float-like scalar Tensor, or a Tensor with the same shape as log_scale. The minimum value to clip by. Default: -5.
log_scale_max_clip: float-like scalar Tensor, or a Tensor with the same shape as log_scale. The maximum value to clip by. Default: 3.
log_scale_clip_gradient: logical indicating whether the gradient of tf$clip_by_value should be preserved. Default: FALSE.
name: a name for ops managed by this function. Default: "tfb_masked_autoregressive_default_template".
...
Value

A list of:
shift: Float-like Tensor of shift terms
log_scale: Float-like Tensor of log(scale) terms
Warning: This function uses masked_dense to create randomly initialized tf$Variables. It is presumed that these will be fit, just as you would any other neural architecture which uses tf$layers$dense.
About Hidden Layers

Each element of hidden_layers should be greater than the input_depth (i.e., input_depth = tf$shape(input)[-1], where input is the input to the neural network). This is necessary to ensure the autoregressivity property.
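As a concrete (made-up) numeric illustration of this rule: with 3-dimensional inputs, hidden widths of 8 are admissible, while a width of 2 would violate the constraint.

# Hypothetical sizing check for the rule above: every hidden width must
# exceed the input depth (the size of the last dimension of the input).
input_depth <- 3L
hidden_layers <- list(8L, 8L)   # ok: 8 > 3 for every hidden layer
stopifnot(all(unlist(hidden_layers) > input_depth))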
About Clipping

This function also optionally clips the log_scale (but possibly not its gradient). This is useful because if log_scale is too small/large it might underflow/overflow, making it impossible for the MaskedAutoregressiveFlow bijector to implement a bijection. Additionally, the log_scale_clip_gradient bool indicates whether the gradient should also be clipped. The default does not clip the gradient; this is useful because it still provides gradient information (for fitting) yet solves the numerical stability problem. I.e., log_scale_clip_gradient = FALSE means grad[exp(clip(x))] = grad[x] exp(clip(x)) rather than the usual grad[clip(x)] exp(clip(x)).
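The "clip the value but not its gradient" behaviour can be sketched with a straight-through trick. This is an illustration of the idea only (an assumption, not the package's internal implementation):

library(tensorflow)

# Straight-through clip: the returned value equals clip(x, lo, hi), but the
# clipping correction is wrapped in tf$stop_gradient, so the gradient is
# that of the unclipped x, matching grad[exp(clip(x))] = grad[x] exp(clip(x)).
clip_preserve_gradient <- function(x, lo, hi) {
  clipped <- tf$clip_by_value(x, lo, hi)
  x + tf$stop_gradient(clipped - x)
}

log_scale <- tf$constant(c(-7, 0, 5))
clip_preserve_gradient(log_scale, -5, 3)  # values clipped into [-5, 3]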
See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().
Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()