This distribution models a random variable using a SinhArcsinh transformation (which has adjustable tailweight and skew), a rescaling, and a shift. The SinhArcsinh transformation of the Normal is described in depth in "Sinh-arcsinh distributions". Here we use a slightly different parameterization, in terms of tailweight and skewness. Additionally, we allow for base distributions other than Normal, and provide control over scale as well as a "shift" parameter loc.

tfd_sinh_arcsinh(
  loc,
  scale,
  skewness = NULL,
  tailweight = NULL,
  distribution = NULL,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = "SinhArcsinh"
)



Arguments

loc
Floating-point Tensor.

scale
Tensor of same dtype as loc.

skewness
Skewness parameter. Default is 0.0 (no skew).

tailweight
Tailweight parameter. Default is 1.0 (unchanged tailweight).

distribution
tf$distributions$Distribution-like instance. Distribution that is transformed to produce this distribution. Default is tfd_normal(0, 1). Must be a scalar-batch, scalar-event distribution. Typically distribution$reparameterization_type == FULLY_REPARAMETERIZED, or the distribution is a function of non-trainable parameters. WARNING: If you backprop through a SinhArcsinh sample and distribution is not FULLY_REPARAMETERIZED yet is a function of trainable variables, then the gradient will be incorrect!

validate_args
Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

allow_nan_stats
Logical, default TRUE. When TRUE, statistics (e.g., mean, mode, variance) use the value NaN to indicate the result is undefined. When FALSE, an exception is raised if one or more of the statistic's batch members are undefined.

name
Name prefixed to Ops created by this class.

Value

A distribution instance.


Mathematical Details

Given random variable Z, we define the SinhArcsinh transformation of Z, Y, parameterized by (loc, scale, skewness, tailweight), via the relation:

Y := loc + scale * F(Z) * (2 / F_0(2))
F(Z) := Sinh( (Arcsinh(Z) + skewness) * tailweight )
F_0(Z) := Sinh( Arcsinh(Z) * tailweight )
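The relation above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the library's implementation: the function name sinh_arcsinh_transform is hypothetical, and the sketch works on a single float draw z rather than on Tensors.

```python
import math

def sinh_arcsinh_transform(z, loc=0.0, scale=1.0, skewness=0.0, tailweight=1.0):
    """Map a single draw z of Z to Y (hypothetical helper, scalar floats only)."""
    def f(x, skew):
        # F(x) = sinh((arcsinh(x) + skew) * tailweight); skew = 0 gives F_0.
        return math.sinh((math.asinh(x) + skew) * tailweight)

    # 2 / F_0(2) rescales F so that, with skewness = 0, z = 2 maps to loc + 2 * scale.
    normalizer = 2.0 / f(2.0, 0.0)
    return loc + scale * f(z, skewness) * normalizer

# With the default skewness and tailweight, Y reduces to loc + scale * z
# (here approximately 3 + 2 * 1.5, up to floating-point rounding).
print(sinh_arcsinh_transform(1.5, loc=3.0, scale=2.0))
```

Setting skewness or tailweight away from their defaults changes only F; loc and scale act exactly as in a location-scale transformation.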

This distribution is similar to the location-scale transformation L(Z) := loc + scale * Z in the following ways:

  • If skewness = 0 and tailweight = 1 (the defaults), F(Z) = Z, and then Y = L(Z) exactly.

  • loc is used in both to shift the result by a constant offset.

  • The multiplication of scale by 2 / F_0(2) ensures that, if skewness = 0, then P[Y - loc <= 2 * scale] = P[L(Z) - loc <= 2 * scale]. Thus it can be said that the weights in the tails of Y and L(Z) beyond loc + 2 * scale are the same.

This distribution differs from loc + scale * Z because of the reshaping done by F:

  • Positive (negative) skewness leads to positive (negative) skew.

      ◦ Positive skew means the mode of F(Z) is "tilted" to the right.

      ◦ Positive skew means positive values of F(Z) become more likely, and negative values become less likely.

  • Larger (smaller) tailweight leads to fatter (thinner) tails.

      ◦ Fatter tails mean larger values of |F(Z)| become more likely.

      ◦ tailweight < 1 leads to a distribution that is "flat" around Y = loc, with a very steep drop-off in the tails.

      ◦ tailweight > 1 leads to a distribution that is more peaked at the mode, with heavier tails.
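The effect of skewness and tailweight on F(Z) can be checked numerically. The sketch below assumes Z ~ Normal(0, 1) (the documented default base distribution) and tailweight > 0, and works with the unnormalized F(Z) rather than Y. Because F is increasing in Z, both probabilities reduce to Normal CDF evaluations. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # assumption: Z ~ Normal(0, 1)

def prob_f_positive(skewness):
    # For tailweight > 0: F(Z) > 0  <=>  arcsinh(Z) + skewness > 0  <=>  Z > sinh(-skewness).
    return 1.0 - std_normal.cdf(math.sinh(-skewness))

def prob_abs_f_exceeds(c, tailweight):
    # With skewness = 0: |F(Z)| > c  <=>  |Z| > sinh(arcsinh(c) / tailweight).
    threshold = math.sinh(math.asinh(c) / tailweight)
    return 2.0 * (1.0 - std_normal.cdf(threshold))

# Positive skewness moves probability mass toward positive values of F(Z).
for skewness in (-0.5, 0.0, 0.5):
    print(f"skewness={skewness:+.1f}  P[F(Z) > 0] = {prob_f_positive(skewness):.3f}")

# Larger tailweight makes large values of |F(Z)| more likely.
for tailweight in (0.5, 1.0, 2.0):
    print(f"tailweight={tailweight:.1f}  P[|F(Z)| > 3] = {prob_abs_f_exceeds(3.0, tailweight):.4f}")
```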

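To see the argument about the tails, note that Arcsinh(Z) is approximately sign(Z) * log(2 |Z|) for |Z| >> 1, and Sinh(x) is approximately sign(x) * 0.5 * e**|x| for |x| >> 1. Hence, for |Z| >> 1 and |Z| >> (|skewness| * tailweight)**tailweight, we have F(Z) approx sign(Z) * 0.5 * (2 |Z|)**tailweight * e**(sign(Z) * skewness * tailweight), so |Y - loc| grows like |Z|**tailweight.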
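The power-law growth in the tails can be checked without relying on the exact constants: if F(Z) grows like Z**tailweight, then multiplying a large Z by 10 should multiply F(Z) by about 10**tailweight. The sketch below uses plain Python; the skewness value and the evaluation points 1e5 and 1e6 are arbitrary illustrative choices.

```python
import math

def f(z, skewness, tailweight):
    # F(z) = sinh((arcsinh(z) + skewness) * tailweight), as defined above.
    return math.sinh((math.asinh(z) + skewness) * tailweight)

skewness = 0.3  # arbitrary illustrative value
for tailweight in (0.5, 1.0, 2.0):
    # Ratio of F at two large arguments that differ by a factor of 10.
    ratio = f(1e6, skewness, tailweight) / f(1e5, skewness, tailweight)
    print(f"tailweight={tailweight}: F(1e6) / F(1e5) = {ratio:.4f}, "
          f"10**tailweight = {10 ** tailweight:.4f}")
```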

To see the argument regarding multiplying scale by 2 / F_0(2), note that for scale > 0,

P[(Y - loc) / scale <= 2] = P[F(Z) * (2 / F_0(2)) <= 2]
                          = P[F(Z) <= F_0(2)]
                          = P[Z <= 2]  (if F = F_0, i.e. skewness = 0, since F_0 is increasing).
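A small Monte Carlo run illustrates this identity. The sketch assumes Z ~ Normal(0, 1) and skewness = 0, and uses only the Python standard library. The helper name standardized_y, the seed, and the sample size are illustrative choices. Because F_0 is increasing, the estimated probabilities should agree across tailweight values, up to floating-point ties at the boundary.

```python
import math
import random

def standardized_y(z, tailweight):
    # (Y - loc) / scale with skewness = 0, i.e. F_0(z) * (2 / F_0(2)).
    def f0(x):
        return math.sinh(math.asinh(x) * tailweight)
    return f0(z) * (2.0 / f0(2.0))

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # assumption: Z ~ Normal(0, 1)

frac_z = sum(z <= 2.0 for z in draws) / len(draws)
print(f"P[Z <= 2] estimate: {frac_z:.4f}")

for tailweight in (0.5, 1.0, 3.0):
    frac_y = sum(standardized_y(z, tailweight) <= 2.0 for z in draws) / len(draws)
    print(f"tailweight={tailweight}: P[(Y - loc) / scale <= 2] estimate: {frac_y:.4f}")
```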

See also

For usage examples see e.g. tfd_sample(), tfd_log_prob(), tfd_mean().

Other distributions: tfd_autoregressive(), tfd_batch_reshape(), tfd_bates(), tfd_bernoulli(), tfd_beta_binomial(), tfd_beta(), tfd_binomial(), tfd_categorical(), tfd_cauchy(), tfd_chi2(), tfd_chi(), tfd_cholesky_lkj(), tfd_continuous_bernoulli(), tfd_deterministic(), tfd_dirichlet_multinomial(), tfd_dirichlet(), tfd_empirical(), tfd_exp_gamma(), tfd_exp_inverse_gamma(), tfd_exponential(), tfd_gamma_gamma(), tfd_gamma(), tfd_gaussian_process_regression_model(), tfd_gaussian_process(), tfd_generalized_normal(), tfd_geometric(), tfd_gumbel(), tfd_half_cauchy(), tfd_half_normal(), tfd_hidden_markov_model(), tfd_horseshoe(), tfd_independent(), tfd_inverse_gamma(), tfd_inverse_gaussian(), tfd_johnson_s_u(), tfd_joint_distribution_named_auto_batched(), tfd_joint_distribution_named(), tfd_joint_distribution_sequential_auto_batched(), tfd_joint_distribution_sequential(), tfd_kumaraswamy(), tfd_laplace(), tfd_linear_gaussian_state_space_model(), tfd_lkj(), tfd_log_logistic(), tfd_log_normal(), tfd_logistic(), tfd_mixture_same_family(), tfd_mixture(), tfd_multinomial(), tfd_multivariate_normal_diag_plus_low_rank(), tfd_multivariate_normal_diag(), tfd_multivariate_normal_full_covariance(), tfd_multivariate_normal_linear_operator(), tfd_multivariate_normal_tri_l(), tfd_multivariate_student_t_linear_operator(), tfd_negative_binomial(), tfd_normal(), tfd_one_hot_categorical(), tfd_pareto(), tfd_pixel_cnn(), tfd_poisson_log_normal_quadrature_compound(), tfd_poisson(), tfd_power_spherical(), tfd_probit_bernoulli(), tfd_quantized(), tfd_relaxed_bernoulli(), tfd_relaxed_one_hot_categorical(), tfd_sample_distribution(), tfd_skellam(), tfd_spherical_uniform(), tfd_student_t_process(), tfd_student_t(), tfd_transformed_distribution(), tfd_triangular(), tfd_truncated_cauchy(), tfd_truncated_normal(), tfd_uniform(), tfd_variational_gaussian_process(), tfd_vector_diffeomixture(), tfd_vector_exponential_diag(), tfd_vector_exponential_linear_operator(), tfd_vector_laplace_diag(), 
tfd_vector_laplace_linear_operator(), tfd_vector_sinh_arcsinh_diag(), tfd_von_mises_fisher(), tfd_von_mises(), tfd_weibull(), tfd_wishart_linear_operator(), tfd_wishart_tri_l(), tfd_wishart(), tfd_zipf()