
Are these functions equivalent?


I would say they are, since their sampling is defined in almost exactly the same way in both cases. This is how sampling for tf.distributions.StudentT is defined:

def _sample_n(self, n, seed=None):
    # The sampling method comes from the fact that if:
    #   X ~ Normal(0, 1)
    #   Z ~ Chi2(df)
    #   Y = X / sqrt(Z / df)
    # then:
    #   Y ~ StudentT(df).
    seed = seed_stream.SeedStream(seed, "student_t")
    shape = tf.concat([[n], self.batch_shape_tensor()], 0)
    normal_sample = tf.random.normal(shape, dtype=self.dtype, seed=seed())
    df = self.df * tf.ones(self.batch_shape_tensor(), dtype=self.dtype)
    gamma_sample = tf.random.gamma([n],
                                   0.5 * df,
                                   beta=0.5,
                                   dtype=self.dtype,
                                   seed=seed())
    samples = normal_sample * tf.math.rsqrt(gamma_sample / df)
    return samples * self.scale + self.loc  # Abs(scale) not wanted.

So it is a standard normal sample divided by the square root of a chi-square sample with df degrees of freedom, itself divided by df. The chi-square sample is drawn as a gamma sample with shape 0.5 * df and rate 0.5, which is equivalent (the chi-square distribution is a special case of the gamma distribution). The scale value, like loc, only comes into play in the last line, as a way to relocate and rescale the sampled distribution; when scale is one and loc is zero, they do nothing.
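As a quick sanity check of that gamma/chi-square identity, here is a minimal sketch using NumPy's Generator API (not part of either implementation; a rate of 0.5 corresponds to a scale of 2):

import numpy as np

rng = np.random.default_rng(0)
df = 3.0

# Chi2(df) is Gamma(shape=df/2, rate=1/2), i.e. scale=2.
chi2 = rng.chisquare(df, size=100_000)
gamma = rng.gamma(shape=0.5 * df, scale=2.0, size=100_000)

print(chi2.mean(), gamma.mean())  # both approximately df
print(chi2.var(), gamma.var())    # both approximately 2 * df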

Here is the implementation for np.random.standard_t:

double legacy_standard_t(aug_bitgen_t *aug_state, double df) {
  double num, denom;

  num = legacy_gauss(aug_state);
  denom = legacy_standard_gamma(aug_state, df / 2);
  return sqrt(df / 2) * num / sqrt(denom);
}

So it is essentially the same thing, slightly rephrased. Here we also have a gamma sample with shape df / 2, but it is standard (rate one). A gamma sample with rate 0.5 is just twice a standard gamma sample with the same shape, so the missing rate reappears in the numerator as the / 2 inside the sqrt: num / sqrt(2 * denom / df) equals sqrt(df / 2) * num / sqrt(denom). It is just moving numbers around. There is no scale or loc here, though.
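That algebraic equivalence is easy to verify numerically. A sketch in NumPy (the variable names follow the C code above, but the script itself is mine, not from either code base):

import numpy as np

rng = np.random.default_rng(0)
df, n = 3.0, 100_000

num = rng.standard_normal(n)
denom = rng.gamma(shape=df / 2, size=n)  # standard gamma, rate 1

# NumPy's formulation:
t_numpy_style = np.sqrt(df / 2) * num / np.sqrt(denom)
# TensorFlow's formulation, using Gamma(shape, rate=0.5) == 2 * Gamma(shape, rate=1):
t_tf_style = num / np.sqrt((2 * denom) / df)

print(np.allclose(t_numpy_style, t_tf_style))  # True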

In truth, the difference is that the TensorFlow distribution is a location-scale transformation of the t-distribution, i.e. a shifted and scaled t (sometimes called the nonstandardized Student's t; not to be confused with the noncentral t-distribution, where the shift is applied to the normal sample before the division). A simple empirical check that they agree for loc=0.0 and scale=1.0 is to plot histograms of both distributions and see how close they look.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

np.random.seed(0)
t_np = np.random.standard_t(df=3, size=10000)
with tf.Graph().as_default(), tf.Session() as sess:
    tf.random.set_random_seed(0)
    t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
    t_tf = sess.run(t_dist.sample(10000))
plt.hist((t_np, t_tf), np.linspace(-10, 10, 20), label=['NumPy', 'TensorFlow'])
plt.legend()
plt.tight_layout()
plt.show()

Output:

[Figure: overlaid histograms of the NumPy and TensorFlow samples]

That looks pretty close. Obviously, from the point of view of statistical samples, this is not any kind of proof. If you are still not convinced, there are statistical tools for testing whether a sample comes from a certain distribution, or whether two samples come from the same distribution (for example, the two-sample Kolmogorov–Smirnov test).
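For instance, a two-sample Kolmogorov–Smirnov test via SciPy, reusing t_np and t_tf from the snippet above (SciPy is an extra dependency here, not part of the original comparison):

from scipy import stats

# Null hypothesis: t_np and t_tf are drawn from the same distribution.
stat, p_value = stats.ks_2samp(t_np, t_tf)
print(stat, p_value)  # a large p-value means the test found no evidence
                      # against the two samples sharing a distribution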