
sampling multinomial from small log probability vectors in numpy/scipy


First of all, I believe the problem you're encountering is that you're normalizing your probabilities incorrectly. This line is incorrect:

a = np.exp(l) / scipy.misc.logsumexp(l)

You're dividing a probability by a log probability, which makes no sense. Instead you probably want

a = np.exp(l - scipy.misc.logsumexp(l))

If you do that, you find a = [1, 0] and your multinomial sampler works as expected up to floating point precision in the second probability.
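For example, here's the fix end to end with an illustrative log-probability vector (not the one from your question); note that in recent SciPy versions logsumexp lives in scipy.special rather than scipy.misc:

import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older SciPy

l = np.array([0.0, -40.0])    # illustrative log-probabilities
a = np.exp(l - logsumexp(l))  # normalize in log space, then exponentiate
# a ≈ [1.0, 4.25e-18]
np.random.multinomial(10, a)
# e.g. array([10,  0])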


A Solution for Small N: Histograms

That said, if you still need more precision and performance is not as much of a concern, one way you could make progress is by implementing a multinomial sampler from scratch, and then modifying this to work at higher precision.

NumPy's multinomial function is implemented in Cython, and essentially performs a loop over a number of binomial samples and combines them into a multinomial sample. You can call it like this:

np.random.multinomial(10, [0.1, 0.2, 0.7])
# [0, 1, 9]

(Note that the precise output values here & below are random, and will change from call to call).

Another way you might implement a multinomial sampler is to generate N uniform random values, then compute the histogram with bins defined by the cumulative probabilities:

def multinomial(N, p):
    rand = np.random.uniform(size=N)
    p_cuml = np.cumsum(np.hstack([[0], p]))
    p_cuml /= p_cuml[-1]
    return np.histogram(rand, bins=p_cuml)[0]

multinomial(10, [0.1, 0.2, 0.7])
# [1, 1, 8]

With this method in mind, we can think about doing things to higher precision by keeping everything in log-space. The main trick is to realize that the log of uniform random deviates is equivalent to the negative of exponential random deviates, and so you can do everything above without ever leaving log space:

def multinomial_log(N, logp):
    log_rand = -np.random.exponential(size=N)
    logp_cuml = np.logaddexp.accumulate(np.hstack([[-np.inf], logp]))
    logp_cuml -= logp_cuml[-1]
    return np.histogram(log_rand, bins=logp_cuml)[0]

multinomial_log(10, np.log([0.1, 0.2, 0.7]))
# [1, 2, 7]
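As a quick sanity check of the key identity here (my own illustration, not part of the sampler): the log of a uniform deviate and the negative of an exponential deviate should match in distribution, with mean -1 and variance 1:

import numpy as np

log_u = np.log(np.random.uniform(size=100000))  # log(U), U ~ Uniform(0, 1)
neg_e = -np.random.exponential(size=100000)     # -E, E ~ Exponential(1)
log_u.mean(), neg_e.mean()
# both ≈ -1
log_u.var(), neg_e.var()
# both ≈ 1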

The resulting multinomial draws will maintain precision even for very small values in the p array.

Unfortunately, these histogram-based solutions will be much slower than the native numpy.multinomial function, so if performance is an issue you may need another approach. One option would be to adapt the Cython code linked above to work in log-space, using mathematical tricks similar to the ones I used here.


A Solution for Large N: Poisson Approximation

The problem with the above solution is that as N grows large, it becomes very slow. I was thinking about this and realized there's a more efficient way forward, despite np.random.multinomial failing for probabilities smaller than 1E-16 or so.

Here's an example of that failure: on a 64-bit machine, this will always give zero for the first entry because of the way the code is implemented, when in reality it should give something near 10:

np.random.multinomial(1E18, [1E-17, 1])
# array([                  0, 1000000000000000000])

If you dig into the source, you can trace this issue to the binomial function upon which the multinomial function is built. The Cython code internally does something like this:

def multinomial_basic(N, p, size=None):
    results = np.array([np.random.binomial(N, pi, size) for pi in p])
    results[-1] = int(N) - results[:-1].sum(0)
    return np.rollaxis(results, 0, results.ndim)

multinomial_basic(1E18, [1E-17, 1])
# array([                  0, 1000000000000000000])

The problem is that the binomial function chokes on very small values of p – this is because the algorithm computes the quantity (1 - p), and for p much smaller than about 1E-16, (1 - p) rounds to exactly 1.0 in double precision, so the value of p is effectively limited by floating-point precision.
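You can see this floating-point limitation directly, independent of the NumPy source:

p = 1E-17
1.0 - p == 1.0
# True: p is smaller than machine epsilon, so it's lost entirely when (1 - p) is computed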

So what can we do? Well, it turns out that for small values of p, the Poisson distribution is an extremely good approximation of the binomial distribution, and the implementation doesn't have these issues. So we can build a robust multinomial function based on a robust binomial sampler that switches to a Poisson sampler at small p:

def binomial_robust(N, p, size=None):
    if p < 1E-7:
        return np.random.poisson(N * p, size)
    else:
        return np.random.binomial(N, p, size)

def multinomial_robust(N, p, size=None):
    results = np.array([binomial_robust(N, pi, size) for pi in p])
    results[-1] = int(N) - results[:-1].sum(0)
    return np.rollaxis(results, 0, results.ndim)

multinomial_robust(1E18, [1E-17, 1])
# array([                 12, 999999999999999988])

The first entry is nonzero and near 10, as expected! Note that we can't use N larger than 1E18, because it would overflow the long integer. But we can confirm that our approach works for smaller probabilities using the size parameter, and averaging over results:

p = [1E-23, 1E-22, 1E-21, 1E-20, 1]
size = int(1E6)
multinomial_robust(1E18, p, size).mean(0)
# array([  1.70000000e-05,   9.00000000e-05,   9.76000000e-04,
#          1.00620000e-02,   1.00000000e+18])

We see that even for these very small probabilities, the multinomial values are turning up in the right proportion. The result is a very robust and very fast approximation to the multinomial distribution for small p.
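If you want to convince yourself that the Poisson branch is a safe stand-in near the 1E-7 switchover, here's a quick empirical check of my own (the particular N and size are arbitrary):

import numpy as np

N, p, size = 10**7, 1E-7, 100000
binom = np.random.binomial(N, p, size)
poisson = np.random.poisson(N * p, size)
binom.mean(), poisson.mean()
# both ≈ N * p = 1.0
binom.var(), poisson.var()
# both ≈ 1.0, since the binomial variance N*p*(1-p) ≈ N*p for small p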