Generating Discrete random variables with specified weights using SciPy or NumPy Generating Discrete random variables with specified weights using SciPy or NumPy numpy numpy

Generating Discrete random variables with specified weights using SciPy or NumPy


Drawing from a discrete distribution is directly built into numpy.The function is called random.choice (difficult to find without any reference to discrete distributions in the numpy docs).

elements = [1.1, 2.2, 3.3]probabilities = [0.2, 0.5, 0.3]np.random.choice(elements, 10, p=probabilities)


Here is a short, relatively simple function that returns weighted values, it uses NumPy's digitize, accumulate, and random_sample.

import numpy as npfrom numpy.random import random_sampledef weighted_values(values, probabilities, size):    bins = np.add.accumulate(probabilities)    return values[np.digitize(random_sample(size), bins)]values = np.array([1.1, 2.2, 3.3])probabilities = np.array([0.2, 0.5, 0.3])print weighted_values(values, probabilities, 10)#Sample output:[ 2.2  2.2  1.1  2.2  2.2  3.3  3.3  2.2  3.3  3.3]

It works like this:

  1. First using accumulate we create bins.
  2. Then we create a bunch of random numbers (between 0, and 1) using random_sample
  3. We use digitize to see which bins these numbers fall into.
  4. And return the corresponding values.


You were going in a good direction: the built-in scipy.stats.rv_discrete() quite directly creates a discrete random variable. Here is how it works:

>>> from scipy.stats import rv_discrete  >>> values = numpy.array([1.1, 2.2, 3.3])>>> probabilities = [0.2, 0.5, 0.3]>>> distrib = rv_discrete(values=(range(len(values)), probabilities))  # This defines a Scipy probability distribution>>> distrib.rvs(size=10)  # 10 samples from range(len(values))array([1, 2, 0, 2, 2, 0, 2, 1, 0, 2])>>> values[_]  # Conversion to specific discrete values (the fact that values is a NumPy array is used for the indexing)[2.2, 3.3, 1.1, 3.3, 3.3, 1.1, 3.3, 2.2, 1.1, 3.3]

The distribution distrib above thus returns indexes from the values list.

More generally, rv_discrete() takes a sequence of integer values in the first elements of its values=(…,…) argument, and returns these values, in this case; there is no need to convert to specific (float) values. Here is an example:

>>> values = [10, 20, 30]>>> probabilities = [0.2, 0.5, 0.3]>>> distrib = rv_discrete(values=(values, probabilities))>>> distrib.rvs(size=10)array([20, 20, 20, 20, 20, 20, 20, 30, 20, 20])

where (integer) input values are directly returned with the desired probability.