Generating Discrete random variables with specified weights using SciPy or NumPy
Drawing from a discrete distribution is directly built into numpy.The function is called random.choice (difficult to find without any reference to discrete distributions in the numpy docs).
elements = [1.1, 2.2, 3.3]probabilities = [0.2, 0.5, 0.3]np.random.choice(elements, 10, p=probabilities)
Here is a short, relatively simple function that returns weighted values, it uses NumPy's digitize
, accumulate
, and random_sample
.
import numpy as npfrom numpy.random import random_sampledef weighted_values(values, probabilities, size): bins = np.add.accumulate(probabilities) return values[np.digitize(random_sample(size), bins)]values = np.array([1.1, 2.2, 3.3])probabilities = np.array([0.2, 0.5, 0.3])print weighted_values(values, probabilities, 10)#Sample output:[ 2.2 2.2 1.1 2.2 2.2 3.3 3.3 2.2 3.3 3.3]
It works like this:
- First using
accumulate
we create bins. - Then we create a bunch of random numbers (between
0
, and1
) usingrandom_sample
- We use
digitize
to see which bins these numbers fall into. - And return the corresponding values.
You were going in a good direction: the built-in scipy.stats.rv_discrete()
quite directly creates a discrete random variable. Here is how it works:
>>> from scipy.stats import rv_discrete >>> values = numpy.array([1.1, 2.2, 3.3])>>> probabilities = [0.2, 0.5, 0.3]>>> distrib = rv_discrete(values=(range(len(values)), probabilities)) # This defines a Scipy probability distribution>>> distrib.rvs(size=10) # 10 samples from range(len(values))array([1, 2, 0, 2, 2, 0, 2, 1, 0, 2])>>> values[_] # Conversion to specific discrete values (the fact that values is a NumPy array is used for the indexing)[2.2, 3.3, 1.1, 3.3, 3.3, 1.1, 3.3, 2.2, 1.1, 3.3]
The distribution distrib
above thus returns indexes from the values
list.
More generally, rv_discrete()
takes a sequence of integer values in the first elements of its values=(…,…)
argument, and returns these values, in this case; there is no need to convert to specific (float) values. Here is an example:
>>> values = [10, 20, 30]>>> probabilities = [0.2, 0.5, 0.3]>>> distrib = rv_discrete(values=(values, probabilities))>>> distrib.rvs(size=10)array([20, 20, 20, 20, 20, 20, 20, 30, 20, 20])
where (integer) input values are directly returned with the desired probability.