Creating a mixture of probability distributions for sampling

python numpy scipy probability-density

Sampling from a mixture of distributions (where PDFs are added with some coefficients c_1, c_2, ... c_n) is equivalent to sampling each independently, and then, for each index, picking the value from k-th sample, with probability c_k.

The latter, mixing, step can be efficiently done with numpy.random.choice. Here is an example where three distributions are mixed. The distributions are listed in distributions, and their coefficients in coefficients. There is a fat normal distribution, a uniform distribution, and a narrow normal distribution, with coefficients 0.5, 0.2, 0.3. The mixing happens at data[np.arange(sample_size), random_idx] after random_idx are generated according to given coefficients.

import numpy as npimport matplotlib.pyplot as pltdistributions = [    {"type": np.random.normal, "kwargs": {"loc": -3, "scale": 2}},    {"type": np.random.uniform, "kwargs": {"low": 4, "high": 6}},    {"type": np.random.normal, "kwargs": {"loc": 2, "scale": 1}},]coefficients = np.array([0.5, 0.2, 0.3])coefficients /= coefficients.sum()      # in case these did not add up to 1sample_size = 100000num_distr = len(distributions)data = np.zeros((sample_size, num_distr))for idx, distr in enumerate(distributions):    data[:, idx] = distr["type"](size=(sample_size,), **distr["kwargs"])random_idx = np.random.choice(np.arange(num_distr), size=(sample_size,), p=coefficients)sample = data[np.arange(sample_size), random_idx]plt.hist(sample, bins=100, density=True)plt.show()

python numpy scipy probability-density

Following @PaulPanzer's pointer in the comments, I created the following subclass for easily creating mixture models from the SciPy distributions. Note, the pdf is not required for my question, but it was nice for me to have.

class MixtureModel(rv_continuous):    def __init__(self, submodels, *args, **kwargs):        super().__init__(*args, **kwargs)        self.submodels = submodels    def _pdf(self, x):        pdf = self.submodels[0].pdf(x)        for submodel in self.submodels[1:]:            pdf += submodel.pdf(x)        pdf /= len(self.submodels)        return pdf    def rvs(self, size):        submodel_choices = np.random.randint(len(self.submodels), size=size)        submodel_samples = [submodel.rvs(size=size) for submodel in self.submodels]        rvs = np.choose(submodel_choices, submodel_samples)        return rvsmixture_gaussian_model = MixtureModel([norm(-3, 1), norm(3, 1)])x_axis = np.arange(-6, 6, 0.001)mixture_pdf = mixture_gaussian_model.pdf(x_axis)mixture_rvs = mixture_gaussian_model.rvs(10)

python numpy scipy probability-density

The code below stores a 1000 samples from N(0,1) and 500 samples from N(7,2) in an array that can then be sampled from.

import numpy as npfrom scipy import statsd = np.concatenate((stats.norm.rvs(0.0, 1.0, 1000), stats.norm.rvs(7.0, 2.0, 500)))np.random.choice(d, 3)  # sample 3 observations

Mixture components other than the Normal can be used (e.g., stats.poisson) and there can be arbitrary number of those.

CodeHunter

Creating a mixture of probability distributions for sampling

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last