Python: Random list of numbers in a range keeping with a minimum distance

python-3.x


One-line solution for the sorted case

Here's a simple one-liner, that generates all possibilities with equal likelihood:

```python
[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]
```

A few sample outputs:

```python
[2, 16, 26, 38]
[0, 10, 25, 35]
[2, 12, 25, 36]
[0, 13, 26, 39]
[1, 14, 24, 34]
[1, 11, 29, 39]
[0, 13, 26, 39]
[1, 12, 27, 38]
```

Outputs are always generated in sorted order; if that's not what you want, you can easily add a shuffle to the result (or see below for a general solution).

Explanation: if [a, b, c, d] is an ordered list satisfying your requirements, then [a, b-9, c-18, d-27] is an ordered sample of length 4 from range(13), and vice versa. So all you need to do is generate samples from range(13), sort them, and then re-add the necessary multiples of 9 to get values that are at least 10 apart.
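This bijection can be checked exhaustively (an illustrative sketch, not part of the original answer): subtracting 0, 9, 18, 27 from a valid sorted sample gives a sorted 4-element subset of `range(13)`, and the map is one-to-one.

```python
import itertools

# Enumerate every sorted 4-sample from range(40) with minimum distance 10,
# and map each one back by subtracting the offsets 0, 9, 18, 27.
valid = [s for s in itertools.combinations(range(40), 4)
         if all(y - x >= 10 for x, y in zip(s, s[1:]))]
mapped = {tuple(x - 9*i for i, x in enumerate(s)) for s in valid}
print(len(valid), len(mapped))  # 715 715: the map is a bijection onto the 4-subsets of range(13)
```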

General unsorted solution

Here's a general solution that doesn't require sorting of the random sample. Instead, we compute the ranks of the elements of the sample and use those to compute the necessary offsets.

```python
import random

def ranks(sample):
    """
    Return the ranks of each element in an integer sample.
    """
    indices = sorted(range(len(sample)), key=lambda i: sample[i])
    return sorted(indices, key=lambda i: indices[i])

def sample_with_minimum_distance(n=40, k=4, d=10):
    """
    Sample of k elements from range(n), with a minimum distance d.
    """
    sample = random.sample(range(n-(k-1)*(d-1)), k)
    return [s + (d-1)*r for s, r in zip(sample, ranks(sample))]
```

And some sample outputs:

```python
>>> sample_with_minimum_distance()
[17, 27, 3, 38]
>>> sample_with_minimum_distance()
[27, 38, 10, 0]
>>> sample_with_minimum_distance()
[36, 13, 1, 24]
>>> sample_with_minimum_distance()
[1, 25, 15, 39]
>>> sample_with_minimum_distance()
[26, 12, 1, 38]
```

The "cheap trick" solution

If the various constants here in the original problem are fixed (population range(40), samples of length 4, minimum distance of 10), then there's an obvious cheap trick: there are only 715 possible different sorted samples, so just pre-create a list containing all of them, and then every time you need to generate a sample, pick one from that pre-created list using random.choice.

For the generation, we can either go with a grossly inefficient but clearly correct brute force solution:

```python
>>> import itertools
>>> all_samples = [  # inefficient brute-force solution
...     sample for sample in itertools.product(range(40), repeat=4)
...     if all(x - y >= 10 for x, y in zip(sample[1:], sample))
... ]
>>> len(all_samples)
715
```

This is still plenty fast enough, taking only a couple of seconds on my machine. Alternatively, we can do something more refined and direct using the same bijection as identified above.

```python
>>> all_samples = [
...     [9*i + s for i, s in enumerate(sample)]
...     for sample in itertools.combinations(range(13), 4)
... ]
>>> len(all_samples)
715
```

Either way, we generate the list of samples just once, and then use random.choice to pick one every time we need it:

```python
>>> random.choice(all_samples)
(1, 11, 21, 38)
>>> random.choice(all_samples)
(0, 10, 23, 33)
```

Of course, this solution doesn't scale well: for 7 samples out of range(100) with a minimum distance of 5, there are over 2 billion possible different sorted samples.
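The same bijection lets us confirm that count with the standard library (a quick check, not part of the original answer): such samples correspond to 7-element subsets of `range(100 - 6*4)`.

```python
from math import comb

# Sorted 7-samples from range(100) with minimum distance 5 correspond,
# via the bijection, to 7-element subsets of range(100 - 6*4) = range(76).
print(comb(76, 7))  # 2186189400
```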

Demonstration of uniformity

I claimed earlier that the one-liner produces all possibilities with equal likelihood (assuming a perfect source of random numbers, of course, but Python's Mersenne Twister is good enough that we're unlikely to detect statistical anomalies arising from the core generator in the test below). Here's a demonstration of that uniformity.

First, for convenience, we'll wrap our one-liner in a function. We'll also change it to return a tuple instead of a list, because for the next step we want something hashable.

```python
>>> def sorted_sample():
...     return tuple(9*i + x for i, x in
...                  enumerate(sorted(random.sample(range(13), 4))))
```

Now we generate 10 million samples (this will take a couple of minutes), and count how often each one occurs:

```python
>>> from collections import Counter
>>> samples = Counter(sorted_sample() for _ in range(10**7))
```

A couple of quick checks:

```python
>>> len(samples)
715
>>> 10**7 / 715
13986.013986013986
>>> samples[0, 10, 20, 30]
14329
>>> samples[0, 11, 22, 33]
13995
>>> min(samples.values())
13624
>>> max(samples.values())
14329
```

We've collected 715 different combinations, and a little bit of maths tells us that that's exactly the number we expect (13 choose 4), so with a uniform distribution we'd expect each combination to occur approximately 10**7 / 715 times, or somewhere around 14000 times. Both the combinations we checked above are around 14000, as are the minimum and maximum counts appearing, but not surprisingly, there's some random variation.
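The expected numbers can be computed directly with `math.comb` (a small check, not part of the original answer):

```python
from math import comb

# 13 choose 4 counts the sorted 4-samples from range(13), which the
# bijection maps one-to-one onto the valid samples from range(40)
print(comb(13, 4))           # 715
print(10**7 / comb(13, 4))   # 13986.013986013986 expected occurrences each
```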

Is that random variation within acceptable limits? To find out, we can do a chi-squared test with p = 0.01. Our null hypothesis is that the population we're drawing from is uniform: i.e., that our code generates each possible sample with equal likelihood.

SciPy makes a chi-squared test for uniformity easy:

```python
>>> from scipy.stats import chisquare
>>> chisquare(list(samples.values()))
Power_divergenceResult(statistic=724.682234, pvalue=0.3825060783237031)
```

The p-value we get is not smaller than 0.01, so we fail to reject the null hypothesis: that is, we have no evidence of non-uniformity.


Once you've generated a number, it removes a swath from your range, since you know that no other number can be within +/- 10 of it.

A naive way to implement this would be to make a list of remaining numbers, and cut chunks out of it every time you pick a number:

```python
import random

domain = list(range(40))
result = []
while domain:
    n = random.choice(domain)
    result.append(n)
    # keep only the values at least 10 away from the chosen number
    domain = [x for x in domain if x <= n - 10 or x >= n + 10]
```

Keep in mind that every sample removes up to 19 elements from your domain. That means that you are by no means guaranteed to get 4 elements in the result, but at least 3 are guaranteed.
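A quick empirical check (an illustrative sketch, not part of the original answer) bears this out: over many runs, the naive loop always terminates with 3 or 4 elements, and both lengths do occur.

```python
import random

def naive_sample(n=40, d=10):
    # naive approach: repeatedly pick, then cut the excluded swath out
    domain = list(range(n))
    result = []
    while domain:
        pick = random.choice(domain)
        result.append(pick)
        domain = [x for x in domain if x <= pick - d or x >= pick + d]
    return result

lengths = [len(naive_sample()) for _ in range(10_000)]
print(min(lengths), max(lengths))  # 3 4
```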


If the sample size remains proportional to the length of your domain, then one option is to shuffle the domain and pick the first four elements which satisfy the requirement.

Using a set to keep track of which numbers are excluded allows the process to be efficient.

Code

```python
import random

def choose_with_step(domain, step, k):
    domain = list(domain)
    random.shuffle(domain)
    exclusions = set()
    choices = []
    while domain and k > 0:
        choice = domain.pop()
        if choice not in exclusions:
            choices.append(choice)
            for x in range(choice - step + 1, choice + step):
                exclusions.add(x)
            k -= 1
    return choices
```

Example of output

```python
# choose_with_step(range(40), 10, 4)
[15, 5, 33]
[11, 25, 35, 0]
[27, 12, 37, 0]
[36, 9, 26]
```

Time-complexity

random.shuffle runs in O(n), the single pass over the shuffled list costs O(n) with O(1) set lookups, and building the exclusion sets costs O(step) per accepted element, so the algorithm runs in O(n + k * step) overall.

Because the algorithm is linear in the domain length, the sample size should stay proportional to the domain size; otherwise you pay to shuffle the entire list just to pick a handful of elements.
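As a sanity check (a sketch restating the function above so it runs standalone), we can verify that every returned sample respects the minimum distance:

```python
import random

def choose_with_step(domain, step, k):
    # same algorithm as above: shuffle, then greedily keep elements
    # that aren't within `step` of an earlier choice
    domain = list(domain)
    random.shuffle(domain)
    exclusions = set()
    choices = []
    while domain and k > 0:
        choice = domain.pop()
        if choice not in exclusions:
            choices.append(choice)
            exclusions.update(range(choice - step + 1, choice + step))
            k -= 1
    return choices

# every pair of chosen values must differ by at least `step`
for _ in range(1000):
    sample = choose_with_step(range(40), 10, 4)
    assert all(abs(x - y) >= 10
               for i, x in enumerate(sample) for y in sample[:i])
```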