Comparison of np.random.choice vs np.random.shuffle for samples without replacement

To answer your questions:

shuffle seems to be the fastest implementation
It should give the same answers (in fact, it seems to be the same thing under the hood)

Let's start @SvenMarnach's answer here. This is not a dupe of that question, but the answer is useful. Unfortunately that answer doesn't bring it in line with shuffler timewise:

%timeit shuffler(50, 2)2.47 µs ± 180 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)%timeit chooser(50, 2)52.5 µs ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)    rng = np.random.default_rng()def chooser2(size, num_samples):    return rng.choice(size, num_samples, replace=False)%timeit chooser2(50, 2)15.9 µs ± 1.41 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

the random.sample answer is better:

import random def sampler(size, num_samples):    return np.array(random.sample(range(size), num_samples))%timeit sampler(50, 2)4.6 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Still slower, though.

Since I'm not able to parse c code, I'll take sven's word for it that random.choice is doing shuffle-and-split in the background, so the methods should be equivalent. Why it's so much faster here is baffling to me though.

EDIT: A sample_indices based on @DaniMesejo's answer (slightly slower for num_samples = 2):

def sample_indices(pop, size, num_samples):    arr = np.random.rand(pop, size)    return np.argpartition(arr, num_samples, axis = 1)[:, :num_samples]

python arrays numpy random shuffle

See the source code for numpy.random.choice; with replace=False it creates a 50-item temporary list, shuffles that list, and takes two items from that list.

Since version 1.17, the implementation decisions for numpy.random.choice and numpy.random.shuffle, as with other numpy.random functions, can't be changed without affecting backward compatibility (see the recent RNG policy for NumPy). See also these questions:

Compare numpy.random.choice with numpy.random.Generator.choice, which is the newer way to sample items in NumPy 1.17 and later. The advantage is that numpy.random.Generator.choice is not subject to the same compatibility guarantee as numpy.random.choice or numpy.random.shuffle. If you care about performance of numpy.random.Generator you can file an issue in NumPy's GitHub repository.

python arrays numpy random shuffle

You could use an alternative solution, the idea is to generate a random array and then find the position of the min and max:

import numpy as npdef sample_indices(ran, size):    arr = np.random.rand(ran, size)    mi = np.argmin(arr, axis=1).reshape((-1, 1))    ma = np.argmax(arr, axis=1).reshape((-1, 1))    return np.hstack((mi, ma))def shuffler(size, num_samples):    items = list(range(size))    np.random.shuffle(items)    return items[:num_samples]def chooser(size, num_samples):    return np.random.choice(size, num_samples, replace=False)def sample_indices_shuffler(ran, size):    return np.array([shuffler(size, 2) for _ in range(ran)])def sample_indices_chooser(ran, size):    return np.array([chooser(size, 2) for _ in range(ran)])

Here are the timings:

%timeit sample_indices_chooser(1000, 50)17.3 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)%timeit sample_indices_shuffler(1000, 50)2.69 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)%timeit sample_indices(1000, 50)553 µs ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

For:

res = sample_indices(10, 50)print(res)

Output

[[ 9  6] [31 42] [17 42] [24 45] [ 2 49] [27 31] [21 19] [ 7 16] [20 28] [32 36]]

CodeHunter

Comparison of np.random.choice vs np.random.shuffle for samples without replacement

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last