Memory-efficient way to generate a large numpy array containing random boolean values
One problem with using np.random.randint is that it generates 64-bit integers by default on most platforms, whereas numpy's boolean dtype (np.bool_) uses only 8 bits to represent each boolean value. You are therefore allocating an intermediate array 8x larger than necessary.
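The size difference can be checked directly with a small sketch. The dtype is pinned to int64 here purely for illustration, since the default integer dtype of randint depends on the platform and NumPy version; the variable names are arbitrary.

```python
import numpy as np

n = 1_000_000

# dtype is set explicitly for illustration; the default integer dtype
# of randint depends on the platform and NumPy version.
ints = np.random.randint(0, 2, size=n, dtype=np.int64)
bools = ints.astype(np.bool_)

print(ints.nbytes)   # 8 bytes per element
print(bools.nbytes)  # 1 byte per element
```

The intermediate integer array stays alive until the conversion finishes, so peak memory is dominated by it rather than by the final boolean array.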
A workaround that avoids intermediate 64-bit dtypes is to generate a string of random bytes using np.random.bytes, which can be converted to an array of 8-bit integers using np.frombuffer (the binary mode of np.fromstring, used for this purpose in older code, is deprecated). These integers can then be converted to boolean values, for example by testing whether they are less than 255 * p, where p is the desired probability of each element being True:
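    import numpy as np

    def random_bool(shape, p=0.5):
        n = int(np.prod(shape))
        # one random byte per output element; np.frombuffer replaces the
        # deprecated binary mode of np.fromstring
        x = np.frombuffer(np.random.bytes(n), np.uint8, n)
        return (x < 255 * p).reshape(shape)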
Benchmark:
    In [1]: shape = 1200, int(2E6)

    In [2]: %timeit random_bool(shape)
    1 loops, best of 3: 12.7 s per loop
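One important caveat is that the probability is quantized to a multiple of 1/256: with uniformly distributed bytes, the effective probability of True is ceil(255 * p) / 256. For an exact multiple of 1/256 such as p=1/2 this does not affect accuracy, but other values can be off by up to 1/256, and p=1 yields 255/256 rather than exactly 1.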
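To see how the threshold test quantizes p, one can enumerate all 256 possible byte values and count how many pass the comparison. This is only a sketch: effective_probability is a hypothetical helper introduced here for illustration, and it assumes the random bytes are uniformly distributed.

```python
import numpy as np

def effective_probability(p):
    """Fraction of the 256 possible byte values that satisfy x < 255 * p."""
    all_bytes = np.arange(256)
    return np.mean(all_bytes < 255 * p)

# Compare the requested probability with the one actually obtained.
for p in (0.5, 0.3, 1.0):
    print(p, effective_probability(p))
```

If finer resolution is needed, the same comparison could in principle be done on wider unsigned integers, at the cost of more random bytes per element.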
Update:
An even faster method is to exploit the fact that you only need to generate a single random bit per 0 or 1 in your output array. You can therefore create a random array of 8-bit integers 1/8th the size of the final output, then convert it to booleans using np.unpackbits. Since every bit is equally likely to be 0 or 1, this only covers the p=0.5 case:
    def fast_random_bool(shape):
        n = int(np.prod(shape))
        nb = -(-n // 8)     # ceiling division: bytes needed for n bits
        b = np.frombuffer(np.random.bytes(nb), np.uint8, nb)
        # expand each byte into 8 values of 0 or 1, drop the padding bits
        return np.unpackbits(b)[:n].reshape(shape).view(np.bool_)
For example:
    In [3]: %timeit fast_random_bool(shape)
    1 loops, best of 3: 5.54 s per loop
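To make the bit-unpacking step concrete, here is a small sketch of what happens for a 10-element output. Fixed bytes are used in place of np.random.bytes so the result is reproducible; the variable names are illustrative only.

```python
import numpy as np

n = 10                # number of booleans wanted
nb = -(-n // 8)       # ceiling division: 2 bytes are needed to cover 10 bits

# Fixed bytes instead of random ones, so the result is reproducible.
b = np.array([0b10100000, 0b11000000], dtype=np.uint8)

bits = np.unpackbits(b)          # 16 values of 0 or 1, most significant bit first
flags = bits[:n].view(np.bool_)  # drop the 6 padding bits, reinterpret as bool

print(nb)
print(flags)
```

The view is valid because both uint8 and the boolean dtype occupy one byte per element and the unpacked values are only ever 0 or 1, so no copy is made in that last step.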