Load data from generator into already allocated numpy array

python numpy

There's not much you can do, as stated in the comments.

That said, you can consider these two solutions:

using `numpy.fromiter`

Instead of creating `data = np.empty((n, k))` yourself, use `numpy.fromiter` with the `count` argument, which exists specifically for the case where you know the number of items in advance. That way numpy doesn't have to "guess" the size and re-allocate until the guess is large enough. Using `fromiter` also runs the for loop in C instead of Python. This might be a tiny bit faster, but the real bottleneck will likely be in your generators anyway.

Note that `fromiter` only deals with flat arrays, so you need to read everything flattened (e.g. using `chain.from_iterable`) and only then call `reshape`:

    from itertools import chain

    import numpy

    n = 20
    k = 4

    # One generator per row, each yielding k values
    generators = (
        (i*j for j in range(k))
        for i in range(n)
    )

    # fromiter needs a flat iterable, so chain the row generators together
    flat_gen = chain.from_iterable(generators)
    data = numpy.fromiter(flat_gen, 'int64', count=n*k)
    data = data.reshape((n, k))

    """
    array([[ 0,  0,  0,  0],
           [ 0,  1,  2,  3],
           [ 0,  2,  4,  6],
           [ 0,  3,  6,  9],
           [ 0,  4,  8, 12],
           [ 0,  5, 10, 15],
           [ 0,  6, 12, 18],
           [ 0,  7, 14, 21],
           [ 0,  8, 16, 24],
           [ 0,  9, 18, 27],
           [ 0, 10, 20, 30],
           [ 0, 11, 22, 33],
           [ 0, 12, 24, 36],
           [ 0, 13, 26, 39],
           [ 0, 14, 28, 42],
           [ 0, 15, 30, 45],
           [ 0, 16, 32, 48],
           [ 0, 17, 34, 51],
           [ 0, 18, 36, 54],
           [ 0, 19, 38, 57]])
    """
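As a small aside on the `count` argument, here is a sketch (not part of the original answer) of two behaviours worth knowing: `fromiter` reads only `count` items, and it raises `ValueError` if the iterable runs out early. The variable names are purely illustrative.

```python
import numpy as np

# With count given, fromiter allocates the output once and fills it with that many items.
first_three = np.fromiter((i * i for i in range(10)), dtype="int64", count=3)

# If the iterable is exhausted before count items are read, NumPy raises ValueError.
try:
    np.fromiter((i for i in range(2)), dtype="int64", count=5)
    too_short_raised = False
except ValueError:
    too_short_raised = True
```

So `count` has to match what the generators really produce. If a generator comes up short, you get an error rather than silently uninitialized values.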

using cython

If you want to re-use `data` and avoid re-allocating the memory, you can't use numpy's `fromiter` anymore, because it always returns a new array. IMHO the only way to avoid the Python for loop is then to implement it in Cython. Again, this is very likely overkill, since you still have to read the generators in Python.

For reference, the C implementation of `fromiter` looks like this: https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/multiarray/ctors.c#L4001-L4118
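For comparison, here is a minimal sketch of the plain Python loop that a Cython version would replace. It assumes `data` was allocated earlier with shape `(n, k)` and is being re-used. `fill_in_place` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def fill_in_place(data, generators):
    """Overwrite the rows of an existing 2-D array from an iterable of generators."""
    for i, gen in enumerate(generators):
        for j, value in enumerate(gen):
            data[i, j] = value  # writes into the existing buffer, no new array
    return data

n, k = 3, 4
data = np.empty((n, k), dtype="int64")  # allocated once, re-used across calls
result = fill_in_place(data, ((i * j for j in range(k)) for i in range(n)))
```

The returned array is the same object as `data`, so no second buffer of size `n*k` is created. The cost is that every element assignment goes through the Python interpreter.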


There is no faster way than the ones you described. You have to assign each element of the numpy array, either by iterating over the generator or by first building the entire list.


A couple of things here:

1) You can just write

    for whatever in g:
        do_stuff

Since g is a generator, the for loop understands how to get the data out of the generator.

2) You won't necessarily have to "copy" out of the generator (by design, it doesn't hold the entire sequence in memory), but you will need to loop through it to fill your numpy data structure. Since your structures are large, you might be able to squeeze out some performance with tools in numpy or itertools.

So the answer is "no", since you're using generators. If you don't need all of the data available at once, you can keep using generators to keep the memory footprint small, but I don't have any context for what you are doing with the data.
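If the data really doesn't have to be in memory all at once, one way to combine `itertools` and numpy is to consume the generator in fixed-size chunks. The sketch below is only an illustration under that assumption. `iter_chunks` is a made-up helper, and the chunk size and dtype are arbitrary.

```python
from itertools import islice

import numpy as np

def iter_chunks(iterable, chunk_size, dtype="int64"):
    """Yield 1-D arrays holding up to chunk_size items each."""
    it = iter(iterable)
    while True:
        # islice pulls at most chunk_size items; fromiter turns them into an array
        chunk = np.fromiter(islice(it, chunk_size), dtype=dtype)
        if chunk.size == 0:  # source exhausted
            return
        yield chunk

g = (i * i for i in range(10))
chunk_sizes = []
total = 0
for chunk in iter_chunks(g, chunk_size=4):
    chunk_sizes.append(chunk.size)
    total += int(chunk.sum())
```

Only one chunk is held as an array at any time, so peak memory is bounded by `chunk_size` rather than by the full length of the generator.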
