How to vectorize dynamically sized numpy arrays in pandas


Try this:

```python
import numpy as np

df['computed'] = [[a, b] + np.arange(b + 1, end).tolist()  # end = b + length - 1
                  for a, b, end in zip(df.a, df.b, df.b + df.length - 1)]
```

Output:

```
   a  b  length                            computed
0  1  6       3                           [1, 6, 7]
1  2  7       5                    [2, 7, 8, 9, 10]
2  3  8       7           [3, 8, 9, 10, 11, 12, 13]
3  4  9       9  [4, 9, 10, 11, 12, 13, 14, 15, 16]
4  5  0       3                           [5, 0, 1]
```
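The list comprehension above still loops over the rows in Python. If you want NumPy to do the per-element work, one possible approach (my own sketch, not part of the answer above) is to build every value in a single flat array with `np.repeat` and `np.cumsum`, then cut it back into one array per row with `np.split`. It assumes every `length` is at least 2, so each row holds at least `[a, b]`; the DataFrame is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0], 'length': [3, 5, 7, 9, 3]})

a = df['a'].to_numpy()
b = df['b'].to_numpy()
length = df['length'].to_numpy()  # assumption: every length >= 2

# Offset at which each row starts inside the flat array
starts = np.cumsum(length) - length
# Position of every element within its own row: 0, 1, ..., length - 1
pos = np.arange(length.sum()) - np.repeat(starts, length)
# Element k >= 1 of a row is b + (k - 1); element 0 is overwritten with a
flat = np.repeat(b, length) + pos - 1
flat[starts] = a

# Cut the flat array into one piece per row
pieces = np.split(flat, np.cumsum(length)[:-1])
# Store the pieces in a 1-D object array so pandas never tries to stack them
computed = np.empty(len(pieces), dtype=object)
for i, piece in enumerate(pieces):
    computed[i] = piece
df['computed'] = computed
print(df)
```

The only remaining Python loop runs over rows, not elements, when the pieces are stored. Whether this actually beats the comprehension depends on your data, so it is worth timing both.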


Not sure if this is what you were looking for, but if it's too slow, you can always try multiprocessing:

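```python
import numpy as np
import pandas as pd
from multiprocessing import Pool


def parallelize(df, func, n_cores=4):
    # Split the rows into n_cores roughly equal chunks by position
    chunks = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), n_cores)]
    # Process each chunk in its own worker process, then stitch the results back together
    with Pool(n_cores) as pool:
        result = pd.concat(pool.map(func, chunks))
    return result


def func(df):
    # Build [a, b, b+1, ..., b+length-2] for every row (length elements in total)
    computed = df.apply(
        lambda x: np.array([x['a'], x['b']] + [x['b'] + i for i in range(1, x['length'] - 1)]),
        axis=1,
    )
    return df.assign(computed=computed)


if __name__ == '__main__':
    # This guard is required where workers are started with "spawn" (Windows, macOS)
    df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0], 'length': [3, 5, 7, 9, 3]})
    df = parallelize(df, func)
    print(df)
```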

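(For small inputs, such as few rows or small values of `length`, the overhead of starting worker processes and pickling the chunks will probably make this slower than your original code.)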