How to vectorize dynamically sized numpy arrays in pandas
Try this:
df['computed'] = [[a, b] + list(np.arange(b + 1, stop))
                  for a, b, stop in zip(df.a, df.b, df.b + df.length - 1)]
Output:
   a  b  length                            computed
0  1  6       3                           [1, 6, 7]
1  2  7       5                    [2, 7, 8, 9, 10]
2  3  8       7           [3, 8, 9, 10, 11, 12, 13]
3  4  9       9  [4, 9, 10, 11, 12, 13, 14, 15, 16]
4  5  0       3                           [5, 0, 1]
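For reference, here is the one-liner as a complete, runnable script. The sample data is reconstructed from the output shown above; each row's result is [a, b, b+1, ..., b+length-2].

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output above
df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [6, 7, 8, 9, 0],
                   'length': [3, 5, 7, 9, 3]})

# For each row, build [a, b] followed by the run b+1 .. b+length-2.
# The third zip argument precomputes the exclusive stop value b+length-1.
df['computed'] = [[a, b] + list(np.arange(b + 1, stop))
                  for a, b, stop in zip(df.a, df.b, df.b + df.length - 1)]

print(df)
```

The list comprehension iterates over plain Python scalars rather than calling .apply row by row, which is usually faster for this kind of per-row list construction.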
Not sure if this is what you were looking for, but if it's too slow you can always try multiprocessing:
import pandas as pd
import numpy as np
from multiprocessing import Pool

def parallelize(df, func, n_cores=4):
    df_split = np.array_split(df, n_cores)
    pool = Pool(n_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df

def func(df):
    df['computed'] = df.apply(
        lambda x: np.array([x['a'], x['b']]
                           + [x['b'] + i for i in range(1, x['length'] - 1)]),
        axis=1)
    return df

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [6, 7, 8, 9, 0],
                   'length': [3, 5, 7, 9, 3]})
df = parallelize(df, func)
(For small values of length it will be less efficient than your original code, because of the process startup overhead.)
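One caveat worth noting: on platforms that spawn worker processes rather than fork them (Windows, and macOS on recent Python versions), multiprocessing code must be launched from under an if __name__ == '__main__': guard, or each worker will re-import the module and try to start its own pool. A sketch of the same approach with the guard, using a with-block so the pool is shut down automatically:

```python
import numpy as np
import pandas as pd
from multiprocessing import Pool

def func(df):
    # Same per-row computation as above: [a, b, b+1, ..., b+length-2]
    df['computed'] = df.apply(
        lambda x: np.array([x['a'], x['b']]
                           + [x['b'] + i for i in range(1, x['length'] - 1)]),
        axis=1)
    return df

def parallelize(df, func, n_cores=4):
    df_split = np.array_split(df, n_cores)
    with Pool(n_cores) as pool:  # the context manager shuts the pool down on exit
        return pd.concat(pool.map(func, df_split))

if __name__ == '__main__':
    df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                       'b': [6, 7, 8, 9, 0],
                       'length': [3, 5, 7, 9, 3]})
    df = parallelize(df, func)
    print(df)
```

Without the guard the script may still work on Linux (which forks by default), but it will crash or loop on spawn-based platforms.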