Convert a pandas Series of lists into a numpy array
Use Series.str.strip + Series.str.split, then create a new np.array with dtype=float:
arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')
Result:
print(arr)
array([[ 1.  , -2.  ,  0.  ,  1.2 ,  4.34],
       [ 3.3 ,  4.  ,  0.  , -1.  ,  9.1 ]])
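For anyone who wants to run this end to end, here is a minimal sketch. The question's data is not shown here, so the Series below is an assumption: each element is taken to be a string such as "[1 -2 0 1.2 4.34]", with values separated by whitespace.

```python
import numpy as np
import pandas as pd

# Assumed input (not from the original post): bracketed,
# whitespace-separated numbers stored as strings.
ds = pd.Series(["[1 -2 0 1.2 4.34]", "[3.3 4 0 -1 9.1]"])

# Strip the brackets, split each string on whitespace into a list of
# tokens, then let NumPy parse the nested lists of strings as floats.
arr = np.array(ds.str.strip("[]").str.split().tolist(), dtype=float)

print(arr.shape)
print(arr.dtype)
```

Note that str.split() with no argument splits on any run of whitespace, so uneven spacing between the numbers should not matter here.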
Try stripping the "[]" from the Series first; after that, Series.str.split can do the rest (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html).
ds1 = ds.str.strip("[]")
# split and expand the data, then convert to a numpy array
arr = ds1.str.split(" ", expand=True).to_numpy(dtype=float)
Then arr will be in the format you want:
array([[ 1. , -2. , 0. , 1.2 , 4.34], [ 3.3 , 4. , 0. , -1. , 9.1 ]])
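A quick way to check that the expand=True route gives the same array as the tolist() route is sketched below. The Series is again an assumption (strings like "[1 -2 0 1.2 4.34]" with exactly one space between values), since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Assumed input (not from the original post): values separated by
# exactly one space, which is what str.split(" ") relies on.
ds = pd.Series(["[1 -2 0 1.2 4.34]", "[3.3 4 0 -1 9.1]"])

# expand=True returns a DataFrame with one column per token;
# to_numpy(dtype=float) converts that table to a float array.
arr_expand = ds.str.strip("[]").str.split(" ", expand=True).to_numpy(dtype=float)

# The list-based route, for comparison.
arr_list = np.array(ds.str.strip("[]").str.split().tolist(), dtype=float)

print(np.array_equal(arr_expand, arr_list))
```

One difference to keep in mind: str.split(" ") splits on every single space, so doubled spaces in the data would produce empty tokens that cannot be parsed as floats, whereas str.split() without an argument collapses runs of whitespace.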
Then I did a little profiling to compare it with Shubham's solution.
# Shubham's way
%timeit arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')
332 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# my way
%timeit ds.str.strip("[]").str.split(" ", expand=True).to_numpy(dtype=float)
741 µs ± 4.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
His solution is clearly faster, a bit more than 2x on this data (332 µs vs. 741 µs). Cheers!
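If you want to repeat the comparison outside IPython, something like the following should work with the standard timeit module. The 1,000-row Series, the helper names via_tolist and via_expand, and the run count are all made up for illustration; absolute numbers will vary with machine, pandas version, and data size.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed input (not from the original post): 1,000 rows of bracketed,
# single-space-separated numbers.
ds = pd.Series(["[1 -2 0 1.2 4.34]", "[3.3 4 0 -1 9.1]"] * 500)


def via_tolist():
    # Hypothetical helper name: the list-based approach.
    return np.array(ds.str.strip("[]").str.split().tolist(), dtype=float)


def via_expand():
    # Hypothetical helper name: the expand=True approach.
    return ds.str.strip("[]").str.split(" ", expand=True).to_numpy(dtype=float)


runs = 20  # arbitrary; increase for steadier numbers
t_tolist = timeit.timeit(via_tolist, number=runs) / runs
t_expand = timeit.timeit(via_expand, number=runs) / runs

print(f"tolist: {t_tolist * 1e6:.0f} µs per call")
print(f"expand: {t_expand * 1e6:.0f} µs per call")
```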