Pandas Split Dataframe into two Dataframes at a specific row
Demo:
In [255]: df = pd.DataFrame(np.random.rand(5, 6), columns=list('abcdef'))

In [256]: df
Out[256]:
          a         b         c         d         e         f
0  0.823638  0.767999  0.460358  0.034578  0.592420  0.776803
1  0.344320  0.754412  0.274944  0.545039  0.031752  0.784564
2  0.238826  0.610893  0.861127  0.189441  0.294646  0.557034
3  0.478562  0.571750  0.116209  0.534039  0.869545  0.855520
4  0.130601  0.678583  0.157052  0.899672  0.093976  0.268974

In [257]: dfs = np.split(df, [4], axis=1)

In [258]: dfs[0]
Out[258]:
          a         b         c         d
0  0.823638  0.767999  0.460358  0.034578
1  0.344320  0.754412  0.274944  0.545039
2  0.238826  0.610893  0.861127  0.189441
3  0.478562  0.571750  0.116209  0.534039
4  0.130601  0.678583  0.157052  0.899672

In [259]: dfs[1]
Out[259]:
          e         f
0  0.592420  0.776803
1  0.031752  0.784564
2  0.294646  0.557034
3  0.869545  0.855520
4  0.093976  0.268974
np.split() is pretty flexible. Let's split the original DF into 3 DFs at column positions [2, 3]:
In [260]: dfs = np.split(df, [2,3], axis=1)

In [261]: dfs[0]
Out[261]:
          a         b
0  0.823638  0.767999
1  0.344320  0.754412
2  0.238826  0.610893
3  0.478562  0.571750
4  0.130601  0.678583

In [262]: dfs[1]
Out[262]:
          c
0  0.460358
1  0.274944
2  0.861127
3  0.116209
4  0.157052

In [263]: dfs[2]
Out[263]:
          d         e         f
0  0.034578  0.592420  0.776803
1  0.545039  0.031752  0.784564
2  0.189441  0.294646  0.557034
3  0.534039  0.869545  0.855520
4  0.899672  0.093976  0.268974
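The demos above cut along columns (axis=1), while the question in the title is about cutting at a row. A minimal sketch of the row case, assuming plain positional slicing with iloc is acceptable; the sample frame and the name split_at are illustrative and not taken from the original post:

```python
import numpy as np
import pandas as pd

# Small stand-in frame; the values are illustrative only.
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=list("abcdef"))

split_at = 3  # hypothetical row position at which to cut

# iloc slices by position, so this works regardless of the index labels.
top = df.iloc[:split_at]     # rows at positions 0, 1, 2
bottom = df.iloc[split_at:]  # rows at positions 3, 4
```

Both pieces keep their original index labels; calling reset_index(drop=True) on either one renumbers it from 0 if that is preferred.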
I generally use np.array_split because its syntax is simpler and it scales better to more than 2 partitions.
import numpy as np
partitions = 2
dfs = np.array_split(df, partitions)
np.split(df, [100, 200, 300], axis=0) wants explicit index numbers, which may or may not be desirable.
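To make the difference concrete, here is a rough sketch of both styles side by side: explicit row boundaries versus a requested number of partitions. It is written with iloc, and NumPy is applied only to an array of row positions rather than to the DataFrame itself, on the assumption that this sidesteps any version-specific behaviour when NumPy functions are called directly on a DataFrame. The frame and the names bounds, parts and chunks are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # illustrative 10-row frame

# Style 1: explicit boundaries. Cuts before positions 3 and 7.
bounds = [0, 3, 7, len(df)]
parts = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# Style 2: ask for a number of partitions and let NumPy work out
# near-equal groups of row positions (sizes may differ by one).
n_partitions = 3
position_groups = np.array_split(np.arange(len(df)), n_partitions)
chunks = [df.iloc[group] for group in position_groups]

print([len(p) for p in parts])
print([len(c) for c in chunks])
```

The first style suits cases where the cut points are meaningful (for example, a known header block); the second is convenient when only the number of pieces matters.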