Pandas split DataFrame by column value
You can use boolean indexing:
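    import pandas as pd

    df = pd.DataFrame({'Sales':[10,20,30,40,50], 'A':[3,4,7,6,1]})
    # The output shown here comes from an older pandas that sorted dict keys;
    # recent versions keep insertion order, so 'Sales' prints before 'A'.
    print(df)
       A  Sales
    0  3     10
    1  4     20
    2  7     30
    3  6     40
    4  1     50

    s = 30

    # Rows where Sales is at least s
    df1 = df[df['Sales'] >= s]
    print(df1)
       A  Sales
    2  7     30
    3  6     40
    4  1     50

    # Rows where Sales is below s
    df2 = df[df['Sales'] < s]
    print(df2)
       A  Sales
    0  3     10
    1  4     20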
It's also possible to invert the mask with ~:
    mask = df['Sales'] >= s
    df1 = df[mask]
    df2 = df[~mask]

    print(df1)
       A  Sales
    2  7     30
    3  6     40
    4  1     50

    print(df2)
       A  Sales
    0  3     10
    1  4     20
    print(mask)
    0    False
    1    False
    2     True
    3     True
    4     True
    Name: Sales, dtype: bool

    print(~mask)
    0     True
    1     True
    2    False
    3    False
    4    False
    Name: Sales, dtype: bool
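Because the second mask is the exact inverse of the first, every row lands in exactly one of the two pieces. A minimal sketch that wraps this pattern in a helper; the name `split_by_threshold` is hypothetical and not part of pandas:

```python
import pandas as pd

def split_by_threshold(df, column, threshold):
    """Return (rows where column >= threshold, all remaining rows)."""
    mask = df[column] >= threshold
    return df[mask], df[~mask]

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
high, low = split_by_threshold(df, 'Sales', 30)

# The two pieces together account for every row of the original.
assert len(high) + len(low) == len(df)
```

Note that a comparison against NaN evaluates to False, so rows with a missing value in the column would fall into the second piece.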
Using groupby and a list comprehension:
Store all the split DataFrames in a list and access each separated DataFrame by its index.
    import pandas as pd

    DF = pd.DataFrame({'chr':["chr3","chr3","chr7","chr6","chr1"],'pos':[10,20,30,40,50]})
    # groupby yields (key, sub-DataFrame) pairs, sorted by key by default
    ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)]
Access the separated DataFrames like this:
    ans[0]
    ans[1]
    ans[len(ans)-1] # this is the last separated DF
Access a column of one of the separated DataFrames like this:
    ansI_chr = ans[i].chr  # i is any valid index into ans
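With a list, you have to know which position each group ended up in. As a variation on the approach above (not the original answer's method), a dict comprehension keeps each piece addressable by its column value. A minimal sketch; the names `parts`, `chr3` and `chr3_pos` are illustrative:

```python
import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                   'pos': [10, 20, 30, 40, 50]})

# Map each distinct 'chr' value to the sub-DataFrame holding its rows.
parts = {key: group for key, group in DF.groupby('chr')}

chr3 = parts['chr3']              # rows whose 'chr' is "chr3"
chr3_pos = chr3['pos'].tolist()   # 'pos' values of those rows
```

Each sub-DataFrame keeps the row index of the original, so the pieces can still be traced back to `DF`.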