Pandas split DataFrame by column value


You can use boolean indexing:

    import pandas as pd

    df = pd.DataFrame({'A': [3, 4, 7, 6, 1], 'Sales': [10, 20, 30, 40, 50]})
    print(df)
       A  Sales
    0  3     10
    1  4     20
    2  7     30
    3  6     40
    4  1     50

    s = 30
    df1 = df[df['Sales'] >= s]
    print(df1)
       A  Sales
    2  7     30
    3  6     40
    4  1     50

    df2 = df[df['Sales'] < s]
    print(df2)
       A  Sales
    0  3     10
    1  4     20
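The same boolean-indexing pattern can be wrapped in a small reusable function. This is a minimal sketch; the name split_by_threshold and its signature are hypothetical, not part of pandas.

```python
import pandas as pd


def split_by_threshold(df, column, threshold):
    """Hypothetical helper: return (rows where column >= threshold,
    rows where column < threshold) using boolean indexing."""
    upper = df[df[column] >= threshold]
    lower = df[df[column] < threshold]
    return upper, lower


df = pd.DataFrame({'A': [3, 4, 7, 6, 1], 'Sales': [10, 20, 30, 40, 50]})
high, low = split_by_threshold(df, 'Sales', 30)
print(high.index.tolist())  # [2, 3, 4]
print(low.index.tolist())   # [0, 1]
```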

You can also invert the mask with ~:

    mask = df['Sales'] >= s
    df1 = df[mask]
    df2 = df[~mask]
    print(df1)
       A  Sales
    2  7     30
    3  6     40
    4  1     50
    print(df2)
       A  Sales
    0  3     10
    1  4     20

    print(mask)
    0    False
    1    False
    2     True
    3     True
    4     True
    Name: Sales, dtype: bool
    print(~mask)
    0     True
    1     True
    2    False
    3    False
    4    False
    Name: Sales, dtype: bool
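One practical difference between the two versions shows up with missing values: comparisons against NaN evaluate to False, so with two separate conditions a NaN row lands in neither frame, while ~mask sends it to df2. A minimal sketch, using made-up data with one NaN in Sales for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data only: row 2 has a missing Sales value
df = pd.DataFrame({'A': [3, 4, 7, 6, 1], 'Sales': [10, 20, np.nan, 40, 50]})
s = 30

# Two independent comparisons: NaN >= 30 and NaN < 30 are both False
ge = df[df['Sales'] >= s]
lt = df[df['Sales'] < s]
print(len(ge) + len(lt))  # 4 -> the NaN row is in neither frame

# One mask and its inverse: every row goes to exactly one side
mask = df['Sales'] >= s
df1 = df[mask]
df2 = df[~mask]
print(df2.index.tolist())  # [0, 1, 2] -> the NaN row ends up in df2
```

If NaN rows need their own bucket, df['Sales'].isna() gives a third mask to select them explicitly.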


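Using groupby, you can split it into two DataFrames like this. Because groupby sorts its keys by default, the False group (Sales >= 30) comes first and is unpacked into df1: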

    In [1047]: df1, df2 = [x for _, x in df.groupby(df['Sales'] < 30)]

    In [1048]: df1
    Out[1048]:
       A  Sales
    2  7     30
    3  6     40
    4  1     50

    In [1049]: df2
    Out[1049]:
       A  Sales
    0  3     10
    1  4     20
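The unpacking df1, df2 = [...] assumes both groups exist. If every row satisfies the condition (or none does), groupby yields a single group and the unpacking raises ValueError. A minimal sketch of a more defensive variant that collects the groups into a dict and falls back to an empty frame; split_on_mask is a hypothetical name, not a pandas function:

```python
import pandas as pd


def split_on_mask(df, mask):
    """Hypothetical helper: return (rows where mask is False,
    rows where mask is True), even if one side is empty."""
    # Iterating a GroupBy yields (key, sub-DataFrame) pairs
    groups = dict(tuple(df.groupby(mask)))
    empty = df.iloc[0:0]  # same columns and dtypes, no rows
    return groups.get(False, empty), groups.get(True, empty)


df = pd.DataFrame({'A': [3, 4, 7, 6, 1], 'Sales': [10, 20, 30, 40, 50]})

df1, df2 = split_on_mask(df, df['Sales'] < 30)
print(df1.index.tolist(), df2.index.tolist())  # [2, 3, 4] [0, 1]

# Every row has Sales < 100, so only the True group exists
df1, df2 = split_on_mask(df, df['Sales'] < 100)
print(len(df1), len(df2))  # 0 5
```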


Using groupby and a list comprehension:

Store all the split DataFrames in a list and access each one by its index.

    DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                       'pos': [10, 20, 30, 40, 50]})
    ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)]
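Positional access depends on the order of the groups. If you would rather look a frame up by its chr value, a dict comprehension over the same groupby works; a minimal sketch (the variable name frames is illustrative):

```python
import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                   'pos': [10, 20, 30, 40, 50]})

# Map each distinct 'chr' value to the sub-DataFrame of its rows
frames = {key: group for key, group in DF.groupby('chr')}

print(sorted(frames))                  # ['chr1', 'chr3', 'chr6', 'chr7']
print(frames['chr3']['pos'].tolist())  # [10, 20]
```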

Access the separated DataFrames like this:

    ans[0]
    ans[1]
    ans[-1]  # the last separated DataFrame (same as ans[len(ans)-1])
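By default groupby sorts the group keys, so ans[0] holds the 'chr1' rows rather than the first value that appears in DF. Passing sort=False keeps the groups in order of first appearance instead; a minimal sketch:

```python
import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                   'pos': [10, 20, 30, 40, 50]})

sorted_groups = [y for x, y in DF.groupby('chr')]
unsorted_groups = [y for x, y in DF.groupby('chr', sort=False)]

print(sorted_groups[0]['chr'].iloc[0])     # chr1 (keys sorted)
print(unsorted_groups[0]['chr'].iloc[0])   # chr3 (first appearance)
print(unsorted_groups[-1]['chr'].iloc[0])  # chr1
```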

Access a column of a separated DataFrame like this:

    i = 0  # index of the separated DataFrame you want
    ans_i_chr = ans[i].chr
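Each split frame keeps the original index and all columns, so the usual column access works on it. A minimal sketch that loops over the list and reads both columns; bracket access such as part['chr'] is equivalent to attribute access like ans[i].chr and also works for column names that are not valid Python identifiers:

```python
import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                   'pos': [10, 20, 30, 40, 50]})
ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)]

for i, part in enumerate(ans):
    label = part['chr'].iloc[0]       # every row in a group shares one chr value
    positions = part['pos'].tolist()  # the pos values of that group
    print(i, label, positions)
# 0 chr1 [50]
# 1 chr3 [10, 20]
# 2 chr6 [40]
# 3 chr7 [30]
```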