How do you filter pandas DataFrames by multiple columns?
Use the & operator, and don't forget to wrap each sub-statement in ():
males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
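As a minimal, self-contained sketch of the same idea: the sample data below is made up for illustration, and only the column names Gender and Year come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Year": [2013, 2014, 2014, 2014],
    "Score": [1, 2, 3, 4],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
males_2014 = df[(df["Gender"] == "Male") & (df["Year"] == 2014)]
print(males_2014)
```

The same pattern works with | for "or" and ~ for "not", again with each condition in its own parentheses.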
To store your DataFrames in a dict using a for loop:
from collections import defaultdict

dic = {}
for g in ['male', 'female']:
    dic[g] = defaultdict(dict)
    for y in [2013, 2014]:
        # store the DataFrames in a dict of dicts
        dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]
EDIT: A demo for your getDF:
def getDF(dic, gender, year):
    return dic[gender][year]

print(getDF(dic, 'male', 2014))
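Putting the loop and the lookup together, here is a runnable sketch. The sample rows and the Score column are invented for illustration, and plain nested dicts are used in place of defaultdict, which is enough when every key is assigned explicitly.

```python
import pandas as pd

# Hypothetical sample data with lowercase gender labels, matching the loop keys.
df = pd.DataFrame({
    "Gender": ["male", "female", "male", "female"],
    "Year": [2013, 2013, 2014, 2014],
    "Score": [10, 20, 30, 40],
})

# Build a dict of dicts: dic[gender][year] -> filtered DataFrame.
dic = {}
for g in ["male", "female"]:
    dic[g] = {}
    for y in [2013, 2014]:
        dic[g][y] = df[(df["Gender"] == g) & (df["Year"] == y)]


def getDF(dic, gender, year):
    """Look up the pre-filtered DataFrame for one gender/year pair."""
    return dic[gender][year]


print(getDF(dic, "male", 2014))
```

Note that the keys must match the values in the column exactly, including case: 'Male' and 'male' are different labels.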
For more general boolean functions that you would like to use as a filter and that depend on more than one column, you can use:
df = df[df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)]
where f is a function that is applied to every pair of elements (x1, x2) from col_1 and col_2 and returns True or False depending on any condition you want on (x1, x2).
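For instance, here is a small sketch with a made-up predicate. The function f, its condition, and the sample data are all hypothetical and stand in for whatever rule you need.

```python
import pandas as pd

# Hypothetical sample data; col_1 and col_2 follow the names used above.
df = pd.DataFrame({
    "col_1": [1, 5, 3],
    "col_2": [4, 2, 3],
    "other": ["a", "b", "c"],
})


def f(x1, x2):
    # Hypothetical condition: keep rows where col_1 is strictly less than col_2.
    return x1 < x2


# With axis=1, apply passes each row as a Series; *x unpacks it into (x1, x2).
mask = df[["col_1", "col_2"]].apply(lambda x: f(*x), axis=1)
filtered = df[mask]
print(filtered)
```

Row-wise apply calls f once per row in Python, so it can be slow on large DataFrames. When the condition can be written directly on the columns, as here with df['col_1'] < df['col_2'], the vectorized form is usually faster.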