How do you filter pandas DataFrames by multiple columns?


Use the & operator, and don't forget to wrap each sub-expression in parentheses:

males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
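Here is a minimal, self-contained sketch of the same idea. The sample data and the Score column are made up for illustration; only the Gender and Year column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names Gender and Year
# are taken from the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Year": [2013, 2014, 2014, 2014],
    "Score": [1, 2, 3, 4],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
mask = (df["Gender"] == "Male") & (df["Year"] == 2014)
males_2014 = df[mask]
print(males_2014)
```

Use | for "or" and ~ for "not" in the same way, again with parentheses around each comparison.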

To store your DataFrames in a dict using a for loop:

from collections import defaultdict

dic = {}
for g in ['male', 'female']:
    dic[g] = defaultdict(dict)
    for y in [2013, 2014]:
        # store the DataFrames in a dict of dicts
        dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]

EDIT:

A demo for your getDF:

def getDF(dic, gender, year):
    return dic[gender][year]

print(getDF(dic, 'male', 2014))
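As an alternative sketch (not what the answer above does), groupby can build a similar lookup in one pass, keyed by (gender, year) tuples instead of nested dicts. The sample data, the name frames and the helper get_df are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Gender": ["male", "female", "male", "female"],
    "Year": [2013, 2013, 2014, 2014],
    "Score": [1, 2, 3, 4],
})

# Grouping by a list of two columns yields ((gender, year), sub-DataFrame)
# pairs, so a dict comprehension turns them into a lookup table.
frames = {key: group for key, group in df.groupby(["Gender", "Year"])}

def get_df(frames, gender, year):
    # Raises KeyError if that combination does not occur in the data.
    return frames[(gender, year)]

print(get_df(frames, "male", 2014))
```

Unlike the explicit loop, this only creates entries for combinations that actually appear in the data.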


For more general boolean functions that you would like to use as a filter and that depend on more than one column, you can use:

df = df[df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)]

where f is a function that is applied to every pair of elements (x1, x2) from col_1 and col_2 and returns True or False depending on any condition you want on (x1, x2).
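A small sketch of this pattern, assuming a made-up condition for f and made-up data in col_1 and col_2:

```python
import pandas as pd

# Hypothetical data; col_1 and col_2 are the placeholder names used above.
df = pd.DataFrame({"col_1": [1, 5, 3, 8], "col_2": [4, 2, 3, 9]})

def f(x1, x2):
    # Example condition (an assumption for this demo):
    # keep rows where col_1 < col_2 and the sum of the two values is odd.
    return bool(x1 < x2 and (x1 + x2) % 2 == 1)

# apply with axis=1 passes each row as a Series; *x unpacks it into (x1, x2).
mask = df[["col_1", "col_2"]].apply(lambda x: f(*x), axis=1)
filtered = df[mask]
print(filtered)
```

Row-wise apply calls f once per row in Python, so it is slow on large DataFrames. When the condition can be written with vectorized comparisons and &, that is usually the better choice.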


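Starting with pandas 0.13, you can also use DataFrame.query, which is usually the most concise option and can be faster on large DataFrames (this assumes Year is stored as an integer; keep the quotes around 2014 if it is stored as a string):

df.query('Gender == "Male" & Year == 2014')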
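If the values to filter on live in Python variables, query can reference them with an @ prefix. A minimal sketch, with made-up sample data and hypothetical variable names gender and year:

```python
import pandas as pd

# Hypothetical sample data; Year is stored as an integer here.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Year": [2013, 2014, 2014, 2014],
    "Score": [1, 2, 3, 4],
})

gender = "Male"
year = 2014

# @name refers to a variable in the surrounding Python scope;
# bare names such as Gender and Year refer to columns.
result = df.query("Gender == @gender and Year == @year")
print(result)
```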