Dynamically filtering a pandas dataframe
If you're trying to build a dynamic query, there are easier ways. Here's one using a list comprehension and str.join
:
query = ' & '.join(['{}>{}'.format(k, v) for k, v in limits_dic.items()])
Or, using f
-strings with python-3.6+,
query = ' & '.join([f'{k}>{v}' for k, v in limits_dic.items()])
print(query)'A>0 & C>-1 & B>2'
Pass the query string to df.query
, it's meant for this very purpose:
out = df.query(query)print(out) A B C1 2 5 22 10 3 14 3 6 2
What if my column names have whitespace, or other weird characters?
From pandas 0.25, you can wrap your column name in backticks so this works:
query = ' & '.join([f'`{k}`>{v}' for k, v in limits_dic.items()])
See this Stack Overflow post for more.
You could also use df.eval
if you want to obtain a boolean mask for your query, and then indexing becomes straightforward after that:
mask = df.eval(query)print(mask)0 False1 True2 True3 False4 Truedtype: boolout = df[mask]print(out) A B C1 2 5 22 10 3 14 3 6 2
String Data
If you need to query columns that use string data, the code above will need a slight modification.
Consider (data from this answer):
df = pd.DataFrame({'gender':list('MMMFFF'), 'height':[4,5,4,5,5,4], 'age':[70,80,90,40,2,3]})print (df) gender height age0 M 4 701 M 5 802 M 4 903 F 5 404 F 5 25 F 4 3
And a list of columns, operators, and values:
column = ['height', 'age', 'gender']equal = ['>', '>', '==']condition = [1.68, 20, 'F']
The appropriate modification here is:
query = ' & '.join(f'{i} {j} {repr(k)}' for i, j, k in zip(column, equal, condition))df.query(query) age gender height3 40 F 5
For information on the pd.eval()
family of functions, their features and use cases, please visit Dynamic Expression Evaluation in pandas using pd.eval().
An alternative to both posted, that may or may not be more pythonic:
import pandas as pdimport operatorfrom functools import reducedf = pd.DataFrame({"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]})limits_dic = {"A": 0, "B": 2, "C": -1}# equiv to [df['A'] > 0, df['B'] > 2 ...]loc_elements = [df[key] > val for key, val in limits_dic.items()]df = df.loc[reduce(operator.and_, loc_elements)]