Aggregating lambda functions in pandas and numpy
You need to specify the column in data
whose values are to be aggregated. For example,
data = data.groupby(['type', 'status', 'name'])['value'].agg(...)
instead of
data = data.groupby(['type', 'status', 'name']).agg(...)
If you don't mention the column (e.g. 'value'
), then the keys in dict passed to agg
are taken to be the column names. The KeyError
s are Pandas' way of telling you that it can't find columns named one
, two
or test2
in the DataFrame data
.
Note: Passing a dict to groupby/agg
has been deprecated. Instead, going forward you should pass a list-of-tuples instead. Each tuple is expected to be of the form ('new_column_name', callable)
.
Here is runnable example:
import numpy as npimport pandas as pdN = 100data = pd.DataFrame({ 'type': np.random.randint(10, size=N), 'status': np.random.randint(10, size=N), 'name': np.random.randint(10, size=N), 'value': np.random.randint(10, size=N),})reading = np.random.random(10,)data = data.groupby(['type', 'status', 'name'])['value'].agg( [('one', np.mean), ('two', lambda value: 100* ((value>32).sum() / reading.mean())), ('test2', lambda value: 100* ((value > 45).sum() / value.mean()))])print(data)# one two test2# type status name # 0 1 3 3.0 0 0.0# 7 4.0 0 0.0# 9 8.0 0 0.0# 3 1 5.0 0 0.0# 6 3.0 0 0.0# ...
If this does not match your situation, then please provide runnable code that does.