How to replace negative numbers in a pandas DataFrame with zero
If all your columns are numeric, you can use boolean indexing:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [3]: df
Out[3]:
   a  b
0  0 -3
1 -1  2
2  2  1
In [4]: df[df < 0] = 0
In [5]: df
Out[5]:
   a  b
0  0  0
1  0  2
2  2  1
For the more general case, this answer shows the private method _get_numeric_data:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
   ...:                    'c': ['foo', 'goo', 'bar']})
In [3]: df
Out[3]:
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar
In [4]: num = df._get_numeric_data()
In [5]: num[num < 0] = 0
In [6]: df
Out[6]:
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar
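Since _get_numeric_data is private and could change between releases, a version that sticks to public API may be preferable. The following is a minimal sketch, assuming that selecting columns with select_dtypes(include="number") and clipping them at zero is an acceptable substitute; the name num_cols is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, -1, 2],
    "b": [-3, 2, 1],
    "c": ["foo", "goo", "bar"],
})

# Pick out the numeric columns using public API only.
num_cols = df.select_dtypes(include="number").columns

# Replace negatives in those columns; the string column is left untouched.
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

Unlike the approach above, this does not rely on num being a view of df, because the result is assigned back explicitly.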
With the timedelta type, boolean indexing seems to work on separate columns, but not on the whole DataFrame. So you can do:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df
Out[3]:
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days
In [4]: for k, v in df.items():  # iteritems() was removed in pandas 2.0
   ...:     v[v < 0] = 0
   ...:
In [5]: df
Out[5]:
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
Update: comparison with a pd.Timedelta works on the whole DataFrame:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df[df < pd.Timedelta(0)] = 0
In [4]: df
Out[4]:
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
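A non-mutating variant of the same idea can be sketched with DataFrame.mask, which replaces values where the condition is True and returns a new frame. This is a sketch under the assumption that an explicit pd.Timedelta(0) is wanted as the replacement rather than the integer 0; the names zero and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.to_timedelta([0, -1, 2], unit="D"),
    "b": pd.to_timedelta([-3, 2, 1], unit="D"),
})

zero = pd.Timedelta(0)

# mask() replaces entries where the condition holds and leaves df unchanged.
result = df.mask(df < zero, zero)

print(result)
```

Using a Timedelta on both sides of the comparison and as the replacement keeps the types consistent throughout.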
Another succinct way of doing this is pandas.DataFrame.clip.
For example:
import pandas as pd
In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})
In [21]: df
Out[21]:
     a
0   -1
1  100
2   -2
In [22]: df.clip(lower=0)
Out[22]:
     a
0    0
1  100
2    0
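There's also df.clip_lower(0), but it was deprecated in pandas 0.24 and removed in pandas 1.0, so on current versions use df.clip(lower=0) instead.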
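Two details of clip are easy to miss: it returns a new DataFrame rather than modifying the original, and missing values pass through unchanged. A small sketch, with illustrative data that is not from the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 100.0, np.nan], "b": [-3, 2, 1]})

# clip() does not modify df; keep the returned frame or assign it back.
clipped = df.clip(lower=0)

print(clipped)
print(df)  # still contains the negative values
```

To update the frame itself, assign the result back with df = df.clip(lower=0).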
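Perhaps you could use DataFrame.where. Note that where keeps the values for which the condition is True and replaces the rest, so the condition must describe the values to keep:
data_frame = data_frame.where(data_frame >= 0, 0)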
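Because DataFrame.where keeps values where the condition is True and replaces everything else, it is easy to get the condition backwards. The sketch below compares it with DataFrame.mask, which replaces values where the condition is True; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, -1, 2], "b": [-3, 2, 1]})

# where(): keep non-negative values, replace the rest with 0.
kept = df.where(df >= 0, 0)

# mask(): replace negative values with 0, keep the rest.
masked = df.mask(df < 0, 0)

print(kept)
print(kept.equals(masked))
```

The two forms differ when NaN is present: NaN >= 0 is False, so the where version turns NaN into 0, while the mask version leaves NaN alone.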