How to replace negative numbers in Pandas Data Frame by zero


If all your columns are numeric, you can use boolean indexing:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

    In [3]: df
    Out[3]:
       a  b
    0  0 -3
    1 -1  2
    2  2  1

    In [4]: df[df < 0] = 0

    In [5]: df
    Out[5]:
       a  b
    0  0  0
    1  0  2
    2  2  1

For the more general case, where some columns are not numeric, another answer shows the private method _get_numeric_data:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
       ...:                    'c': ['foo', 'goo', 'bar']})

    In [3]: df
    Out[3]:
       a  b    c
    0  0 -3  foo
    1 -1  2  goo
    2  2  1  bar

    In [4]: num = df._get_numeric_data()

    In [5]: num[num < 0] = 0

    In [6]: df
    Out[6]:
       a  b    c
    0  0  0  foo
    1  0  2  goo
    2  2  1  bar
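Because _get_numeric_data is private, its behaviour is not guaranteed across versions; in particular, with Copy-on-Write enabled it may hand back a copy, in which case editing num would not change df. A minimal sketch of the same idea using only public API, assuming the numeric columns can be picked out with select_dtypes and written back explicitly (num_cols is a name made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, -1, 2],
    "b": [-3, 2, 1],
    "c": ["foo", "goo", "bar"],
})

# Labels of the numeric columns, found through the public select_dtypes API.
# num_cols is an illustrative name, not something from the original answer.
num_cols = df.select_dtypes(include="number").columns

# Clip only those columns and assign the result back explicitly, so the
# update does not rely on whether pandas returned a view or a copy.
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

The explicit assignment costs one extra line, but it keeps the string column untouched and avoids depending on a private method.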

With timedelta columns, boolean indexing seems to work on individual columns, but not on the whole DataFrame. So you can loop over the columns:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
       ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

    In [3]: df
    Out[3]:
            a       b
    0  0 days -3 days
    1 -1 days  2 days
    2  2 days  1 days

    In [4]: for k, v in df.items():
       ...:     v[v < 0] = 0
       ...:

    In [5]: df
    Out[5]:
           a      b
    0 0 days 0 days
    1 0 days 2 days
    2 2 days 1 days

Update: comparison with a pd.Timedelta works on the whole DataFrame:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
       ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

    In [3]: df[df < pd.Timedelta(0)] = 0

    In [4]: df
    Out[4]:
           a      b
    0 0 days 0 days
    1 0 days 2 days
    2 2 days 1 days
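The sessions above assign the integer 0 into timedelta columns, which newer pandas versions may be stricter about. A hedged sketch of the same whole-DataFrame comparison that uses pd.Timedelta(0) both as the threshold and as the replacement, and returns a new frame through DataFrame.mask instead of assigning in place (zero and result are names chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.to_timedelta([0, -1, 2], unit="D"),
    "b": pd.to_timedelta([-3, 2, 1], unit="D"),
})

# Use a Timedelta for both the comparison and the replacement value,
# so no integer is ever written into a timedelta column.
zero = pd.Timedelta(0)

# mask() replaces entries where the condition is True and returns a new
# DataFrame; the original df is left as it was.
result = df.mask(df < zero, zero)

print(result)
```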


Another succinct way of doing this is pandas.DataFrame.clip.

For example:

    In [19]: import pandas as pd

    In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})

    In [21]: df
    Out[21]:
         a
    0   -1
    1  100
    2   -2

    In [22]: df.clip(lower=0)
    Out[22]:
         a
    0    0
    1  100
    2    0

Older pandas versions also had df.clip_lower(0), but it was deprecated in 0.24 and removed in 1.0, so prefer df.clip(lower=0).
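One detail worth spelling out: clip returns a new object and leaves the original alone, so the result has to be kept. A small sketch, where clipped is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"a": [-1, 100, -2]})

# clip() returns a new DataFrame; df itself still holds the negative values.
clipped = df.clip(lower=0)

# To update a single column of the original frame, assign the result back.
df_updated = df.copy()
df_updated["a"] = df_updated["a"].clip(lower=0)

print(clipped)
```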


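Perhaps you could use DataFrame.where. It keeps the values where the condition is True and replaces the rest, so the condition has to describe the values you want to keep:

    data_frame = data_frame.where(data_frame >= 0, 0)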
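Since DataFrame.where keeps entries where the condition holds and replaces the others, the direction of the comparison is easy to get backwards. A short sketch on a made-up frame contrasting it with DataFrame.mask, which replaces entries where the condition holds (kept and masked are illustrative names):

```python
import pandas as pd

data_frame = pd.DataFrame({"a": [0, -1, 2], "b": [-3, 2, 1]})

# where(): keep values that satisfy the condition, replace the others with 0.
kept = data_frame.where(data_frame >= 0, 0)

# mask(): replace values that satisfy the condition with 0, keep the others.
masked = data_frame.mask(data_frame < 0, 0)

print(kept)
print(masked)
```

On data without missing values the two give the same result. They differ on NaN: NaN fails both comparisons, so the where form would turn it into 0 while the mask form would leave it as NaN.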