How to deal with SettingWithCopyWarning in Pandas
SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]
df[df['A'] > 2]['B'] = new_val # new_val not set in df
The warning offers a suggestion to rewrite as follows:
df.loc[df['A'] > 2, 'B'] = new_val
However, this doesn't fit your usage, which is equivalent to:
df = df[df['A'] > 2]
df['B'] = new_val
While it's clear that you don't care about writes making it back to the original frame (since you are overwriting the reference to it), unfortunately this pattern cannot be differentiated from the first chained assignment example. Hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this new warning with the following assignment.
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
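As a minimal sketch of the difference between the chained form and the `.loc` form discussed above (the frame contents and `new_val` are made up for illustration), the chained form leaves `df` untouched while the `.loc` form updates it. The warning emitted by the chained line differs between pandas versions, so this sketch silences it with the standard `warnings` module instead of a pandas option:

```python
import warnings

import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})
new_val = -1

# Chained form: the boolean selection produces a temporary object,
# and the assignment lands on that temporary, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the version-dependent warning
    df[df["A"] > 2]["B"] = new_val
after_chained = df["B"].tolist()

# Single-step form: one indexing operation applied to df itself.
df.loc[df["A"] > 2, "B"] = new_val
after_loc = df["B"].tolist()

print(after_chained)  # df is unchanged by the chained assignment
print(after_loc)      # df is updated by the .loc assignment
```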
How to deal with SettingWithCopyWarning in Pandas?
This post is meant for readers who,
- Would like to understand what this warning means
- Would like to understand different ways of suppressing this warning
- Would like to understand how to improve their code and follow good practices to avoid this warning in the future.
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df

   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
What is the SettingWithCopyWarning?
To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.
When filtering DataFrames, it is possible to slice/index a frame to return either a view or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. On the other hand, a "copy" is a replication of data from the original, and modifying the copy has no effect on the original.
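The distinction is easiest to see on plain NumPy arrays. Using NumPy here is an assumption of this sketch rather than something the post itself does: whether a given pandas selection is a view or a copy depends on the pandas version and the frame's internal layout, whereas NumPy's rules are fixed. Basic slicing returns a view, and indexing with a list of integers returns a copy:

```python
import numpy as np

original = np.arange(6)          # array([0, 1, 2, 3, 4, 5])

view = original[1:4]             # basic slicing: a view into original
copy = original[[1, 2, 3]]       # integer-list indexing: an independent copy

view[0] = 100                    # writes through to original[1]
copy[1] = -1                     # touches only the copy

shares_view = np.shares_memory(original, view)
shares_copy = np.shares_memory(original, copy)

print(original)                  # the write through the view is visible here
print(shares_view, shares_copy)
```

Writing through `view` changes `original`, while writing to `copy` does not. That uncertainty about which of the two you are holding is exactly what the warning is about.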
As mentioned by other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where the values in column "A" are > 5. Pandas allows you to do this in different ways, some more correct than others. For example,
df[df.A > 5]['B']

1    3
2    6
Name: B, dtype: int64

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64
These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment is that it is generally difficult to predict whether a view or a copy is returned, so this mostly becomes an issue when you attempt to assign values back. To build on the earlier example, consider how this code is executed by the interpreter:
df.loc[df.A > 5, 'B'] = 4
# becomes
df.loc.__setitem__((df.A > 5, 'B'), 4)
With a single __setitem__ call to df. OTOH, consider this code:
df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)
Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.
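The translation into `__getitem__` and `__setitem__` calls can be observed without pandas at all. The sketch below uses a hypothetical `Tracer` class (not part of pandas or of the original answer) that only records which special methods Python invokes for each form of assignment:

```python
class Tracer:
    """Hypothetical stand-in for a DataFrame that logs indexing calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append((self.name, "__getitem__", key))
        # Return a new temporary object, as a selection would.
        return Tracer(self.name + "[...]", self.log)

    def __setitem__(self, key, value):
        self.log.append((self.name, "__setitem__", key, value))


log = []
obj = Tracer("obj", log)

obj["mask", "B"] = 4   # one call: __setitem__ on obj with a tuple key
obj["mask"]["B"] = 4   # two calls: __getitem__ on obj, then __setitem__ on the temporary

for entry in log:
    print(entry)
```

In the chained form the assignment is received by the temporary object returned from `__getitem__`, never by `obj` itself, which is why the outcome depends on what that temporary is.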
In general, you should use loc for label-based assignment, and iloc for integer/positional based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.
More can be found in the documentation.
All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects either integers/positions for the index or a numpy array of boolean values, and integer/position indexes for the columns.
df.loc[df.A > 5, 'B'] = 4
Can be written as
df.iloc[(df.A > 5).values, 1] = 4
df.loc[1, 'A'] = 100
Can be written as
df.iloc[1, 0] = 100
And so on.
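As a quick check of the equivalence described above, the following sketch applies the loc and iloc versions to two separate copies of the same frame and compares the results. The frame is built explicitly with the values from the setup, and `columns.get_loc` is used here only to avoid hard-coding the column position:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8], "D": [3, 2, 8], "E": [7, 4, 1]}
)

by_label = df.copy()
by_position = df.copy()
mask = df["A"] > 5

# Boolean mask plus column label vs. boolean array plus column position.
by_label.loc[mask, "B"] = 4
by_position.iloc[mask.to_numpy(), by_position.columns.get_loc("B")] = 4

# Row label plus column label vs. row position plus column position.
by_label.loc[1, "A"] = 100
by_position.iloc[1, 0] = 100

same_result = by_label.equals(by_position)
print(same_result)
print(by_label)
```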
Just tell me how to suppress the warning!
Consider a simple operation on the "A" column of df. Selecting "A" and dividing by 2 will raise the warning, but the operation will work.
df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5
There are a couple ways of directly silencing this warning:
Use loc to slice subsets:
df2 = df.loc[:, ['A']]
df2['A'] /= 2     # Does not raise

Change pd.options.mode.chained_assignment. It can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.
pd.options.mode.chained_assignment = None
df2['A'] /= 2

Make a deep copy:
df2 = df[['A']].copy(deep=True)
df2['A'] /= 2
@Peter Cotton, in the comments, came up with a nice way of non-intrusively changing the mode (modified from this gist) using a context manager, to set the mode only as long as it is required, and then reset it back to the original state when finished.
class ChainedAssignent:
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = self.saved_swcw
The usage is as follows:
# some code here
with ChainedAssignent():
    df2['A'] /= 2
# more code follows
Or, to raise the exception
with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The "XY Problem": What am I doing wrong?
A lot of the time, users look for ways of suppressing this warning without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper rooted problem "X". The questions below are based on common situations that trigger this warning, and each is followed by a solution.
I have a DataFrame
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
I want to assign values in col "A" > 5 to 1000. My expected output is
      A  B  C  D  E
0     5  0  3  3  7
1  1000  3  5  2  4
2  1000  6  8  8  1
Wrong way to do this:
df.A[df.A > 5] = 1000          # works, because df.A returns a view
df[df.A > 5]['A'] = 1000       # does not work
df.loc[df.A > 5]['A'] = 1000   # does not work
Right way using loc:
df.loc[df.A > 5, 'A'] = 1000
I am trying to set the value in cell (1, 'D') to 12345. My expected output is
   A  B  C      D  E
0  5  0  3      3  7
1  9  3  5  12345  4
2  7  6  8      8  1
I have tried different ways of accessing this cell, such as df['D']. What is the best way to do this?
1. This question isn't specifically related to the warning, but it is good to understand how to do this particular operation correctly so as to avoid situations where the warning could potentially arise in future.
You can use any of the following methods to do this.
df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345
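To confirm that the four accessors above address the same cell, this small sketch applies each one to its own copy of the frame (rebuilt explicitly with the values shown in the question) and checks that all results agree:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8], "D": [3, 2, 8], "E": [7, 4, 1]}
)

via_loc = df.copy()
via_loc.loc[1, "D"] = 12345      # label-based, general purpose

via_iloc = df.copy()
via_iloc.iloc[1, 3] = 12345      # position-based, general purpose

via_at = df.copy()
via_at.at[1, "D"] = 12345        # label-based, single cell

via_iat = df.copy()
via_iat.iat[1, 3] = 12345        # position-based, single cell

results = [via_loc, via_iloc, via_at, via_iat]
all_equal = all(result.equals(via_loc) for result in results)
print(all_equal)
print(via_at)
```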
I am trying to subset values based on some condition. I have a DataFrame
   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1
I would like to assign values in "D" to 123 such that "C" == 5. I tried
df2.loc[df2.C == 5, 'D'] = 123
Which seems fine but I am still getting the SettingWithCopyWarning! How do I fix this?
This is actually probably because of code higher up in your pipeline. Did you create df2 from something larger, like
df2 = df[df.A > 5]
? In this case, boolean indexing returns a copy, but pandas still flags df2 as derived from df, so the later assignment triggers the warning. What you'd need to do is assign df2 to an explicit copy:
df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]
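A short sketch of the explicit-copy fix, under the assumption that `df` holds the values from the setup: after `.copy()`, the assignment updates `df2` only, and no SettingWithCopyWarning is recorded. The warning class is matched by name here so the check does not depend on where a given pandas version exposes it:

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8], "D": [3, 2, 8], "E": [7, 4, 1]}
)

# Explicit copy: df2 is an independent frame from here on.
df2 = df[df["A"] > 5].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df2.loc[df2["C"] == 5, "D"] = 123

copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

print(len(copy_warnings))   # number of SettingWithCopyWarning instances seen
print(df2)
print(df["D"].tolist())     # the original frame is untouched
```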
I'm trying to drop column "C" in-place from
   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1
But using
df2.drop('C', axis=1, inplace=True)
throws SettingWithCopyWarning. Why is this happening?
This is because df2 must have been created from some other slicing operation, such as
df2 = df[df.A > 5]
The solution here is to either make a copy() of df, or use loc, as before.
In general, the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not the original as they think. There are false positives (IOW, if you know what you are doing, it could be OK). One possibility is simply to turn off the warning (which is set to warn by default), as @Garrett suggests.
Here is another option:
In : df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In : dfa = df.ix[:, [1, 0]]

In : dfa.is_copy
Out: True

In : dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python
You can set the is_copy flag to False, which will effectively turn off the check for that object:
In : dfa.is_copy = False

In : dfa['A'] /= 2
If you explicitly copy then no further warning will happen:
In : dfa = df.ix[:, [1, 0]].copy()

In : dfa['A'] /= 2
The code the OP is showing above, while legitimate, and probably something I do as well, is technically a case for this warning, and not a false positive. Another way to not have the warning would be to do the selection operation via reindex, e.g.
quote_df = quote_df.reindex(columns=['STK', ...])
quote_df = quote_df.reindex(['STK', ...], axis=1) # v.0.21
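For illustration, here is a minimal sketch of the reindex approach on a made-up frame. The column names other than 'STK' and all of the values are hypothetical, since the original column list is elided. reindex returns a new object, so later assignments affect only that object:

```python
import pandas as pd

# Hypothetical stand-in for the OP's quote_df; only 'STK' comes from the post.
quote_df = pd.DataFrame(
    {"STK": ["AAA", "BBB"], "PRICE": [10.0, 20.0], "VOLUME": [100, 200]}
)

# Select (and order) columns via reindex instead of quote_df[[...]].
selected = quote_df.reindex(columns=["STK", "PRICE"])

# Assigning to the new frame does not touch quote_df.
selected["PRICE"] = selected["PRICE"] / 2

print(selected)
print(quote_df["PRICE"].tolist())
```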