why should I make a copy of a data frame in pandas

python pandas chained-assignment

This expands on Paul's answer. In Pandas, indexing a DataFrame returns a reference to the initial DataFrame. Thus, changing the subset will change the initial DataFrame. Thus, you'd want to use the copy if you want to make sure the initial DataFrame shouldn't change. Consider the following code:

df = DataFrame({'x': [1,2]})df_sub = df[0:1]df_sub.x = -1print(df)

You'll get:

x0 -11  2

In contrast, the following leaves df unchanged:

df_sub_copy = df[0:1].copy()df_sub_copy.x = -1

python pandas chained-assignment

Because if you don't make a copy then the indices can still be manipulated elsewhere even if you assign the dataFrame to a different name.

For example:

df2 = dffunc1(df2)func2(df)

func1 can modify df by modifying df2, so to avoid that:

df2 = df.copy()func1(df2)func2(df)

python pandas chained-assignment

It's necessary to mention that returning copy or view depends on kind of indexing.

The pandas documentation says:

Returning a view versus a copy
The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.

CodeHunter

why should I make a copy of a data frame in pandas

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last