Do the individual Series contained within a DataFrame maintain their own index?

python pandas

This looks like either a bug or an unintended consequence of Python object identities. Prior to the assignment, we can see that the indices are the same object:

In [175]:
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2, 3]))
df

Out[175]:
   A
0  1
1  2
2  3

In [176]:
print(id(df.index))
print(id(df['A']))
print(id(df['A'].index))
a = df.A
a

132848496
135123240
132848496

Out[176]:
0    1
1    2
2    3
Name: A, dtype: int64

Now, if we assign a new index through our reference, a and df['A'] are still the same object and share the same new index object:

In [177]:
a.index = a.index + 1
print(a)
print(id(a))
print(id(df.A))
print()
print(df)
print(id(df.A.index))
print(id(a.index))

1    1
2    2
3    3
Name: A, dtype: int64
135123240
135123240

   A
0  1
1  2
2  3
135125144
135125144

But now df.index is a distinct object from df['A'].index and a.index:

In [181]:
print(id(df.index))
print(id(a.index))
print(id(df['A'].index))

132848496
135124808
135124808

Personally, I'd consider this an unintended consequence: once you take the reference a to column 'A', it's hard to say what the original df should do when you start to mutate that reference. I bet this is even harder to catch than the usual SettingWithCopyWarning.
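A hedged sketch of the same check using `is` on live references instead of comparing printed id() values. In CPython, the id of a temporary such as df['A'] can be reused once that object is garbage collected, so comparing ids of throwaway objects can mislead. The helper name `describe_index_sharing` is hypothetical, and the booleans it reports depend on the pandas version (newer releases with Copy-on-Write may not cache the column at all), so no expected output is shown for them.

```python
import pandas as pd


def describe_index_sharing(df, col, ref):
    """Report which objects are shared between df, df[col] and ref.

    Hypothetical helper for illustration; it only uses `is` comparisons.
    """
    column = df[col]  # keep the column alive while comparing
    return {
        "ref is df[col]": ref is column,
        "df[col].index is df.index": column.index is df.index,
        "ref.index is df[col].index": ref.index is column.index,
        "ref.index is df.index": ref.index is df.index,
    }


df = pd.DataFrame(dict(A=[1, 2, 3]))
a = df["A"]

before = describe_index_sharing(df, "A", a)
a.index = a.index + 1  # rebinds the index on the Series only
after = describe_index_sharing(df, "A", a)

print(before)
print(after)
print(list(df.index))  # the DataFrame's own index labels are unchanged
```

Comparing `before` and `after` on your own installation shows whether the column you hold is still the one the DataFrame hands out.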

To avoid this, it's best to call copy() to make a deep copy so that any mutations don't affect the original df:

In [183]:
df = pd.DataFrame(dict(A=[1, 2, 3]))
a = df['A'].copy()
a.index = a.index+1
print(a)
print(df['A'])
print(df['A'].index)
print(df.index)
print()
print(id(df['A']))
print(id(a))
print(id(df['A'].index))
print(id(a.index))

1    1
2    2
3    3
Name: A, dtype: int64
0    1
1    2
2    3
Name: A, dtype: int64
RangeIndex(start=0, stop=3, step=1)
RangeIndex(start=0, stop=3, step=1)

135125984
135165376
135165544
135125816
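The session above only relabels the copy. As a small complementary sketch (same variable names, nothing new assumed beyond Series.copy() defaulting to deep=True), overwriting a value in the copy leaves the DataFrame alone as well:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3]))

# Series.copy() defaults to deep=True, so `a` owns its data and index.
a = df["A"].copy()

a.index = a.index + 1  # relabel the copy
a.iloc[0] = 100        # overwrite a value in the copy

print(a.tolist(), list(a.index))         # [100, 2, 3] [1, 2, 3]
print(df["A"].tolist(), list(df.index))  # [1, 2, 3] [0, 1, 2]
```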

It's a matter of references (pointers): each DataFrame has its own index object, and the Series in the DataFrame hold references to that same index object.

When a.index = a.index + 1 is executed, the reference held by the Series is changed, so a.index is the same as df.A.index, which is now different from df.index.
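The reference picture described here can be mimicked without pandas. The following is only a toy model: `ToyFrame` and `ToyColumn` are hypothetical names, and this is not how pandas is actually implemented. It just shows how a cached column that shares the frame's index object ends up diverging once its own reference is rebound.

```python
class ToyColumn:
    """Stand-in for a Series: holds a reference to an index object."""

    def __init__(self, index):
        self.index = index


class ToyFrame:
    """Stand-in for a DataFrame with a per-column cache."""

    def __init__(self, index, names):
        self.index = index
        self._names = set(names)
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._names:
            raise KeyError(name)
        # Build the column once, sharing the frame's index object,
        # then hand back the same cached object on every access.
        if name not in self._cache:
            self._cache[name] = ToyColumn(self.index)
        return self._cache[name]


frame = ToyFrame(index=[0, 1, 2], names=["A"])
a = frame["A"]
print(a.index is frame.index)       # True: one shared index object

a.index = [i + 1 for i in a.index]  # rebind the column's reference
print(frame["A"] is a)              # True: the cache returns the same column
print(frame["A"].index)             # [1, 2, 3]
print(frame.index)                  # [0, 1, 2]
```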

Now, if you clear the DataFrame's item cache, the Series is reset:

print(df.A.index)
df._clear_item_cache()
print(df.A.index)
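Note that _clear_item_cache is a private method, so it may be missing or behave differently depending on the pandas version. A defensive sketch, assuming only that the method takes no arguments when it is present, could look like this:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3]))
a = df["A"]
a.index = a.index + 1

# May or may not show the shifted labels, depending on the pandas version.
print(df["A"].index)

# Private API: look it up defensively instead of calling it directly.
clear_cache = getattr(df, "_clear_item_cache", None)
if callable(clear_cache):
    clear_cache()

# A freshly built column takes its labels from the DataFrame's own index.
print(df["A"].index)
print(list(a.index))  # the held reference keeps its shifted labels
```

If the method does not exist on your version, the guard simply skips the call and the column is read as usual.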

By default, the index of a Series inside a DataFrame cannot be edited on its own, but holding a reference to the Series provides a workaround for rebinding its index reference.

CodeHunter
