Do the individual Series contained within a DataFrame maintain their own index?


This looks like either a bug or an unintended consequence of Python object identities. Prior to the assignment, we can see that the indices are the same object:

In [175]:
df = pd.DataFrame(dict(A=[1, 2, 3]))
df

Out[175]:
   A
0  1
1  2
2  3

In [176]:
print(id(df.index))
print(id(df['A']))
print(id(df['A'].index))
a = df.A
a

132848496
135123240
132848496

Out[176]:
0    1
1    2
2    3
Name: A, dtype: int64

Now if we modify the index through our reference a, the index becomes a distinct object, while a and df['A'] are still the same object:

In [177]:
a.index = a.index + 1
print(a)
print(id(a))
print(id(df.A))
print()
print(df)
print(id(df.A.index))
print(id(a.index))

1    1
2    2
3    3
Name: A, dtype: int64
135123240
135123240

   A
0  1
1  2
2  3
135125144
135125144

But now df.index is a different object from df['A'].index and a.index:

In [181]:
print(id(df.index))
print(id(a.index))
print(id(df['A'].index))

132848496
135124808
135124808

Personally, I'd consider this an unintended consequence: once you take the reference a to column 'A', it's unclear what the original df should do when you start mutating that reference. I bet this is even harder to catch than the usual SettingWithCopyWarning.

To avoid this, it's best to call copy() to make a deep copy, so that any mutations don't affect the original df:

In [183]:
df = pd.DataFrame(dict(A=[1, 2, 3]))
a = df['A'].copy()
a.index = a.index+1
print(a)
print(df['A'])
print(df['A'].index)
print(df.index)
print()
print(id(df['A']))
print(id(a))
print(id(df['A'].index))
print(id(a.index))

1    1
2    2
3    3
Name: A, dtype: int64
0    1
1    2
2    3
Name: A, dtype: int64
RangeIndex(start=0, stop=3, step=1)
RangeIndex(start=0, stop=3, step=1)

135125984
135165376
135165544
135125816
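The same defensive copy can be sketched as a standalone script. It compares index values rather than raw id() numbers, which differ on every run, and it makes no claim about object identity, which may vary between pandas versions:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3]))

# Take an independent copy of the column before touching its index.
a = df["A"].copy()
a.index = a.index + 1

# The copy carries the shifted index ...
print(list(a.index))

# ... while the DataFrame and its column keep the original labels.
print(list(df.index))
print(list(df["A"].index))
```

Because a is a copy, nothing done to a.index afterwards can leak back into df.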


It's a game of references (pointers): each DataFrame has its own index array, and the Series in the DataFrame hold references to that same index array.

When a.index = a.index + 1 is executed, the reference held by the Series is changed, so a.index is the same as df.A.index, which now differs from df.index.

Now, if you clear the df item cache, the Series is reset:
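A minimal sketch of that rebinding, assuming only public pandas behaviour: a.index + 1 builds a new Index object, and the assignment points the Series at it without modifying the original Index. The name old_index is introduced here for illustration. Whether df["A"] reflects the change depends on how the installed pandas version caches columns, so that line is printed but not relied on:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3]))
a = df["A"]
old_index = a.index  # keep a handle on the Index the Series started with

# a.index + 1 creates a new Index; the assignment rebinds a.index to it.
a.index = a.index + 1

print(a.index is old_index)  # is the Series still pointing at the old object?
print(list(old_index))       # the original Index object itself was not modified
print(list(df.index))        # the DataFrame's own index is untouched

# Version-dependent: may show the shifted or the original labels.
print(list(df["A"].index))
```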

print(df.A.index)
df._clear_item_cache()
print(df.A.index)
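Note that _clear_item_cache is a private method, so it may be absent in some pandas versions. A hedged sketch that looks it up defensively instead of calling it directly (the clear_cache name is just for illustration):

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3]))
a = df["A"]
a.index = a.index + 1

# Private API: only call it if this pandas version actually provides it.
clear_cache = getattr(df, "_clear_item_cache", None)
if callable(clear_cache):
    clear_cache()

# A freshly fetched column should line up with the DataFrame's index again.
print(df["A"].index)
print(df.index)
```

The reference a keeps its shifted index either way; only what df["A"] hands back is affected.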

By default, the indexes of the Series inside a DataFrame are effectively immutable, but copying the Series reference provides a workaround for changing the index reference.