Difference between df.reindex() and df.set_index() methods in pandas

python python-3.x pandas indexing reindex

You can see the difference on a simple example. Let's consider this dataframe:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    print(df)

       a  b
    0  1  3
    1  2  4

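The index labels are then 0 and 1.

If you use set_index with the column 'a', the index labels become 1 and 2. If you do df.set_index('a').loc[1, 'b'], you will get 3.

Now if you use reindex with the same labels 1 and 2, as in df.reindex([1, 2]), you will get 4.0 when you do df.reindex([1, 2]).loc[1, 'b']. Label 1 already exists in df and points to the second row, where b is 4. The result is a float because label 2 does not exist in df, so its row is filled with NaN.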

What happened is that set_index replaced the previous index labels (0, 1) with (1, 2) (the values from column 'a') without touching the order of the values in column 'b':

    df.set_index('a')

       b
    a
    1  3
    2  4

while reindex changes which labels appear in the index but keeps each value in column 'b' attached to the label it had in the original df:

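    df.reindex(df.a.values).drop(columns='a')  # equivalent to df.reindex([1, 2]).drop(columns='a')

         b
    1  4.0
    2  NaN

drop(columns='a') is only there to leave column 'a' out of this example. The older positional form drop('a', 1) is no longer accepted by recent pandas versions.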

In short, reindex selects and reorders rows by their existing index labels without changing the values attached to each label (labels missing from the original get NaN), while set_index replaces the index labels with the values of a column without touching the order of the rows in the dataframe.
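The two lookups discussed in this answer can be checked side by side. This is a minimal sketch on the same two-row frame; the variable names by_column and by_label are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# set_index: the values of column 'a' become the labels; rows stay where they are.
by_column = df.set_index('a')
print(by_column.loc[1, 'b'])          # 3

# reindex: rows are looked up by their existing labels (0 and 1).
# Label 1 exists; label 2 does not, so its row is filled with NaN.
by_label = df.reindex([1, 2])
print(by_label.loc[1, 'b'])           # 4.0
print(by_label.loc[2].isna().all())   # True
```

The NaN in the new row is also why column 'b' is printed as 4.0 rather than 4: the column is upcast to float to hold the missing value.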


Just to add: the way to undo set_index is, more or less, the reset_index method:

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    print(df)
    df.set_index('a', inplace=True)
    print(df)
    df.reset_index(inplace=True, drop=False)
    print(df)

       a  b
    0  1  3
    1  2  4
       b
    a
    1  3
    2  4
       a  b
    0  1  3
    1  2  4
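The "more or less" matters when the original index was not the default 0..n-1 range: reset_index brings the column back, but not the old labels. A minimal sketch, assuming a hypothetical frame with string labels 'x' and 'y' (not part of the original answer):

```python
import pandas as pd

# Hypothetical frame with a non-default index, for illustration only.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])

round_trip = df.set_index('a').reset_index()

# Column 'a' is back as a regular column...
print(list(round_trip.columns))   # ['a', 'b']
# ...but set_index discarded the labels 'x' and 'y',
# so reset_index can only fall back to the default integer range.
print(list(round_trip.index))     # [0, 1]
```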


In addition to the great answer from Ben. T, here is one more example of how reindex and set_index differ when you pass each of them a shuffled copy of the existing index:

    import pandas as pd
    import numpy as np

    testdf = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 5, 4], 'c': [5, 7, 6]})
    print(testdf)
    print(testdf.set_index(np.random.permutation(testdf.index)))
    print(testdf.reindex(np.random.permutation(testdf.index)))

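Output (one possible run; the permutation is random, so the labels differ between runs):

       a  b  c
    0  1  3  5
    1  3  5  7
    2  2  4  6
       a  b  c
    1  1  3  5
    2  3  5  7
    0  2  4  6
       a  b  c
    2  2  4  6
    1  3  5  7
    0  1  3  5

With set_index, the shuffled labels are assigned to the rows in their current positions, so the row order and the values in every column are kept intact.
With reindex, the rows are rearranged to follow the shuffled order of the labels, and each row keeps its original label.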
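Because the permutation in this example is random, the printed frames change from run to run. A sketch with a fixed permutation makes the contrast reproducible; the order [2, 0, 1] and the names new_labels, relabeled, and reordered are arbitrary choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 5, 4], 'c': [5, 7, 6]})

# Fixed stand-in for np.random.permutation(testdf.index).
# It is an array on purpose: set_index would read a plain list as column names.
new_labels = np.array([2, 0, 1])

relabeled = testdf.set_index(new_labels)   # same row order, new labels
reordered = testdf.reindex(new_labels)     # same label per row, new row order

print(relabeled['a'].tolist())   # [1, 3, 2]
print(reordered['a'].tolist())   # [2, 1, 3]
```

Both results carry the index [2, 0, 1], yet relabeled.loc[2, 'a'] is 1 while reordered.loc[2, 'a'] is 2, which is the whole difference between the two methods.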
