In-place sort_values in pandas: what exactly does it mean?
Here is an example. df1 will hold the sorted DataFrame and df will remain intact:
    import pandas as pd
    from datetime import datetime as dt

    df = pd.DataFrame(data=[22, 22, 3],
                      index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                      columns=['foo'])

    df1 = df.sort_values(by='foo')
    print(df, df1)
In the case below, df itself will hold the sorted values:
    import pandas as pd
    from datetime import datetime as dt

    df = pd.DataFrame(data=[22, 22, 3],
                      index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                      columns=['foo'])

    df.sort_values(by='foo', inplace=True)
    print(df)
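One consequence of the two calls above is worth spelling out: with inplace=True the method returns None, so the result should not be assigned back. The following is a minimal sketch of that behaviour; the string index labels and the variable name result are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Illustrative frame; the index labels "a", "b", "c" are made up for this sketch.
df = pd.DataFrame({"foo": [22, 22, 3]}, index=["a", "b", "c"])

# inplace=False (the default): a new, sorted DataFrame is returned.
df1 = df.sort_values(by="foo")
print(type(df1).__name__)

# inplace=True: df itself is reordered and the call returns None.
result = df.sort_values(by="foo", inplace=True)
print(result)

# Pitfall: `df = df.sort_values(by="foo", inplace=True)` would rebind df to None.
print(df)
```

So the usual pattern is either df = df.sort_values(by='foo') or df.sort_values(by='foo', inplace=True), but not a mix of the two.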
As you can read in the sort_values documentation, the return value of the function is a Series. However, it is a new Series, not the original one.
For example:
    import numpy as np
    import pandas as pd

    s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    print(s)

    a   -0.872271
    b    0.294317
    c   -0.017433
    d   -1.375316
    e    0.993197
    dtype: float64

    s_sorted = s.sort_values()
    print(s_sorted)

    d   -1.375316
    a   -0.872271
    c   -0.017433
    b    0.294317
    e    0.993197
    dtype: float64

    print(id(s_sorted))
    127952880

    print(id(s))
    127724792
So s and s_sorted are different Series objects. But if you use inplace=True:
    s.sort_values(inplace=True)
    print(s)

    d   -1.375316
    a   -0.872271
    c   -0.017433
    b    0.294317
    e    0.993197
    dtype: float64

    print(id(s))
    127724792
The id is unchanged, which shows that s is still the same Series object, now sorted. No new Series is returned; the call returns None.
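"inplace=True" is more like a physical sort, while "inplace=False" is more like a logical sort. A physical sort means that the data set stored in the computer is itself reordered by some key. A logical sort means that the stored data set keeps its original order (the order in which it was input or imported), and the sort only produces a separate ordering on top of it. A data set can have one or more logical orderings, but only one physical ordering. For pandas this is best read as an analogy rather than a description of the memory layout: with inplace=False the original object keeps its order and the sorted result is a separate object, while with inplace=True the original object itself ends up sorted.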
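To check the distinction above in pandas terms, here is a small sketch showing that the result of sort_values with inplace=False is an independent object: changing it does not touch the original, which keeps both its order and its values. The data and index labels are made up for illustration.

```python
import pandas as pd

# Illustrative data; values and labels are assumptions for this sketch.
df = pd.DataFrame({"foo": [30, 10, 20]}, index=["x", "y", "z"])

# inplace=False: df1 is a separate, sorted object; df keeps its original order.
df1 = df.sort_values(by="foo")

# Modify the sorted result only.
df1.loc["z", "foo"] = 100

print(df["foo"].tolist())    # original order and values
print(df1["foo"].tolist())   # sorted order, with the modified value
```

In other words, the sorted result here is a real reordered copy of the data, not just a reordered index over the original.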