Move column by name to front of table in pandas
We can use ix to reorder by passing a list:
In [27]:
# get a list of columns
cols = list(df)
# move the column to head of list using index, pop and insert
cols.insert(0, cols.pop(cols.index('Mid')))
cols

Out[27]:
['Mid', 'Net', 'Upper', 'Lower', 'Zsore']

In [28]:
# use ix to reorder
df = df.ix[:, cols]
df

Out[28]:
                      Mid Net  Upper   Lower  Zsore
Answer_option
More_than_once_a_day    2  0%  0.22%  -0.12%     65
Once_a_day              3  0%  0.32%  -0.19%     45
Several_times_a_week    4  2%  2.45%   1.10%     78
Once_a_week             6  1%  1.63%  -0.40%     65
Another method is to take a reference to the column and reinsert it at the front:
In [39]:
mid = df['Mid']
df.drop(labels=['Mid'], axis=1, inplace=True)
df.insert(0, 'Mid', mid)
df

Out[39]:
                      Mid Net  Upper   Lower  Zsore
Answer_option
More_than_once_a_day    2  0%  0.22%  -0.12%     65
Once_a_day              3  0%  0.32%  -0.19%     45
Several_times_a_week    4  2%  2.45%   1.10%     78
Once_a_week             6  1%  1.63%  -0.40%     65
You can also use loc to achieve the same result; ix is deprecated from pandas 0.20.0 onwards (and was removed in pandas 1.0):
df = df.loc[:, cols]
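For anyone on a pandas release where ix is no longer available, here is a minimal self-contained sketch of the loc variant. The frame is rebuilt from the output shown above; the construction code and the original column order are assumptions, not part of the answer.

```python
import pandas as pd

# Sample data reconstructed from the output above (construction is assumed).
df = pd.DataFrame(
    {
        "Net": ["0%", "0%", "2%", "1%"],
        "Upper": ["0.22%", "0.32%", "2.45%", "1.63%"],
        "Lower": ["-0.12%", "-0.19%", "1.10%", "-0.40%"],
        "Mid": [2, 3, 4, 6],
        "Zsore": [65, 45, 78, 65],
    },
    index=pd.Index(
        ["More_than_once_a_day", "Once_a_day",
         "Several_times_a_week", "Once_a_week"],
        name="Answer_option",
    ),
)

# Build the new column order: pop 'Mid' from its position and put it first.
cols = list(df)
cols.insert(0, cols.pop(cols.index("Mid")))

# Select all rows and the columns in the new order.
df = df.loc[:, cols]
print(list(df.columns))  # ['Mid', 'Net', 'Upper', 'Lower', 'Zsore']
```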
Maybe I'm missing something, but a lot of these answers seem overly complicated. You should be able to just set the columns within a single list:
Column to the front:
df = df[ ['Mid'] + [ col for col in df.columns if col != 'Mid' ] ]
Or, if you instead want to move it to the back:
df = df[ [ col for col in df.columns if col != 'Mid' ] + ['Mid'] ]
Or if you wanted to move more than one column:
cols_to_move = ['Mid', 'Zsore']
df = df[ cols_to_move + [ col for col in df.columns if col not in cols_to_move ] ]
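The three variants above could be wrapped in one small helper. This is only a sketch: the name move_columns and the to_front flag are made up for illustration, and the one-row frame is dummy data.

```python
import pandas as pd

def move_columns(df, cols_to_move, to_front=True):
    """Return a new DataFrame with cols_to_move placed first (or last).

    Hypothetical helper; cols_to_move is expected to be a list of labels.
    """
    rest = [col for col in df.columns if col not in cols_to_move]
    order = cols_to_move + rest if to_front else rest + cols_to_move
    return df[order]

# Dummy one-row frame with the column names used in this thread.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["Net", "Upper", "Lower", "Mid", "Zsore"])

front = move_columns(df, ["Mid", "Zsore"])
back = move_columns(df, ["Mid"], to_front=False)
print(list(front.columns))  # ['Mid', 'Zsore', 'Net', 'Upper', 'Lower']
print(list(back.columns))   # ['Net', 'Upper', 'Lower', 'Zsore', 'Mid']
```

Unlike the pop/insert approaches, this returns a new frame and leaves df untouched.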
I prefer this solution:
col = df.pop("Mid")
df.insert(0, col.name, col)
It's simpler to read and faster than other suggested answers.
def move_column_inplace(df, col, pos):
    col = df.pop(col)
    df.insert(pos, col.name, col)
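A short usage sketch of move_column_inplace with a made-up one-row frame (the data is illustrative only). One thing worth noting: pos is applied after the column has been popped, so it counts positions among the remaining columns.

```python
import pandas as pd

def move_column_inplace(df, col, pos):
    # Remove the column, then re-insert it at the requested position.
    col = df.pop(col)
    df.insert(pos, col.name, col)

# Illustrative one-row frame; the values are made up.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["Net", "Upper", "Lower", "Mid", "Zsore"])

move_column_inplace(df, "Mid", 0)
order_front = list(df.columns)   # ['Mid', 'Net', 'Upper', 'Lower', 'Zsore']

move_column_inplace(df, "Mid", 2)
order_middle = list(df.columns)  # ['Net', 'Upper', 'Mid', 'Lower', 'Zsore']

# pos is applied after the column has been removed, so the last valid
# position is len(df.columns) - 1 as measured before the call.
move_column_inplace(df, "Net", len(df.columns) - 1)
order_back = list(df.columns)    # ['Upper', 'Mid', 'Lower', 'Zsore', 'Net']
```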
Performance assessment:
For this test, the currently last column is moved to the front in each repetition. In-place methods generally perform better. While citynorman's solution can be made in-place, Ed Chum's method based on .loc and sachinmm's method based on reindex cannot.
While other methods are generic, citynorman's solution is limited to pos=0. I didn't observe any performance difference between df.loc[:, cols] and df[cols], which is why I didn't include some other suggestions.
I tested with Python 3.6.8 and pandas 0.24.2 on a MacBook Pro (Mid 2015).
import numpy as np
import pandas as pd

n_cols = 11
df = pd.DataFrame(np.random.randn(200000, n_cols), columns=range(n_cols))

def move_column_inplace(df, col, pos):
    col = df.pop(col)
    df.insert(pos, col.name, col)

def move_to_front_normanius_inplace(df, col):
    move_column_inplace(df, col, 0)
    return df

def move_to_front_chum(df, col):
    cols = list(df)
    cols.insert(0, cols.pop(cols.index(col)))
    return df.loc[:, cols]

def move_to_front_chum_inplace(df, col):
    col = df[col]
    df.drop(col.name, axis=1, inplace=True)
    df.insert(0, col.name, col)
    return df

def move_to_front_elpastor(df, col):
    cols = [col] + [ c for c in df.columns if c!=col ]
    return df[cols] # or df.loc[:, cols]

def move_to_front_sachinmm(df, col):
    cols = df.columns.tolist()
    cols.insert(0, cols.pop(cols.index(col)))
    df = df.reindex(columns=cols, copy=False)
    return df

def move_to_front_citynorman_inplace(df, col):
    # This approach exploits that reset_index() moves the index
    # at the first position of the data frame.
    df.set_index(col, inplace=True)
    df.reset_index(inplace=True)
    return df

def test(method, df):
    col = np.random.randint(0, n_cols)
    method(df, col)

col = np.random.randint(0, n_cols)
ret_mine = move_to_front_normanius_inplace(df.copy(), col)
ret_chum1 = move_to_front_chum(df.copy(), col)
ret_chum2 = move_to_front_chum_inplace(df.copy(), col)
ret_elpas = move_to_front_elpastor(df.copy(), col)
ret_sach = move_to_front_sachinmm(df.copy(), col)
ret_city = move_to_front_citynorman_inplace(df.copy(), col)

# Assert equivalence of solutions.
assert(ret_mine.equals(ret_chum1))
assert(ret_mine.equals(ret_chum2))
assert(ret_mine.equals(ret_elpas))
assert(ret_mine.equals(ret_sach))
assert(ret_mine.equals(ret_city))
Results:
# For n_cols = 11:
%timeit test(move_to_front_normanius_inplace, df)
# 1.05 ms ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit test(move_to_front_citynorman_inplace, df)
# 1.68 ms ± 46.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit test(move_to_front_sachinmm, df)
# 3.24 ms ± 96.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum, df)
# 3.84 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_elpastor, df)
# 3.85 ms ± 58.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum_inplace, df)
# 9.67 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# For n_cols = 31:
%timeit test(move_to_front_normanius_inplace, df)
# 1.26 ms ± 31.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_citynorman_inplace, df)
# 1.95 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_sachinmm, df)
# 10.7 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum, df)
# 11.5 ms ± 869 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_elpastor, df)
# 11.4 ms ± 598 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum_inplace, df)
# 31.4 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
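The %timeit magic only works inside IPython or Jupyter. If you want to repeat a comparison like this from a plain script, a rough sketch with the standard-library timeit module could look as follows. The helper best_time_ms and the smaller frame size are assumptions chosen so the sketch runs quickly; it is not the setup that produced the numbers above, and only two of the methods are included.

```python
import timeit

import numpy as np
import pandas as pd

n_cols = 11
# Smaller frame than in the benchmark above so the sketch runs quickly.
df = pd.DataFrame(np.random.randn(2000, n_cols), columns=range(n_cols))

def move_column_inplace(df, col, pos):
    col = df.pop(col)
    df.insert(pos, col.name, col)

def move_to_front_elpastor(df, col):
    cols = [col] + [c for c in df.columns if c != col]
    return df[cols]

def best_time_ms(method, df, repeat=5, number=20):
    """Hypothetical helper: best mean time per call in ms over `repeat` runs."""
    def run():
        # Pick a random column label and move it to the front.
        col = np.random.randint(0, n_cols)
        method(df, col)
    timings = timeit.repeat(run, repeat=repeat, number=number)
    return min(timings) / number * 1000

t_inplace = best_time_ms(lambda d, c: move_column_inplace(d, c, 0), df)
t_copy = best_time_ms(move_to_front_elpastor, df)
print(f"pop/insert: {t_inplace:.3f} ms, list reorder: {t_copy:.3f} ms")
```

Absolute timings will differ by machine, pandas version and frame size, so treat the output only as a relative comparison.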