Selecting columns by list (and columns are subset of list)


I think you need Index.intersection:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print(df)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

lst = ['A','R','B']
print(df.columns.intersection(lst))
Index(['A', 'B'], dtype='object')

data = df[df.columns.intersection(lst)]
print(data)
   A  B
0  1  4
1  2  5
2  3  6

Another solution uses numpy.intersect1d, which returns the common labels in sorted order:

import numpy as np

data = df[np.intersect1d(df.columns, lst)]
print(data)
   A  B
0  1  4
1  2  5
2  3  6
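One practical difference is column order: np.intersect1d sorts the common values, so the result may not follow the column order of df. A minimal sketch, assuming a hypothetical frame (df_demo) whose columns are deliberately out of alphabetical order:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order.
df_demo = pd.DataFrame({'B': [4, 5, 6], 'A': [1, 2, 3], 'C': [7, 8, 9]})
lst = ['A', 'R', 'B']

# Index.intersection keeps only labels present in both; 'R' is dropped.
cols_index = df_demo.columns.intersection(lst)

# np.intersect1d returns the sorted unique common values.
cols_numpy = np.intersect1d(df_demo.columns, lst)

print(list(cols_index))
print(list(cols_numpy))
print(df_demo[cols_numpy])
```

If the original column order matters, compare the two printed label lists before choosing between them.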


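Here are a few other ways; the list comprehension is much faster. Note that in pandas 2.0 and later, & on an Index is an element-wise logical operation rather than a set intersection, so df.columns & lst applies only to older versions.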

In [1357]: df[df.columns & lst]
Out[1357]:
   A  B
0  1  4
1  2  5
2  3  6

In [1358]: df[[c for c in df.columns if c in lst]]
Out[1358]:
   A  B
0  1  4
1  2  5
2  3  6

Timings

In [1360]: %timeit [c for c in df.columns if c in lst]
100000 loops, best of 3: 2.54 µs per loop

In [1359]: %timeit df.columns & lst
1000 loops, best of 3: 231 µs per loop

In [1362]: %timeit df.columns.intersection(lst)
1000 loops, best of 3: 236 µs per loop

In [1363]: %timeit np.intersect1d(df.columns, lst)
10000 loops, best of 3: 26.6 µs per loop

Details

In [1365]: df
Out[1365]:
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

In [1366]: lst
Out[1366]: ['A', 'R', 'B']
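The list comprehension also keeps the columns in df's own order. For a long lst, testing membership against a set avoids rescanning the list for every column. A rough sketch, where select_existing is a hypothetical helper name that does not come from the answers here:

```python
import pandas as pd

def select_existing(df, wanted):
    """Return the columns of df whose labels appear in wanted, in df's order."""
    wanted_set = set(wanted)  # assumes the labels are hashable
    return df[[c for c in df.columns if c in wanted_set]]

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
data = select_existing(df, ['A', 'R', 'B'])  # 'R' is silently skipped
print(data)
```

For a list as short as the one in the question, the plain `c in lst` test is likely just as fast; the set only matters as lst grows.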


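Use * to unpack the list:

data = df[[*lst]]

This gives the desired result only when every item in lst is a column of df; with a missing label such as 'R' it raises a KeyError.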
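Since [*lst] just builds a new list with the same items, the lookup behaves like df[lst]. A minimal sketch of both cases, using a small frame made up for illustration (the names raised and subset are only for this example):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
lst = ['A', 'R', 'B']

# [*lst] is a copy of lst, so this is the same lookup as df[lst].
try:
    data = df[[*lst]]
    raised = False
except KeyError:
    # 'R' is not a column of df, so the lookup fails.
    raised = True

# When every label exists, the unpacking form works.
subset = df[[*['A', 'B']]]

print(raised)
print(subset)
```

When lst may contain labels that are not columns, one of the intersection-based approaches is the safer choice.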