How to select columns from dataframe by regex How to select columns from dataframe by regex pandas pandas

How to select columns from dataframe by regex


You can use DataFrame.filter this way:

import pandas as pddf = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['d','t','didi'])>>   d  t  didi0  2  4     41  4  3     32  5  9     1df.filter(regex=("d.*"))>>   d  didi0  2     41  4     32  5     1

The idea is to select columns by regex


Use select:

import pandas as pddf = pd.DataFrame([[10, 14, 12, 44, 45, 78]], columns=['a', 'b', 'c', 'd1', 'd2', 'd3'])df.select(lambda col: col.startswith('d'), axis=1)

Result:

   d1  d2  d30  44  45  78

This is a nice solution if you're not comfortable with regular expressions.


On a larger dataset especially, a vectorized approach is actually MUCH FASTER (by more than two orders of magnitude) and is MUCH more readable.I'm providing a screenshot as proof. (Note: Except for the last few lines I wrote at the bottom to make my point clear with a vectorized approach, the other code was derived from the answer by @Alexander.)

enter image description here

Here's that code for reference:

import pandas as pdimport numpy as npn = 10000cols = ['{0}_{1}'.format(letters, number)         for number in range(n) for letters in ('d', 't', 'didi')]df = pd.DataFrame(np.random.randn(30000, n * 3), columns=cols)%timeit df[[c for c in df if c[0] == 'd']]%timeit df[[c for c in df if c.startswith('d')]]%timeit df.select(lambda col: col.startswith('d'), axis=1)%timeit df.filter(regex=("d.*"))%timeit df.filter(like='d')%timeit df.filter(like='d', axis=1)%timeit df.filter(regex=("d.*"), axis=1)%timeit df.columns.map(lambda x: x.startswith("d"))columnVals = df.columns.map(lambda x: x.startswith("d"))%timeit df.filter(columnVals, axis=1)