Find column whose name contains a specific string
Just iterate over DataFrame.columns. This example ends up with a list of the column names that match:
import pandas as pd
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)
spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)
Output:
['spike-2', 'hey spke', 'spiked-in', 'no']
['spike-2', 'spiked-in']
Explanation:
df.columns returns the column labels as a pandas Index, which can be iterated over like a list.
[col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds it to the resulting list if col contains 'spike'. This syntax is a list comprehension.
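As a minimal sketch of the same comprehension wrapped in a reusable function: the helper name `columns_containing` is hypothetical, not part of pandas. The `isinstance` check is an added assumption that some column labels may not be strings (for example integers), in which case the plain `in` test would raise a TypeError.

```python
import pandas as pd


def columns_containing(df, text):
    """Return the column labels of df that are strings containing text."""
    # Skip non-string labels (e.g. integers), for which `text in col`
    # would raise a TypeError.
    return [col for col in df.columns if isinstance(col, str) and text in col]


# Same data as above, except the last column has an integer label.
df = pd.DataFrame({'spike-2': [1, 2, 3], 'hey spke': [4, 5, 6],
                   'spiked-in': [7, 8, 9], 0: [10, 11, 12]})
matches = columns_containing(df, 'spike')
print(matches)
```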
If you only want the resulting data set with the columns that match, you can do this:
df2 = df.filter(regex='spike')
print(df2)
Output:
   spike-2  spiked-in
0        1          7
1        2          8
2        3          9
This answer uses the DataFrame.filter method to do this without list comprehension:
import pandas as pd
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)
print(df.filter(like='spike').columns)
This will output just 'spike-2'. You can also use a regex, as some people suggested in the comments above:
print(df.filter(regex='spike|spke').columns)
This will output both columns: ['spike-2', 'hey spke']
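To illustrate the difference between the two filter arguments, here is a rough sketch. `like` does a plain substring test, while `regex` is applied as a regular-expression search, so it can be anchored. The extra column 'no spike' and the variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'spike-2': [1, 2, 3], 'hey spke': [4, 5, 6],
                   'spiked-in': [7, 8, 9], 'no spike': [10, 11, 12]})

# like= keeps every column whose name contains the substring anywhere.
anywhere = df.filter(like='spike').columns.tolist()

# regex= with a ^ anchor keeps only the names that start with 'spike'.
at_start = df.filter(regex='^spike').columns.tolist()

print(anywhere)
print(at_start)
```

Since `regex` treats the pattern as a regular expression, characters such as `.` or `(` in the search text would need escaping (for instance with `re.escape`), whereas `like` takes them literally.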
You can also use df.columns[df.columns.str.contains(pat = 'spike')]
import pandas as pd
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)
colNames = df.columns[df.columns.str.contains(pat = 'spike')]
print(colNames)
This will output the column names: 'spike-2', 'spiked-in'
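A small sketch of going one step further with the same approach: str.contains returns a boolean mask, which can also be passed to df.loc to select the matching columns directly. The `case` and `regex` parameters are documented options of str.contains; the capitalised column name 'Spike-2' and the variable names are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Spike-2': [1, 2, 3], 'hey spke': [4, 5, 6],
                   'spiked-in': [7, 8, 9], 'no': [10, 11, 12]})

# case=False ignores upper/lower case; regex=False treats the
# pattern as a literal substring instead of a regular expression.
mask = df.columns.str.contains('spike', case=False, regex=False)

# The boolean mask selects columns when used on the column axis of .loc.
selected = df.loc[:, mask]
print(selected.columns.tolist())
```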
More about pandas.Series.str.contains.