Querying a pandas dataframe column which has values as list
You cannot use pd.DataFrame.query
to test membership of a string in lists within a series of lists. Holding lists in Pandas dataframes is not recommended as you lose vectorised functionality.
With your existing dataframe, you can instead calculate a mask using pd.Series.apply
:
res = df[df['tags'].apply(lambda x: 'apple' in x)]print(res) score tags0 1 [apple, pear, guava]
Or you can use a list comprehension:
res = df[['apple' in x for x in df['tags']]]
A third option is to use set
:
res = df[df['tags'].apply(set) >= {'apple'}]
The last option, although expensive, may suit when you are testing for existence of multiple tags. In each case, we are constructing a Boolean series, which we then use to mask the dataframe.