
Finding the intersection between two series in Pandas


Place both series in Python's set container, then use the set intersection method:

set(s1).intersection(s2)

and then transform the result back to a list if needed.

Just noticed pandas in the tag. Can translate back to that:

pd.Series(list(set(s1).intersection(set(s2))))

From comments I have changed this to a more Pythonic expression, which is shorter and easier to read:

pd.Series(list(set(s1) & set(s2)))

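should do the trick, unless the index data is also important to you: the result gets a fresh default index, and since sets are unordered, the order of the values is not guaranteed.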

I have added list(...) to convert the set before passing it to pd.Series, as pandas does not accept a set as direct input for a Series.
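
If the index does matter, one possible alternative (not part of the answer above, just a minimal sketch) is to filter s1 with Series.isin, which keeps s1's own index labels. The labels "a" to "e" below are made up for illustration:

```python
import pandas as pd

# Hypothetical index labels, chosen only to make the surviving index visible.
s1 = pd.Series([4, 5, 6, 20, 42], index=list("abcde"))
s2 = pd.Series([1, 2, 3, 5, 42])

# Keep the rows of s1 whose values also occur in s2.
# The index labels of s1 are preserved, and so is the original order.
common = s1[s1.isin(s2)]
print(common)
```

Note that, unlike the set approach, this keeps duplicate values from s1; chaining .drop_duplicates() removes them if that is not wanted.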


Setup:

s1 = pd.Series([4,5,6,20,42])
s2 = pd.Series([1,2,3,5,42])

Timings:

%%timeit
pd.Series(list(set(s1).intersection(set(s2))))
10000 loops, best of 3: 57.7 µs per loop

%%timeit
pd.Series(np.intersect1d(s1,s2))
1000 loops, best of 3: 659 µs per loop

%%timeit
pd.Series(np.intersect1d(s1.values,s2.values))
10000 loops, best of 3: 64.7 µs per loop

So the NumPy solution can be comparable to the set solution even for small series, if one passes the underlying values explicitly.
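
For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names via_set and via_numpy_values are mine, and the absolute numbers will differ by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])


def via_set():
    # Set intersection; order of the result is not guaranteed.
    return pd.Series(list(set(s1) & set(s2)))


def via_numpy_values():
    # NumPy intersection on the underlying arrays; result is sorted.
    return pd.Series(np.intersect1d(s1.values, s2.values))


runs = 1000
for func in (via_set, via_numpy_values):
    total_seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {total_seconds / runs * 1e6:.1f} microseconds per call")
```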


If you are using pandas, I assume you are also using NumPy. NumPy has a function, intersect1d, that works with a pandas Series.

Example:

pd.Series(np.intersect1d(pd.Series([1,2,3,5,42]), pd.Series([4,5,6,20,42])))

will return a Series with the values 5 and 42.
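
Two things worth knowing about intersect1d: it returns sorted, unique values, and with return_indices=True it also reports where those values first occur in each input. A small sketch, assuming a NumPy version that supports return_indices (1.15 or later); the sample data is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up data: s1 is unsorted and contains a duplicate 5.
s1 = pd.Series([42, 5, 5, 6])
s2 = pd.Series([5, 42, 1])

# common holds the sorted, unique shared values.
# idx1 and idx2 are the positions of their first occurrences in s1 and s2.
common, idx1, idx2 = np.intersect1d(s1.values, s2.values, return_indices=True)

# Reattach the index labels that the shared values had in s1.
result = pd.Series(common, index=s1.index[idx1])
print(result)
```

This gives a way to recover the original index labels from s1 while still using the NumPy function.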