Finding common rows (intersection) in two Pandas dataframes

My understanding is that this question is better answered over in this post.

But briefly, the answer to the OP with this method is simply:

s1 = pd.merge(df1, df2, how='inner', on=['user_id'])

This gives s1 with 5 columns: user_id plus the other two columns from each of df1 and df2.
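A minimal runnable sketch of the merge approach, assuming toy frames whose "other two columns" are a hypothetical `business_id` and `rating` (the values are invented for illustration):

```python
import pandas as pd

# Toy data: business_id and all values are made up for this sketch.
df1 = pd.DataFrame({"user_id": ["a", "b", "c"],
                    "business_id": ["x1", "x2", "x3"],
                    "rating": [4, 3, 5]})
df2 = pd.DataFrame({"user_id": ["c", "d", "a"],
                    "business_id": ["y1", "y2", "y3"],
                    "rating": [2, 1, 4]})

# The inner join keeps only user_ids present in both frames; overlapping
# non-key columns get the default suffixes _x (df1) and _y (df2).
s1 = pd.merge(df1, df2, how="inner", on=["user_id"])
print(s1.columns.tolist())
# ['user_id', 'business_id_x', 'rating_x', 'business_id_y', 'rating_y']
```

If a user_id occurs several times in both frames, the inner merge emits one row per pairing, so the result can have more rows than either input.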


If I understand you correctly, you can use a combination of Series.isin() and DataFrame.append():

In [80]: df1
Out[80]:
   rating  user_id
0       2  0x21abL
1       1  0x21abL
2       1   0xdafL
3       0  0x21abL
4       4  0x1d14L
5       2  0x21abL
6       1  0x21abL
7       0   0xdafL
8       4  0x1d14L
9       1  0x21abL

In [81]: df2
Out[81]:
   rating      user_id
0       2      0x1d14L
1       1    0xdbdcad7
2       1      0x21abL
3       3      0x21abL
4       3      0x21abL
5       1  0x5734a81e2
6       2      0x1d14L
7       0       0xdafL
8       0      0x1d14L
9       4  0x5734a81e2

In [82]: ind = df2.user_id.isin(df1.user_id) & df1.user_id.isin(df2.user_id)

In [83]: ind
Out[83]:
0     True
1    False
2     True
3     True
4     True
5    False
6     True
7     True
8     True
9    False
Name: user_id, dtype: bool

In [84]: df1[ind].append(df2[ind])
Out[84]:
   rating  user_id
0       2  0x21abL
2       1   0xdafL
3       0  0x21abL
4       4  0x1d14L
6       1  0x21abL
7       0   0xdafL
8       4  0x1d14L
0       2  0x1d14L
2       1  0x21abL
3       3  0x21abL
4       3  0x21abL
6       2  0x1d14L
7       0   0xdafL
8       0  0x1d14L

This is essentially the algorithm you described as "clunky", using idiomatic pandas methods. Note the duplicate row indices. Also, note that this won't give you the expected output if df1 and df2 have no overlapping row indices, i.e., if

In [93]: df1.index & df2.index
Out[93]: Int64Index([], dtype='int64')

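In fact, it won't give the expected output if their row indices are not equal. Even with identical indices, the combined mask is matched up by row label rather than by user, so rows 1, 5 and 9 of df1 (user 0x21abL, which does appear in df2) are missing from Out[84]: the df2 rows with those labels hold IDs that are not in df1.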
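Since `ind` is built from two different frames, pandas lines the two masks up by row label. A sketch that instead filters each frame with its own mask, reusing the sample values from the session above (user IDs stored as strings) and `pd.concat` in place of `DataFrame.append`, which was deprecated in pandas 1.4 and removed in 2.0:

```python
import pandas as pd

# Sample data copied from In [80] / In [81] above, user IDs kept as strings.
df1 = pd.DataFrame({
    "rating": [2, 1, 1, 0, 4, 2, 1, 0, 4, 1],
    "user_id": ["0x21abL", "0x21abL", "0xdafL", "0x21abL", "0x1d14L",
                "0x21abL", "0x21abL", "0xdafL", "0x1d14L", "0x21abL"],
})
df2 = pd.DataFrame({
    "rating": [2, 1, 1, 3, 3, 1, 2, 0, 0, 4],
    "user_id": ["0x1d14L", "0xdbdcad7", "0x21abL", "0x21abL", "0x21abL",
                "0x5734a81e2", "0x1d14L", "0xdafL", "0x1d14L", "0x5734a81e2"],
})

# Each mask is built from the frame it filters, so no label alignment
# between df1 and df2 takes place.
in_both_1 = df1["user_id"].isin(df2["user_id"])
in_both_2 = df2["user_id"].isin(df1["user_id"])

# Stack the matching rows; ignore_index=True replaces the duplicate
# labels with a fresh 0..n-1 index.
common = pd.concat([df1[in_both_1], df2[in_both_2]], ignore_index=True)
print(len(common))  # 17
```

On this data it keeps all 10 rows of df1 (every user in df1 also appears in df2) plus the 7 matching rows of df2, whereas Out[84] keeps only 7 rows of df1.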


In SQL, this problem could be solved by several methods:

select * from df1 where exists (select * from df2 where df2.user_id = df1.user_id)
union all
select * from df2 where exists (select * from df1 where df1.user_id = df2.user_id)
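To try this first query without a database server, one option is to load both frames into an in-memory SQLite database with `DataFrame.to_sql` and run the query through `pd.read_sql_query`. A sketch using the same small sample frames as the pandas snippet in this answer:

```python
import sqlite3

import pandas as pd

# Sample frames; only user_id 3 appears in both.
df1 = pd.DataFrame({"user_id": [1, 2, 3], "rating": [10, 15, 20]})
df2 = pd.DataFrame({"user_id": [3, 4, 5], "rating": [30, 35, 40]})

# Write each frame to a table named after it in a throwaway database.
con = sqlite3.connect(":memory:")
df1.to_sql("df1", con, index=False)
df2.to_sql("df2", con, index=False)

query = """
select * from df1 where exists (select * from df2 where df2.user_id = df1.user_id)
union all
select * from df2 where exists (select * from df1 where df1.user_id = df2.user_id)
"""
result = pd.read_sql_query(query, con)
con.close()
print(result)
```

Without an `order by`, SQL does not guarantee the row order of a `union all`, so sort the result if order matters.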

or by joining and then unpivoting (possible in SQL Server):

select
    df1.user_id,
    c.rating
from df1
    inner join df2 on df2.user_id = df1.user_id
    outer apply (
        select df1.rating union all
        select df2.rating
    ) as c

The second one could be written in pandas with something like:

>>> import pandas as pd
>>> df1 = pd.DataFrame({"user_id": [1, 2, 3], "rating": [10, 15, 20]})
>>> df2 = pd.DataFrame({"user_id": [3, 4, 5], "rating": [30, 35, 40]})
>>>
>>> df = pd.merge(df1, df2, on='user_id', suffixes=('_1', '_2'))
>>> df3 = df[['user_id', 'rating_1']].rename(columns={'rating_1': 'rating'})
>>> df4 = df[['user_id', 'rating_2']].rename(columns={'rating_2': 'rating'})
>>> pd.concat([df3, df4], axis=0)
   user_id  rating
0        3      20
0        3      30
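The rename-and-concat steps amount to an unpivot, and pandas has `DataFrame.melt` for exactly that. A sketch of the same join followed by `melt`, on the same sample frames; the column name `source` passed as `var_name` is an arbitrary choice for this example:

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": [1, 2, 3], "rating": [10, 15, 20]})
df2 = pd.DataFrame({"user_id": [3, 4, 5], "rating": [30, 35, 40]})

# Inner join on user_id: one row per matching pair, with the two ratings
# side by side as rating_1 (from df1) and rating_2 (from df2).
df = pd.merge(df1, df2, on="user_id", suffixes=("_1", "_2"))

# Unpivot: each (user_id, rating_1, rating_2) row becomes two rows, and
# the "source" column records which original column the rating came from.
long = df.melt(id_vars="user_id", value_vars=["rating_1", "rating_2"],
               var_name="source", value_name="rating")
print(long[["user_id", "rating"]])
```

Unlike concatenating the renamed slices, `melt` gives the result a fresh 0..n-1 index and keeps a column identifying each rating's origin, which can be dropped with `long.drop(columns="source")` if it is not needed.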

CodeHunter
