Difference(s) between merge() and concat() in pandas

python pandas join merge concat

A very high level difference is that merge() is used to combine two (or more) dataframes on the basis of values of common columns (indices can also be used, use left_index=True and/or right_index=True), and concat() is used to append one (or more) dataframes one below the other (or sideways, depending on whether the axis option is set to 0 or 1).

join() is used to merge 2 dataframes on the basis of the index; instead of using merge() with the option left_index=True we can use join().

For example:

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})df1:   Key  data10   b   01   b   12   a   23   c   34   a   45   a   56   b   6df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})df2:    Key data20   a   01   b   12   d   2#Merge# The 2 dataframes are merged on the basis of values in column "Key" as it is # a common column in 2 dataframespd.merge(df1, df2)   Key data1 data20   b    0    11   b    1    12   b    6    13   a    2    04   a    4    05   a    5    0#Concat# df2 dataframe is appended at the bottom of df1 pd.concat([df1, df2])   Key data1 data20   b   0     NaN1   b   1     NaN2   a   2     NaN3   c   3     NaN4   a   4     NaN5   a   5     NaN6   b   6     NaN0   a   Nan   01   b   Nan   12   d   Nan   2

python pandas join merge concat

At a high level:

.concat() simply stacks multiple DataFrame together eithervertically, or stitches horizontally after aligning on index
.merge() first aligns two DataFrame' selected common column(s) orindex, and then pick up the remaining columns from the aligned rows of each DataFrame.

More specifically, .concat():

Is a top-level pandas function
Combines two or more pandas DataFrame vertically or horizontally
Aligns only on the index when combining horizontally
Errors when any of the DataFrame contains a duplicate index.
Defaults to outer join with the option for inner join

And .merge():

Exists both as a top-level pandas function and a DataFrame method (as of pandas 1.0)
Combines exactly two DataFrame horizontally
Aligns the calling DataFrame's column(s) or index with the otherDataFrame's column(s) or index
Handles duplicate values on the joining columns or index byperforming a cartesian product
Defaults to inner join with options for left, outer, and right

Note that when performing pd.merge(left, right), if left has two rows containing the same values from the joining columns or index, each row will combine with right's corresponding row(s) resulting in a cartesian product. On the other hand, if .concat() is used to combine columns, we need to make sure no duplicated index exists in either DataFrame.

Practically speaking:

Consider .concat() first when combining homogeneous DataFrame, whileconsider .merge() first when combining complementary DataFrame.
If need to merge vertically, go with .concat(). If need to mergehorizontally via columns, go with .merge(), which by default merge on the columns in common.

Reference: Pandas 1.x Cookbook

python pandas join merge concat

pd.concat takes an Iterable as its argument. Hence, it cannot take DataFrames directly as its argument. Also Dimensions of the DataFrame should match along axis while concatenating.

pd.merge can take DataFrames as its argument, and is used to combine two DataFrames with same columns or index, which can't be done with pd.concat since it will show the repeated column in the DataFrame.

Whereas join can be used to join two DataFrames with different indices.

CodeHunter

Difference(s) between merge() and concat() in pandas

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last