Merge a copy of one pandas DataFrame into every row of another DataFrame?

python python-3.x pandas dataframe merge

It looks like you are looking for a cross join (Cartesian product). It can be accomplished with pd.merge by assigning the same key value to every row of both frames.

big.assign(key=1).merge(small.assign(key=1), how='outer', on='key')

Output

   a  b  key  id val
0  1  4    1  aa   a
1  1  4    1  bb   b
2  2  5    1  aa   a
3  2  5    1  bb   b
4  3  6    1  aa   a
5  3  6    1  bb   b

If you already have a column called 'key', you can call the temporary key essentially anything else:

big['thiswontmatchanything'] = 1
small['thiswontmatchanything'] = 1
big.merge(small, how='outer', on='thiswontmatchanything').drop('thiswontmatchanything', axis=1)

Output

   a  b  id val
0  1  4  aa   a
1  1  4  bb   b
2  2  5  aa   a
3  2  5  bb   b
4  3  6  aa   a
5  3  6  bb   b
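For reference, here is a self-contained sketch of the temporary-key approach. The input frames are reconstructed from the output shown above, so treat them as an assumption, and `cross_join_with_key` and `_tmp_key` are hypothetical names. It uses `assign` rather than column assignment, so `big` and `small` are left unmodified.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output above
big = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
small = pd.DataFrame({"id": ["aa", "bb"], "val": ["a", "b"]})


def cross_join_with_key(left, right, key="_tmp_key"):
    """Cross join via a constant temporary key (hypothetical helper)."""
    if key in left.columns or key in right.columns:
        raise ValueError(f"temporary key {key!r} clashes with an existing column")
    # assign() returns new frames, so the caller's frames are not modified
    merged = left.assign(**{key: 1}).merge(
        right.assign(**{key: 1}), on=key, how="outer"
    )
    return merged.drop(columns=key)


result = cross_join_with_key(big, small)
print(result)
```

The clash check is a design choice: it fails loudly instead of silently joining on a real column that happens to share the temporary name.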

I believe there is a much shorter way. Given data frames df1 and df2, pandas 1.2 and later can do the cross join directly. Use whichever of the two orderings you need:

df = df1.merge(df2, how='cross')

df = df2.merge(df1, how='cross')

You could implement a simple if-then-else to figure out which data frame is smaller and which is larger, but that is separate from the merging operation itself.
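As a rough sketch of that size check, assuming you want the longer frame on the left so that its rows come first in the result: `cross_larger_first` is a hypothetical helper name, and the two small frames are illustrative only, not the ones from the question.

```python
import pandas as pd


def cross_larger_first(df1, df2):
    """Cross join with the longer frame on the left (hypothetical helper).

    Relies on how='cross', which needs pandas 1.2 or later.
    """
    # On a tie in length, df1 stays on the left
    left, right = (df1, df2) if len(df1) >= len(df2) else (df2, df1)
    return left.merge(right, how="cross")


# Illustrative frames, not taken from the question
df1 = pd.DataFrame({"id": ["aa", "bb"]})
df2 = pd.DataFrame({"a": [1, 2, 3]})
out = cross_larger_first(df1, df2)
print(out)
```

Since a cross merge keeps the order of the left frame's rows, choosing which frame goes on the left also decides the column order and the row grouping of the result.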

Possibly less hacky is the following:

Each dataframe is replicated as many times as the other original dataframe has rows. The first one is then sorted by the 'a' column, but you could adjust that. Finally, the two dataframes are concatenated along the column axis (1) to achieve the desired result.

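import pandas as pd

def merge_expand(*args):
    # Repeat the first frame once per row of the second, then sort so copies of each row sit together
    tmp_big = pd.concat([args[0]] * len(args[1]), ignore_index=True).sort_values(by=['a']).reset_index(drop=True)
    # Repeat the second frame once per row of the first, keeping its original cycle order
    tmp_small = pd.concat([args[1]] * len(args[0]), ignore_index=True)
    return pd.concat([tmp_big, tmp_small], axis=1)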

Input:

merge_expand(big, small)

Output:

   a  b  id val
0  1  4  aa   a
1  1  4  bb   b
2  2  5  aa   a
3  2  5  bb   b
4  3  6  aa   a
5  3  6  bb   b

EDIT: We can even make it a bit more generic if you want to pass a few arguments:

def merge_expand(*args):
    if len(args) == 2:
        if len(args[0]) > len(args[1]):
            df_1 = pd.concat([args[0]] * len(args[1]), ignore_index=True).sort_values(by=[args[0].columns[0]]).reset_index(drop=True)
            df_2 = pd.concat([args[1]] * len(args[0]), ignore_index=True)
            return pd.concat([df_1, df_2], axis=1)
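The snippet above only returns a result when exactly two frames are passed and the first one is longer. A possible variant, sketched here under the assumption that a plain Cartesian product of two frames is all that is needed, repeats rows by position instead of sorting on a column, so it does not depend on frame sizes or on the values in any column. `merge_expand_by_position` is a hypothetical name, and the sample frames are reconstructed from the output shown earlier.

```python
import numpy as np
import pandas as pd


def merge_expand_by_position(left, right):
    """Cartesian product by positional repetition (hypothetical variant)."""
    n_left, n_right = len(left), len(right)
    # Each left row repeated n_right times in a row: 0, 0, 1, 1, 2, 2, ...
    left_pos = np.repeat(np.arange(n_left), n_right)
    # The right frame cycled n_left times: 0, 1, 0, 1, 0, 1, ...
    right_pos = np.tile(np.arange(n_right), n_left)
    left_part = left.iloc[left_pos].reset_index(drop=True)
    right_part = right.iloc[right_pos].reset_index(drop=True)
    return pd.concat([left_part, right_part], axis=1)


# Assumed inputs, reconstructed from the printed output
big = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
small = pd.DataFrame({"id": ["aa", "bb"], "val": ["a", "b"]})
expanded = merge_expand_by_position(big, small)
print(expanded)
```

Note that if both frames share a column name, the concatenated result will contain duplicate column labels, whereas a merge would add suffixes.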

CodeHunter
