Merge a copy of one pandas DataFrame into every row of another DataFrame?



It looks like you are looking for a cross join (Cartesian product). It can be done with pd.merge by assigning the same key to every row of both DataFrames, so that each row of one matches every row of the other.

big.assign(key=1).merge(small.assign(key=1), how='outer', on='key')

Output

   a  b  key  id val
0  1  4    1  aa   a
1  1  4    1  bb   b
2  2  5    1  aa   a
3  2  5    1  bb   b
4  3  6    1  aa   a
5  3  6    1  bb   b
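A self-contained sketch of the constant-key approach. The contents of `big` and `small` are assumed, reconstructed from the printed output above. The helper key is dropped at the end because it carries no information.

```python
import pandas as pd

# Sample frames reconstructed from the output above (assumed data)
big = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
small = pd.DataFrame({'id': ['aa', 'bb'], 'val': ['a', 'b']})

# Give every row the same key so each row of `big` matches every row of `small`
result = (
    big.assign(key=1)
    .merge(small.assign(key=1), how='outer', on='key')
    .drop(columns='key')  # remove the helper column afterwards
)
print(result)
```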

If you already have a column called 'key', you can give the helper column any name that does not clash with an existing column:

big['thiswontmatchanything'] = 1
small['thiswontmatchanything'] = 1
big.merge(small, how='outer', on='thiswontmatchanything').drop('thiswontmatchanything', axis=1)

Output

   a  b  id val
0  1  4  aa   a
1  1  4  bb   b
2  2  5  aa   a
3  2  5  bb   b
4  3  6  aa   a
5  3  6  bb   b
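Assigning the helper column directly modifies `big` and `small` in place. A sketch that leaves the inputs untouched and refuses a clashing name might look like this; `cross_join` and `_cross_key` are hypothetical names chosen for illustration. An inner merge is enough here because every row shares the key, and unlike an outer merge it returns an empty result when either frame is empty, as a true Cartesian product should.

```python
import pandas as pd

def cross_join(left, right, key='_cross_key'):
    """Cartesian product of two DataFrames via a constant helper column.

    `key` is a hypothetical placeholder name; it must not already exist in
    either frame, otherwise the merge would match on real data.
    """
    if key in left.columns or key in right.columns:
        raise ValueError(f"helper column {key!r} already exists")
    # assign() returns new frames, so the caller's DataFrames are not modified
    return (
        left.assign(**{key: 1})
        .merge(right.assign(**{key: 1}), on=key)
        .drop(columns=key)
    )

# Example data (assumed): `big` already has a column called 'key'
big = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'key': [7, 8, 9]})
small = pd.DataFrame({'id': ['aa', 'bb'], 'val': ['a', 'b']})
out = cross_join(big, small)
print(out)
```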


Since pandas 1.2 there is a much shorter way. Given DataFrames df1 and df2, you can do

df = df1.merge(df2, how='cross')

or

df = df2.merge(df1, how='cross')

You could add a simple if/else to work out which DataFrame is smaller or larger and choose the order accordingly, but that is separate from the merge itself.
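A runnable sketch of the cross merge, with sample data assumed to match the question. The left frame's columns come first, and the left frame's row order is preserved (each of its rows is paired with all rows of the right frame in turn). The size-based if/else at the end is optional and only changes the row and column order.

```python
import pandas as pd

# Assumed sample data
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = pd.DataFrame({'id': ['aa', 'bb'], 'val': ['a', 'b']})

# how='cross' requires pandas 1.2 or newer; no key column is needed
df = df1.merge(df2, how='cross')

# Swapping the operands puts df2's columns first
df_swapped = df2.merge(df1, how='cross')

# Optional: always put the larger frame on the left (a layout choice only)
left, right = (df1, df2) if len(df1) >= len(df2) else (df2, df1)
df_sized = left.merge(right, how='cross')
print(df)
```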


Possibly less hacky is the following:

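- Each DataFrame's rows are replicated by the length of the other original DataFrame.
- The first one is then sorted by the 'a' column so that the copies of each row sit next to each other (this assumes the values in 'a' are unique; you could sort by a different column instead).
- Finally, the two DataFrames are concatenated along the column axis (axis=1) to produce the desired result.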

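import pandas as pd

def merge_expand(*args):
    # Repeat args[0] once per row of args[1]; sorting by 'a' groups the copies
    tmp_big = pd.concat([args[0]] * len(args[1]), ignore_index=True).sort_values(by=['a']).reset_index(drop=True)
    # Tile args[1] once per row of args[0]
    tmp_small = pd.concat([args[1]] * len(args[0]), ignore_index=True)
    return pd.concat([tmp_big, tmp_small], axis=1)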

Input:

merge_expand(big, small)

Output:

   a  b  id val
0  1  4  aa   a
1  1  4  bb   b
2  2  5  aa   a
3  2  5  bb   b
4  3  6  aa   a
5  3  6  bb   b
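Sorting by a column only keeps the pairs correct when that column has unique values. A sketch of the same repeat-and-tile idea that avoids sorting altogether, repeating row positions with NumPy instead; `merge_expand_positional` is a hypothetical name, and the sample data deliberately contains a duplicate 'a' value.

```python
import numpy as np
import pandas as pd

def merge_expand_positional(left, right):
    """Cartesian product built from repeated/tiled row positions (hypothetical helper)."""
    n_left, n_right = len(left), len(right)
    # Positions like [0, 0, 1, 1, 2, 2]: each left row repeated n_right times
    left_rep = left.iloc[np.repeat(np.arange(n_left), n_right)].reset_index(drop=True)
    # Positions like [0, 1, 0, 1, 0, 1]: the whole right frame tiled n_left times
    right_rep = right.iloc[np.tile(np.arange(n_right), n_left)].reset_index(drop=True)
    # Indexes were reset on both sides, so axis=1 lines the rows up by position
    return pd.concat([left_rep, right_rep], axis=1)

big = pd.DataFrame({'a': [1, 1, 3], 'b': [4, 7, 6]})  # duplicate value in 'a'
small = pd.DataFrame({'id': ['aa', 'bb'], 'val': ['a', 'b']})
out = merge_expand_positional(big, small)
print(out)
```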

EDIT: We can make it a bit more generic, so that it works with any two DataFrames rather than only ones that have an 'a' column:

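import pandas as pd

def merge_expand(*args):
    if len(args) == 2:
        # Expects the larger DataFrame first
        if len(args[0]) > len(args[1]):
            # Repeat the larger frame, grouping copies by its first column
            df_1 = pd.concat([args[0]] * len(args[1]), ignore_index=True).sort_values(by=[args[0].columns[0]]).reset_index(drop=True)
            df_2 = pd.concat([args[1]] * len(args[0]), ignore_index=True)
            return pd.concat([df_1, df_2], axis=1)
    raise ValueError('expected two DataFrames, the larger one first')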
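If "generic" should mean any number of DataFrames, one option on pandas 1.2+ is to fold `merge(how='cross')` over them. This is a sketch; `cross_all` is a hypothetical helper, and column names are assumed to be distinct across the frames.

```python
from functools import reduce

import pandas as pd

def cross_all(*frames):
    """Cartesian product of any number of DataFrames (hypothetical helper).

    Applies merge(how='cross') left to right; needs pandas 1.2 or newer.
    """
    if not frames:
        raise ValueError("need at least one DataFrame")
    return reduce(lambda acc, df: acc.merge(df, how='cross'), frames)

# Assumed sample data
big = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
small = pd.DataFrame({'id': ['aa', 'bb'], 'val': ['a', 'b']})
flags = pd.DataFrame({'flag': [True, False]})
out = cross_all(big, small, flags)
print(out)
```

Every combination appears once, so the row count is the product of the input lengths (3 × 2 × 2 here).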