
Select rows from a pandas dataframe with a numpy 2D array on multiple columns


Is there any reason not to use merge?

df2 = pd.DataFrame(M, columns=geocols)
df = df.merge(df2, how='outer')
ix = df.score.isnull()
df.loc[ix, 'score'] = df.loc[ix].apply(func, axis=1)

It does exactly what you proposed: it adds the missing rows to df with a NaN score, identifies the NaNs, and calculates the scores for those rows.
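To make the merge approach easy to try, here is a minimal, self-contained sketch. The column names in geocols, the sample values in df and M, and the body of func are all made up for illustration; the real func from the question is not shown, so a simple sum of the key columns stands in for it.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: four key columns plus a score column.
geocols = ["a", "b", "c", "d"]
df = pd.DataFrame(
    [[1, 2, 3, 4, 10.0],
     [5, 6, 7, 8, 26.0]],
    columns=geocols + ["score"],
)
M = np.array([[1, 2, 3, 4],    # already present in df
              [9, 9, 9, 9]])   # not yet in df


def func(row):
    # Placeholder score (an assumption): the sum of the key columns.
    return float(row[geocols].sum())


df2 = pd.DataFrame(M, columns=geocols)
# With no "on" argument, merge joins on the columns both frames share (geocols).
merged = df.merge(df2, how="outer")
# Rows that came only from M have no score yet.
ix = merged["score"].isnull()
merged.loc[ix, "score"] = merged.loc[ix].apply(func, axis=1)
```

Rows of M that already exist in df keep their stored score, so func only runs on the rows that are new.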


This solution loops over each row in M, but not over each element. The steps are:

  1. Go through each row in M and check whether it is in df. If it is, save its index. If it is not, calculate the score and save the row.
  2. Create the M dataframe by taking the new M rows from above and appending the rows found in df.
  3. Create the new version of df by appending only the new rows of M.

Hopefully this helps. I realise it still has a loop in it, but I have not figured out how to get rid of it. Your question only states that df could be big and that you wanted to avoid looping over the elements of M, which this at least avoids by looping only over rows.


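import numpy as np
import pandas as pd

M_in_df = []      # positions in df of rows of M that are already present
M_not_in_df = []  # new rows of M, each with its computed score appended
for m in M:
    # Compare m against the first four (key) columns of every row in df
    df_index = (df.iloc[:, :4].values == m).all(axis=1)
    if df_index.any():
        M_in_df.append(np.argmax(df_index))
    else:
        M_not_in_df.append(np.append(m, func(m)))

new_rows = pd.DataFrame(M_not_in_df, columns=df.columns)
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
M_df = pd.concat([new_rows, df.iloc[M_in_df]], ignore_index=True)
new_df = pd.concat([df, new_rows], ignore_index=True)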
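The key line in the loop is the row-membership test, so here it is in isolation. This is only a small sketch with made-up values: keys stands in for df.iloc[:, :4].values and m for a single row of M.

```python
import numpy as np

# Hypothetical data: the key columns of df as a 2D array, and one row of M.
keys = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])
m = np.array([5, 6, 7, 8])

# Broadcasting compares m with every row of keys element by element;
# all(axis=1) is True only where all four columns match.
row_match = (keys == m).all(axis=1)

found = bool(row_match.any())
# argmax returns the position of the first True, so it is only
# meaningful when at least one row matched.
position = int(np.argmax(row_match)) if found else None
```

Note that np.argmax on an all-False array returns 0, which is why the any() check has to come first.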


Convert M to a DataFrame, concat with df:

df2 = pd.DataFrame(M, columns=geocols)
df3 = pd.concat([df, df2], ignore_index=True)

Drop the duplicate rows only based on the cols in geocols:

df3 = df3.drop_duplicates(subset=geocols)

Get a mask of rows with NaN for score:

m = df3.score.isnull()

Apply the score to the masked rows, and store in df3:

df3.loc[m, 'score'] = df3[m].apply(func, axis=1)

Depending on your pandas version, you may get a SettingWithCopyWarning, but the assignment works.
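Put together, the steps look roughly like this. It is a sketch under assumptions: the names in geocols, the sample data, and the placeholder func are invented for the example. Taking an explicit .copy() after drop_duplicates is one common way to avoid the warning; it is an addition here, not part of the steps as written.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: four key columns plus a score column.
geocols = ["a", "b", "c", "d"]
df = pd.DataFrame(
    [[1, 2, 3, 4, 10.0],
     [5, 6, 7, 8, 26.0]],
    columns=geocols + ["score"],
)
M = np.array([[1, 2, 3, 4],    # duplicate of a row in df
              [9, 9, 9, 9]])   # new row


def func(row):
    # Placeholder score (an assumption): the sum of the key columns.
    return float(row[geocols].sum())


df2 = pd.DataFrame(M, columns=geocols)
# df goes first so that, for duplicated keys, the row that already
# has a score is the one drop_duplicates keeps (keep="first" is the default).
df3 = pd.concat([df, df2], ignore_index=True)
df3 = df3.drop_duplicates(subset=geocols).copy()

# Rows that came only from M still have NaN in score.
mask = df3["score"].isnull()
df3.loc[mask, "score"] = df3[mask].apply(func, axis=1)
```

The order of the frames in pd.concat matters: with df2 first, the unscored copy of a duplicated row would be kept and its score recomputed.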