
Merge a large Dask dataframe with a small Pandas dataframe


You can iterate over the key values shared by both dataframes and assign the missing columns in a loop:

union_set = set(small_df['common_column']) & set(large_df['common_column'])
for el in union_set:
    for column in small_df.columns:
        if column not in large_df.columns:
            # .values[0] extracts the scalar so the assignment doesn't fail on
            # index misalignment; this assumes keys are unique in small_df
            value = small_df.loc[small_df['common_column'] == el, column].values[0]
            large_df.loc[large_df['common_column'] == el, column] = value
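Note that the loop does a row-group assignment per key, which gets slow for many keys. If large_df really is a Dask dataframe, Dask's own merge can join it directly against a small pandas dataframe, partition by partition, without a shuffle. A minimal sketch with hypothetical example data (the column names and values are assumptions):

import dask.dataframe as dd
import pandas as pd

# hypothetical stand-ins for the real tables
small_df = pd.DataFrame({'common_column': [1, 2], 'extra': ['a', 'b']})
large_pdf = pd.DataFrame({'common_column': [1, 2, 1, 2], 'value': [10, 20, 30, 40]})
large_ddf = dd.from_pandas(large_pdf, npartitions=2)

# Dask accepts a pandas dataframe as the right side of a merge; the small
# frame is joined against each partition in parallel
merged = large_ddf.merge(small_df, on='common_column', how='left')
print(merged.compute())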


When working with big data, partitioning the data well is very important, and at the same time having enough cluster and memory capacity is mandatory.

You can try using Spark.

Dask is a pure-Python framework that does more of the same, i.e. it lets you run the same Pandas or NumPy code either locally or on a cluster. Apache Spark, by contrast, brings a learning curve with a new API and execution model, although it ships with a Python wrapper (PySpark).
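If you go the Spark route, the usual pattern for joining a large table with a small one is a broadcast join: the small table is shipped to every executor so the large table never needs to be shuffled. A minimal sketch, assuming hypothetical large_df and small_df sharing a 'common_column' key:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("merge-example").getOrCreate()

# hypothetical data standing in for the real tables
large_df = spark.createDataFrame([(1, 10), (2, 20), (1, 30)], ["common_column", "value"])
small_df = spark.createDataFrame([(1, "a"), (2, "b")], ["common_column", "extra"])

# broadcast() hints Spark to replicate the small table to all executors,
# avoiding a shuffle of the large table
merged = large_df.join(broadcast(small_df), on="common_column", how="left")
merged.show()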

You can also try partitioning the data and storing it in Parquet files.
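With Dask, for example, writing the data to Parquet partitioned on the join key keeps later reads and merges cheap, since readers can skip irrelevant partitions. A rough sketch, where the output path and column names are assumptions:

import dask.dataframe as dd
import pandas as pd

ddf = dd.from_pandas(
    pd.DataFrame({'common_column': [1, 2, 1, 2], 'value': [10, 20, 30, 40]}),
    npartitions=2,
)

# partition_on writes one directory per key value (hive-style layout)
ddf.to_parquet('data/large_parquet', partition_on=['common_column'])

# read back later; partition pruning keeps this cheap
ddf2 = dd.read_parquet('data/large_parquet')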