
Concatenate two PySpark dataframes


Maybe you can try creating the missing columns and calling union (unionAll for Spark 1.6 or lower):

from pyspark.sql.functions import lit

cols = ['id', 'uniform', 'normal', 'normal_2']

df_1_new = df_1.withColumn("normal_2", lit(None)).select(cols)
df_2_new = df_2.withColumn("normal", lit(None)).select(cols)
result = df_1_new.union(df_2_new)


df_concat = df_1.union(df_2)

Note that union requires the two dataframes to have identical columns. If they differ, you can first use withColumn() with lit(None) to create the missing normal and normal_2 columns, as shown above.


You can use unionByName to do this:

df = df_1.unionByName(df_2)

unionByName is available since Spark 2.3.0.