Concatenate two PySpark dataframes
Maybe you can try creating the missing columns and then calling union() (unionAll() for Spark 1.6 or lower):

from pyspark.sql.functions import lit

cols = ['id', 'uniform', 'normal', 'normal_2']
df_1_new = df_1.withColumn("normal_2", lit(None)).select(cols)
df_2_new = df_2.withColumn("normal", lit(None)).select(cols)
result = df_1_new.union(df_2_new)
df_concat = df_1.union(df_2)
union() requires the dataframes to have the same columns in the same order; if they differ, use withColumn() to create the missing normal and normal_2 columns first.
You can use unionByName to do this:
df = df_1.unionByName(df_2)
unionByName is available since Spark 2.3.0.