Concatenate two PySpark dataframes
Maybe you can try creating the missing columns and then calling union() (unionAll() for Spark 1.6 or lower):

from pyspark.sql.functions import lit

cols = ['id', 'uniform', 'normal', 'normal_2']
df_1_new = df_1.withColumn("normal_2", lit(None)).select(cols)
df_2_new = df_2.withColumn("normal", lit(None)).select(cols)
result = df_1_new.union(df_2_new)
df_concat = df_1.union(df_2)
union() requires the dataframes to have the same columns in the same order; if they differ, use withColumn() to create the missing normal and normal_2 columns first.
You can use unionByName to do this:
df = df_1.unionByName(df_2)
unionByName is available since Spark 2.3.0.