Create DataFrame from list of tuples using pyspark

Next time, please provide a working example. That makes the question much easier to answer.

The way your RDD is structured makes it awkward to create a DataFrame from it directly. According to the Spark documentation, this is how you create a DataFrame from a list of tuples:

>>> l = [('Alice', 1)]
>>> sqlContext.createDataFrame(l).collect()
[Row(_1=u'Alice', _2=1)]
>>> sqlContext.createDataFrame(l, ['name', 'age']).collect()
[Row(name=u'Alice', age=1)]

For your example, you can produce the desired output like this:

# sc and sqlContext are assumed to exist already, as they do in the pyspark shell
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Your data at the moment
data = sc.parallelize([
    [('Id', 'a0w1a0000003xB1A'), ('PackSize', 1.0), ('Name', 'A')],
    [('Id', 'a0w1a0000003xAAI'), ('PackSize', 1.0), ('Name', 'B')],
    [('Id', 'a0w1a00000xB3AAI'), ('PackSize', 30.0), ('Name', 'C')]
])

# Convert each record to a plain tuple of values: (Id, PackSize, Name)
data_converted = data.map(lambda x: (x[0][1], x[1][1], x[2][1]))

# Define schema (the PackSize values are floats, so use DoubleType rather than StringType)
schema = StructType([
    StructField("Id", StringType(), True),
    StructField("Packsize", DoubleType(), True),
    StructField("Name", StringType(), True)
])

# Create dataframe
DF = sqlContext.createDataFrame(data_converted, schema)

# Output
DF.show()
+----------------+--------+----+
|              Id|Packsize|Name|
+----------------+--------+----+
|a0w1a0000003xB1A|     1.0|   A|
|a0w1a0000003xAAI|     1.0|   B|
|a0w1a00000xB3AAI|    30.0|   C|
+----------------+--------+----+
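The conversion step above picks values by position, which only works if every record lists its pairs in the same order. If that is not guaranteed, a lookup by key may be safer. Here is a minimal sketch in plain Python; `pairs_to_tuple` and `COLUMNS` are hypothetical names introduced for illustration, not part of Spark or of the original answer.

```python
# Hypothetical helper (an assumption, not from the original answer): look up
# values by key instead of relying on each pair sitting at a fixed position.
COLUMNS = ["Id", "PackSize", "Name"]


def pairs_to_tuple(record, columns=COLUMNS):
    """Turn [('Id', ...), ('PackSize', ...), ...] into a tuple ordered by `columns`."""
    lookup = dict(record)
    # .get() returns None for a missing key, which a nullable field can hold
    return tuple(lookup.get(name) for name in columns)


records = [
    [('Id', 'a0w1a0000003xB1A'), ('PackSize', 1.0), ('Name', 'A')],
    # Same kind of record, but with the pairs in a different order
    [('Name', 'B'), ('Id', 'a0w1a0000003xAAI'), ('PackSize', 1.0)],
]

rows = [pairs_to_tuple(r) for r in records]
print(rows)
```

On the RDD, `data.map(pairs_to_tuple)` should be usable in place of the lambda, since the helper takes a single record and returns a tuple in schema order.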

Hope this helps
