Pyspark convert a standard list to data frame [duplicate]

This approach uses less code, avoids the explicit conversion to an RDD, and is likely easier to understand:

    from pyspark.sql.types import IntegerType

    # notice the variable name (more below)
    mylist = [1, 2, 3, 4]

    # notice the parens after the type name
    spark.createDataFrame(mylist, IntegerType()).show()

NOTE: About naming your variable list: list is a Python builtin type, so it is strongly recommended to avoid using it (or any other builtin name) as a variable name. Doing so shadows the builtin, and calls such as list() stop working in that scope. When prototyping something quick and dirty, many people use a name like mylist instead.
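As a small illustration of the note above, the sketch below shows what happens when a variable named list shadows the builtin. The function name shadowed_call is hypothetical and exists only for this demonstration; the shadowing is confined to a function so the rest of the script is unaffected.

```python
def shadowed_call():
    # Hypothetical demo: a local variable named `list` hides the builtin type.
    list = [1, 2, 3, 4]
    try:
        # This would normally build ['a', 'b', 'c'], but `list` is now data.
        return list("abc")
    except TypeError as err:
        return str(err)


# Using a different name keeps the builtin available.
mylist = [1, 2, 3, 4]
message = shadowed_call()   # the TypeError text raised inside the function
copy = list(mylist)         # the builtin still works out here
```

Outside the function the builtin is untouched, which is why list(mylist) still returns a new list.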

python apache-spark pyspark pyspark-sql

Please see the code below:

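    from pyspark.sql import Row

    li = [1, 2, 3, 4]
    rdd1 = sc.parallelize(li)
    # wrap each value in a Row so it becomes a one-column record
    row_rdd = rdd1.map(lambda x: Row(x))
    # keep the DataFrame in df; show() only prints and returns None
    df = sqlContext.createDataFrame(row_rdd, ['numbers'])
    df.show()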

    +-------+
    |numbers|
    +-------+
    |      1|
    |      2|
    |      3|
    |      4|
    +-------+
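The wrapping step in the answer above (one record per value) can also be sketched without an explicit RDD, by building one-element tuples in plain Python and passing them to createDataFrame together with a list of column names. The helper names to_rows and list_to_df are hypothetical, and the sketch assumes a SparkSession named spark already exists; it does not create one.

```python
def to_rows(values):
    """Wrap each plain value in a one-element tuple (one tuple per row)."""
    return [(value,) for value in values]


def list_to_df(spark, values, column="numbers"):
    """Build a single-column DataFrame.

    `spark` is assumed to be an existing SparkSession; the column type
    is inferred by Spark from the Python values.
    """
    return spark.createDataFrame(to_rows(values), [column])


mylist = [1, 2, 3, 4]
rows = to_rows(mylist)

# With an active SparkSession named `spark` (assumed, not created here):
# list_to_df(spark, mylist).show()
```

Because the schema is given only as column names, Spark infers the column type itself; pass an explicit type, as in the IntegerType() example, when a specific type is required.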

CodeHunter
