
Pyspark convert a standard list to data frame [duplicate]


This approach uses less code, avoids building an RDD by hand, and is likely easier to understand:

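    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()  # `spark` already exists in the pyspark shell

    # notice the variable name (more below)
    mylist = [1, 2, 3, 4]
    # notice the parens after the type name
    spark.createDataFrame(mylist, IntegerType()).show()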

NOTE: About naming your variable `list`: `list` is a Python built-in (the `list` type), so it is strongly recommended that you avoid using built-in names for your own variables. Assigning to `list` shadows the built-in, and calls such as `list()` stop working in that scope. When prototyping something quick and dirty, many people use a name such as `mylist` instead.
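
A small plain-Python sketch of the shadowing problem described in the note; the two function names are hypothetical, chosen only for illustration:

```python
def call_list_after_shadowing():
    """Return the error raised when `list` has been rebound to a list object."""
    list = [1, 2, 3, 4]  # this local name now hides the built-in `list`
    try:
        list("abc")  # tries to call the list object, not the built-in type
    except TypeError as exc:
        return str(exc)
    return "no error"


def call_list_with_mylist():
    """Same data under a non-clashing name: the built-in `list()` still works."""
    mylist = [1, 2, 3, 4]
    return mylist, list("abc")


print(call_list_after_shadowing())  # 'list' object is not callable
print(call_list_with_mylist())      # ([1, 2, 3, 4], ['a', 'b', 'c'])
```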


Please see the code below:

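    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext  # already defined as `sc` in the pyspark shell

    li = [1, 2, 3, 4]
    rdd1 = sc.parallelize(li)
    row_rdd = rdd1.map(lambda x: Row(x))  # one single-field Row per element
    df = spark.createDataFrame(row_rdd, ['numbers'])  # name the column
    df.show()  # show() returns None, so don't assign its result to df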

Output:

    +-------+
    |numbers|
    +-------+
    |      1|
    |      2|
    |      3|
    |      4|
    +-------+