pyspark: ValueError: Some of types cannot be determined after inferring
To infer a field's type, PySpark looks at the non-None records in that field. If a field contains only None records, PySpark cannot infer the type and raises this error.
Manually defining a schema resolves the issue:
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("foo", StringType(), True)])
>>> df = spark.createDataFrame([[None]], schema=schema)
>>> df.show()
+----+
|foo |
+----+
|null|
+----+
To fix this problem, you can provide your own schema. For example:
To reproduce the error:
>>> df = spark.createDataFrame([[None, None]], ["name", "score"])
To fix the error:
>>> from pyspark.sql.types import StructType, StructField, StringType, DoubleType
>>> schema = StructType([StructField("name", StringType(), True), StructField("score", DoubleType(), True)])
>>> df = spark.createDataFrame([[None, None]], schema=schema)
>>> df.show()
+----+-----+
|name|score|
+----+-----+
|null| null|
+----+-----+
If you are using the monkey-patched RDD[Row].toDF() method, you can increase the sample ratio so that more than the first 100 records are checked when inferring types:
# Set sampleRatio smaller as the data size increases
my_df = my_rdd.toDF(sampleRatio=0.01)
my_df.show()
Assuming every field in your RDD has at least some non-null rows, increasing sampleRatio towards 1.0 makes it more likely that they will be found during inference.
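The effect of sampleRatio can be sketched in plain Python: inference only succeeds if at least one sampled row carries a non-None value for each field. The function below is illustrative, not PySpark's actual sampling code:

```python
import random

def infer_with_sampling(rows, sample_ratio, seed=0):
    # Keep roughly sample_ratio of the rows (hypothetical stand-in for
    # Spark's sampling), then look for a non-None value per field.
    rng = random.Random(seed)
    sampled = [r for r in rows if rng.random() < sample_ratio]
    n_fields = len(rows[0])
    types = [None] * n_fields
    for row in sampled:
        for i, v in enumerate(row):
            if types[i] is None and v is not None:
                types[i] = type(v)
    if any(t is None for t in types):
        raise ValueError("Some of types cannot be determined after inferring")
    return types

rows = [(None,)] * 99 + [("x",)]
infer_with_sampling(rows, 1.0)   # sampling everything always finds the str
```

With a low sample_ratio the lone non-None row is easily missed, which is why pushing the ratio towards 1.0 helps.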