How to change a dataframe column from String type to Double type in PySpark?

python apache-spark dataframe pyspark apache-spark-sql

There is no need for a UDF here. Column already provides a cast method that accepts a DataType instance:

from pyspark.sql.types import DoubleType
changedTypedf = joindf.withColumn("label", joindf["show"].cast(DoubleType()))

or a short type string:

changedTypedf = joindf.withColumn("label", joindf["show"].cast("double"))

where the canonical string names (other variations may be supported as well) correspond to the simpleString value of each type. So for atomic types:

from pyspark.sql import types

for t in ['BinaryType', 'BooleanType', 'ByteType', 'DateType',
          'DecimalType', 'DoubleType', 'FloatType', 'IntegerType',
          'LongType', 'ShortType', 'StringType', 'TimestampType']:
    print(f"{t}: {getattr(types, t)().simpleString()}")

BinaryType: binary
BooleanType: boolean
ByteType: tinyint
DateType: date
DecimalType: decimal(10,0)
DoubleType: double
FloatType: float
IntegerType: int
LongType: bigint
ShortType: smallint
StringType: string
TimestampType: timestamp

and, for example, for complex types:

types.ArrayType(types.IntegerType()).simpleString()

'array<int>'

types.MapType(types.StringType(), types.IntegerType()).simpleString()

'map<string,int>'


Preserve the column name and avoid adding an extra column by passing the input column's own name to withColumn:

from pyspark.sql.types import DoubleType
changedTypedf = joindf.withColumn("show", joindf["show"].cast(DoubleType()))


The given answers are enough to deal with the problem, but I want to share another way, which may have been introduced in a newer version of Spark (I am not sure about it), so the given answers didn't cover it.

We can refer to the column in a Spark statement with the col("column_name") function:

from pyspark.sql.functions import col
changedTypedf = joindf.withColumn("show", col("show").cast("double"))
