
What is the Scala type mapping for all Spark SQL DataType


Directly from the Spark SQL and DataFrame Guide:

Data type       |    Value type in Scala
---------------------------------------------------
ByteType        |    Byte
ShortType       |    Short
IntegerType     |    Int
LongType        |    Long
FloatType       |    Float
DoubleType      |    Double
DecimalType     |    java.math.BigDecimal
StringType      |    String
BinaryType      |    Array[Byte]
BooleanType     |    Boolean
TimestampType   |    java.sql.Timestamp
DateType        |    java.sql.Date
ArrayType       |    scala.collection.Seq
MapType         |    scala.collection.Map
StructType      |    org.apache.spark.sql.Row
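To make the mapping concrete, here is a minimal sketch (the field names and values are made up for illustration) of a schema built from several of these types, alongside a Row whose Scala values line up with it:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Each field's runtime value must use the Scala type from the table above.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("name", StringType, nullable = true),
      StructField("scores", ArrayType(DoubleType), nullable = true),
      StructField("attrs", MapType(StringType, StringType), nullable = true)
    ))

    // Long -> LongType, String -> StringType, Seq -> ArrayType, Map -> MapType
    val row = Row(1L, "alice", Seq(95.5, 88.0), Map("team" -> "blue"))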


For those trying to find the Java types, they're now also documented at the link in zero323's answer. To record the current revision here:

Data type     |    Value type in Java                       |    API to access or create a data type
--------------------------------------------------------------------------------------------------------
ByteType      |    byte or Byte                             |    DataTypes.ByteType
ShortType     |    short or Short                           |    DataTypes.ShortType
IntegerType   |    int or Integer                           |    DataTypes.IntegerType
LongType      |    long or Long                             |    DataTypes.LongType
FloatType     |    float or Float                           |    DataTypes.FloatType
DoubleType    |    double or Double                         |    DataTypes.DoubleType
DecimalType   |    java.math.BigDecimal                     |    DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale)
StringType    |    String                                   |    DataTypes.StringType
BinaryType    |    byte[]                                   |    DataTypes.BinaryType
BooleanType   |    boolean or Boolean                       |    DataTypes.BooleanType
TimestampType |    java.sql.Timestamp                       |    DataTypes.TimestampType
DateType      |    java.sql.Date                            |    DataTypes.DateType
ArrayType     |    java.util.List                           |    DataTypes.createArrayType(elementType) or DataTypes.createArrayType(elementType, containsNull)
MapType       |    java.util.Map                            |    DataTypes.createMapType(keyType, valueType) or DataTypes.createMapType(keyType, valueType, valueContainsNull)
StructType    |    org.apache.spark.sql.Row                 |    DataTypes.createStructType(fields)
StructField   |    The value type in Java of the data type  |    DataTypes.createStructField(name, dataType, nullable)
              |    of this field (for example, int for a    |
              |    StructField with the data type           |
              |    IntegerType)                             |
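As a hedged illustration of how the factory methods in the third column compose (the field names here are invented for the example):

    import org.apache.spark.sql.types.DataType;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // Simple types come from the static fields; complex types from the create* factories.
    DataType price = DataTypes.createDecimalType(10, 2);
    DataType tags  = DataTypes.createArrayType(DataTypes.StringType);
    DataType attrs = DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType);

    StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("id",    DataTypes.LongType, false),
        DataTypes.createStructField("price", price,              true),
        DataTypes.createStructField("tags",  tags,               true),
        DataTypes.createStructField("attrs", attrs,              true)
    });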

One thing of note when working with StructTypes in particular: if you want to declare an empty StructType nested inside another as a placeholder value, it appears you must use new StructType() rather than the suggested DataTypes.createStructType((StructField[]) null) to avoid null pointer exceptions. Remember to populate the nested StructType with StructFields before using it.
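A minimal sketch of that pattern, assuming illustrative field names:

    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    // Placeholder: an empty StructType avoids the NPE that a null field array can trigger.
    StructType nested = new StructType();

    // Populate the nested struct with fields before actually using it in a schema.
    nested = nested
        .add("street", DataTypes.StringType)
        .add("zip",    DataTypes.IntegerType);

    StructType outer = new StructType().add("address", nested);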