SPARK SQL - case when then SPARK SQL - case when then sql sql

SPARK SQL - case when then


Before Spark 1.2.0

The supported syntax (which I just tried out on Spark 1.0.2) seems to be

SELECT IF(1=1, 1, 0) FROM table

This recent thread http://apache-spark-user-list.1001560.n3.nabble.com/Supported-SQL-syntax-in-Spark-SQL-td9538.html links to the SQL parser source, which may or may not help depending on your comfort with Scala. At the very least the list of keywords starting (at time of writing) on line 70 should help.

Here's the direct link to the source for convenience: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala.

Update for Spark 1.2.0 and beyond

As of Spark 1.2.0, the more traditional syntax is supported, in response to SPARK-3813: search for "CASE WHEN" in the test source. For example:

SELECT CASE WHEN key = 1 THEN 1 ELSE 2 END FROM testData

Update for most recent place to figure out syntax from the SQL Parser

The parser source can now be found here.

Update for more complex examples

In response to a question below, the modern syntax supports complex Boolean conditions.

SELECT    CASE WHEN id = 1 OR id = 2 THEN "OneOrTwo" ELSE "NotOneOrTwo" END AS IdReduxFROM customer

You can involve multiple columns in the condition.

SELECT    CASE WHEN id = 1 OR state = 'MA'          THEN "OneOrMA"          ELSE "NotOneOrMA" END AS IdReduxFROM customer

You can also nest CASE WHEN THEN expression.

SELECT    CASE WHEN id = 1          THEN "OneOrMA"         ELSE             CASE WHEN state = 'MA' THEN "OneOrMA" ELSE "NotOneOrMA" END    END AS IdReduxFROM customer


For Spark 2.+Spark when function

From documentation:

Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

 // Example: encoding gender string column into integer.   // Scala:   people.select(when(people("gender") === "male", 0)     .when(people("gender") === "female", 1)     .otherwise(2))   // Java:   people.select(when(col("gender").equalTo("male"), 0)     .when(col("gender").equalTo("female"), 1)     .otherwise(2))


This syntax worked for me in Databricks:

  select     org,     patient_id,    case       when (age is null) then 'Not Available'      when (age < 15) then 'Less than 15'      when (age >= 15 and age < 25) then '15 to 25'      when (age >= 25 and age < 35) then '25 to 35'      when (age >= 35 and age < 45) then '35 to 45'      when (age >= 45) then '45 and Older'    end as age_range  from demo