How to convert a Row to JSON in Spark 2 with Scala
You can use getValuesMap to convert the Row object to a Map, and then convert that Map to JSON:
import scala.util.parsing.json.JSONObject
import org.apache.spark.sql._

val df = Seq((1, 2, 3), (2, 3, 4)).toDF("A", "B", "C")
val row = df.first() // this is an example Row object

def convertRowToJSON(row: Row): String = {
  val m = row.getValuesMap(row.schema.fieldNames)
  JSONObject(m).toString()
}

convertRowToJSON(row)
// res46: String = {"A" : 1, "B" : 2, "C" : 3}
I need to read JSON input and produce JSON output. Most fields are handled individually, but a few JSON sub-objects need to be preserved as-is.
When Spark reads a DataFrame, it turns each record into a Row. The Row is a JSON-like structure that can be transformed and written back out as JSON.
But I need to extract some JSON substructures into a string to use as a new field.
This can be done like this:
val dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))

location.address is the path to the sub-JSON object in the incoming JSON-based DataFrame, and address_json is the name of the new column holding that object converted to its JSON string form.
to_json is available as of Spark 2.1.
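A minimal end-to-end sketch of this approach; the field names (street, city) and the case-class input are assumptions standing in for the real incoming JSON:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

object ToJsonExample {
  // Stand-ins for the nested structure of the incoming JSON
  case class Address(street: String, city: String)
  case class Location(address: Address)

  def extractAddressJson(spark: SparkSession): String = {
    import spark.implicits._
    // Hypothetical input: each record has a location struct with an address sub-object
    val df = Seq((1, Location(Address("Main St", "Springfield")))).toDF("id", "location")
    // Preserve the address sub-object as a JSON string in a new column
    val withJson = df.withColumn("address_json", to_json($"location.address"))
    withJson.select("address_json").first().getString(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("to_json-sketch")
      .master("local[*]")
      .getOrCreate()
    println(extractAddressJson(spark)) // {"street":"Main St","city":"Springfield"}
    spark.stop()
  }
}
```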
If the output JSON is generated with json4s, address_json should first be parsed back into an AST representation; otherwise the address_json part will appear escaped in the output JSON.
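A small sketch of the escaping difference with json4s (the field names here are assumptions, and the exact merge depends on how the rest of the output JSON is assembled):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.{compact, parse, render}

object Json4sEmbedExample {
  // address_json as produced by to_json, i.e. a JSON string
  val addressJson = """{"street":"Main St","city":"Springfield"}"""

  // Embedding the raw string: the output escapes the inner quotes
  def escaped: String =
    compact(render(JObject("address" -> JString(addressJson))))

  // Parsing back to an AST first: the output keeps a real JSON object
  def embedded: String =
    compact(render(JObject("address" -> parse(addressJson))))

  def main(args: Array[String]): Unit = {
    println(escaped)  // {"address":"{\"street\":\"Main St\",\"city\":\"Springfield\"}"}
    println(embedded) // {"address":{"street":"Main St","city":"Springfield"}}
  }
}
```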
Note that the Scala class scala.util.parsing.json.JSONObject is deprecated and does not support null values:
@deprecated("This class will be removed.", "2.11.0")
"JSONFormat.defaultFormat doesn't handle null values"