How can I print nulls when converting a DataFrame to JSON in Spark?
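To print the null values in JSON using Spark's toJSON method, you can use the following code:
myData.na.fill("null").toJSON.show(false)
This produces the output below. Note that na.fill("null") only replaces nulls in string columns, and it writes the string "null" rather than a JSON null: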
+-------------------------------------------+
|value                                      |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"}   |
|{"name":"Bob","age":"30","pets":"dog"}     |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
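A runnable sketch of this answer, assuming a local SparkSession and the three string columns (name, age, pets) implied by the output above; the object and method names are hypothetical. For comparison it also uses the ignoreNullFields option of to_json, which is not part of the original answer and exists only in Spark 3.0 and later: set to "false", it writes null fields as real JSON nulls instead of the string "null".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object NullsInJson {
  // Hypothetical sample data matching the output above; pets is null for Charlie
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("Alice", "23", "dog"), ("Bob", "30", "dog"), ("Charlie", "35", null))
      .toDF("name", "age", "pets")
  }

  // The approach from the answer: nulls in string columns become the text "null"
  def filledJson(df: DataFrame): Array[String] =
    df.na.fill("null").toJSON.collect()

  // Spark 3.0+ only: ignoreNullFields=false makes to_json emit JSON nulls
  def nullPreservingJson(df: DataFrame): Array[String] =
    df.select(to_json(struct(df.columns.map(col): _*), Map("ignoreNullFields" -> "false")))
      .collect()
      .map(_.getString(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nulls-in-json").getOrCreate()
    val myData = sample(spark)
    filledJson(myData).foreach(println)
    nullPreservingJson(myData).foreach(println)
    spark.stop()
  }
}
```

In Spark 3.0+ the session configuration spark.sql.jsonGenerator.ignoreNullFields sets the default for this option, so the per-call option is only needed when you want to override it.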
I hope it helps!
I modified the JacksonGenerator.writeFields function and included the modified class in my project. The steps are:
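1) Create the package 'org.apache.spark.sql.catalyst.json' inside 'src/main/scala/'.
2) Copy the source code of Spark's JacksonGenerator class. Use the source from the Spark version your project depends on, since this is an internal class that changes between releases.
3) Create a JacksonGenerator.scala file in the '' package and paste the copied code into it.
4) Modify the writeFields function: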
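private def writeFields(row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
  var i = 0
  while (i < row.numFields) {
    val field = schema(i)
    if (!row.isNullAt(i)) {
      // Non-null value: write the field name, then let the field's writer emit the value
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    } else {
      // Added branch: the original function skipped null fields entirely;
      // this writes them as "name":null instead
      gen.writeNullField(field.name)
    }
    i += 1
  }
}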
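The effect of this change can be reproduced with Jackson's JsonGenerator on its own, which is the generator (gen) that JacksonGenerator writes to. This is a standalone sketch, not Spark code: the object name, the write method and the use of None for a NULL column are hypothetical stand-ins for InternalRow and the field writers. It assumes jackson-core is on the classpath, which Spark already ships.

```scala
import com.fasterxml.jackson.core.JsonFactory
import java.io.StringWriter

object JacksonNullFieldDemo {
  // Writes the fields as one JSON object; None stands in for a NULL column value.
  // keepNulls = false mimics the unmodified writeFields (null fields are skipped),
  // keepNulls = true mimics the modified version (null fields become JSON null).
  def write(fields: Seq[(String, Option[String])], keepNulls: Boolean): String = {
    val out = new StringWriter()
    val gen = new JsonFactory().createGenerator(out)
    gen.writeStartObject()
    fields.foreach {
      case (name, Some(value)) =>
        gen.writeFieldName(name)
        gen.writeString(value)
      case (name, None) if keepNulls =>
        gen.writeNullField(name) // same call as in the modified writeFields
      case _ => // null field dropped, as in the unmodified writeFields
    }
    gen.writeEndObject()
    gen.close() // flushes the generator into the StringWriter
    out.toString
  }

  def main(args: Array[String]): Unit = {
    val row = Seq("name" -> Some("Charlie"), "pets" -> None)
    println(write(row, keepNulls = false))
    println(write(row, keepNulls = true))
  }
}
```

In the patched class the null branch is selected by row.isNullAt(i) rather than by matching on None, but the generator calls are the same.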
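Another approach is to convert each Row to a JSON string yourself:
import org.apache.spark.sql.Row
import scala.util.parsing.json.{JSONFormat, JSONObject}

// JSONFormat.defaultFormatter calls toString on every value and throws on null,
// so use a formatter that writes null as a JSON null
val nullAwareFormatter: JSONFormat.ValueFormatter = {
  case null      => "null"
  case s: String => "\"" + JSONFormat.quoteString(s) + "\""
  case other     => other.toString
}

def convertRowToJSON(row: Row): String = {
  // Keep every field, including null-valued ones
  val m = row.getValuesMap[Any](row.schema.fieldNames)
  JSONObject(m).toString(nullAwareFormatter)
}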
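If you would rather not depend on scala.util.parsing.json, which recent scala-parser-combinators releases deprecate, a similar helper can be written with Jackson's ObjectMapper from jackson-databind, which Spark already ships. This is a sketch under assumptions: the RowJson object and the rowToJsonWithNulls name are hypothetical, it only handles flat rows whose values Jackson can serialize directly (strings, numbers, booleans), and the Row must carry a schema, as rows returned by DataFrame.collect() do.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.sql.Row

object RowJson {
  // ObjectMapper is thread-safe once configured, so one shared instance is enough
  private val mapper = new ObjectMapper()

  // Serializes the top-level fields of a Row; Jackson writes null map values as JSON null by default
  def rowToJsonWithNulls(row: Row): String = {
    // LinkedHashMap keeps the schema's column order in the output
    val fields = new java.util.LinkedHashMap[String, Any]()
    row.schema.fieldNames.zipWithIndex.foreach { case (name, i) =>
      fields.put(name, row.get(i)) // row.get returns null for a NULL column
    }
    mapper.writeValueAsString(fields)
  }
}
```

For example, df.collect().map(RowJson.rowToJsonWithNulls) gives one JSON string per row; nested struct columns would need extra handling, because Jackson does not know how to serialize a Row.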