use of "default" in avro schema


I think there is some misunderstanding around default values, so hopefully this explanation helps other people as well. A default supplies a value when a field is not present, but it is applied when the Avro object is instantiated (in your case, when calling datumReader.read). It does not by itself let you read data that was written with a different schema: the reader still needs to know the schema the data was written with. This is why a "schema registry" is useful in this kind of situation.

The following code works and lets you read your data:

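    import org.apache.avro.Schema;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.specific.SpecificDatumReader;

    Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, "{\"age\":70}");
    // The reader schema comes from the generated Student class.
    SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
    // Despite the variable name, this is the schema the input was written with (no "name" field).
    Schema expected = new Schema.Parser().parse("{\n" +
            "  \"type\": \"record\",\n" +
            "  \"namespace\": \"com.example\",\n" +
            "  \"name\": \"Student\",\n" +
            "  \"fields\": [{\n" +
            "    \"name\": \"age\",\n" +
            "    \"type\": \"int\",\n" +
            "    \"default\": -1\n" +
            "  }\n" +
            "  ]\n" +
            "}");
    // setSchema sets the writer's schema on the reader.
    datumReader.setSchema(expected);
    System.out.println(datumReader.read(null, decoder));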

As you can see, I am specifying the schema used to "write" the JSON input, which does not contain the field "name". However, since your schema contains a default value for that field, when you print the record you will see name populated with your default value:

{"age": 70, "name": "null"}

Just in case you are not already aware: that "null" is not really a null value; it is a string whose value is "null".
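To make the mechanics concrete, here is a toy model in plain Java of what happens while reading: values present in the written data are kept, and fields that only the reader knows about are filled from the reader's default. This is only a sketch and not Avro's implementation or API; the names DefaultResolutionSketch, ReaderField and resolve are made up for illustration, and the reader fields in main are assumed rather than taken from your actual Student schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: this is NOT Avro's implementation or API.
class DefaultResolutionSketch {

    // Hypothetical stand-in for a reader-schema field: a name plus an optional default.
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // Builds the record the reader sees: written values win, and fields the
    // writer never wrote fall back to the reader's default (or fail without one).
    static Map<String, Object> resolve(Map<String, Object> written, ReaderField... readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("No value and no default for field: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("age", 70); // the written data only contains "age"

        Map<String, Object> record = resolve(written,
                new ReaderField("age", true, -1),
                new ReaderField("name", true, "null")); // assumed default: the STRING "null"

        System.out.println(record.get("age"));                    // 70
        System.out.println(record.get("name") instanceof String); // true
    }
}
```

The point of the model is that the default only comes into play for a field that is missing from the written data, and that a default of "null" on a string field yields a real String object, not a Java null.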


Just to add to what is already said in the answer above: for a field to be null when it is not present, union its type with null. Otherwise what gets in is just a string that happens to be spelled "null". Example schema:

    {
      "name": "name",
      "type": ["null", "string"],
      "default": null
    }

Then, if you pass in {"age":70} and retrieve the record, you get the following:

{"age":70,"name":null}
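The difference between the two outputs is easy to miss, so here is a small plain-Java sketch of it. It does not use Avro at all; toJson is a hypothetical helper that quotes strings and prints null as a bare literal (with no escaping, to keep it short). The two schema fragments in the comments are assumptions about what each default looks like.

```java
// Illustrative only: contrasts the string "null" with an actual null value.
class NullVersusStringNull {

    // Renders a value roughly the way a JSON encoder would:
    // strings are quoted, null is a bare literal, everything else uses toString.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof CharSequence) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Object stringDefault = "null"; // assumed: "type": "string", "default": "null"
        Object nullDefault = null;     // assumed: "type": ["null", "string"], "default": null

        System.out.println("{\"age\":70,\"name\":" + toJson(stringDefault) + "}");
        System.out.println("{\"age\":70,\"name\":" + toJson(nullDefault) + "}");

        System.out.println(stringDefault == null); // false
        System.out.println(nullDefault == null);   // true
    }
}
```

In application code this means a check like name == null only behaves as expected with the union type and a null default; with the string default you would have to compare against the text "null" instead.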