Why we need Avro schema evolution


If you have a single Avro file and want to change its schema, you can rewrite the file with the new schema embedded in it. But what if you have terabytes of Avro files? Would you rewrite all of that data every time the schema changes?

Schema evolution lets you change the schema used to write new data while keeping backward compatibility with the schema(s) of your existing data. You can then read old and new data together, as if all of it shared one schema. Precise rules govern which changes are allowed without breaking compatibility; they are listed in the Schema Resolution section of the Avro specification.
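To make the idea concrete, here is a minimal pure-Python sketch of what happens when old data is read with a newer schema. It is not the Avro library: `resolve_record`, `user_v1` and `user_v2` are hypothetical names, schemas are plain dicts shaped like Avro JSON record schemas, and only three of the record rules from Schema Resolution are modelled (shared fields are copied, writer-only fields are ignored, reader-only fields take their default or fail).

```python
from typing import Any


def resolve_record(datum: dict[str, Any], writer: dict, reader: dict) -> dict[str, Any]:
    """Project a record written with `writer` into the shape of `reader`.

    Hypothetical helper modelling three of Avro's record resolution rules:
      * a field present in both schemas is copied across;
      * a field only in the writer schema is ignored;
      * a field only in the reader schema gets the reader's default,
        and resolution fails if the reader declares no default.
    Record names, aliases, type promotion, unions and nested types are not checked.
    """
    writer_names = {field["name"] for field in writer["fields"]}
    resolved: dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            resolved[name] = datum[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is not in the writer schema and has no default")
    return resolved


# Schema used when the old files were written.
user_v1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

# Evolved schema: adds an optional field with a default, so v1 data stays readable.
user_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

old_record = {"id": 1, "name": "Ada"}
print(resolve_record(old_record, user_v1, user_v2))
# prints {'id': 1, 'name': 'Ada', 'email': None}
```

Had `email` been added without a `default`, the v1 records could not be resolved against the new schema, which is why added fields need defaults to keep old data readable.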

Reader and writer schemas have uses beyond evolution. A reader schema can act as a filter: given data with hundreds of fields, of which you need only a handful, you can write a reader schema containing just those fields and get back only the data you need. You can also go the other way and write a reader schema that adds fields with default values, or use a schema to join the schemas of two different datasets.
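The filtering and default-adding uses can be sketched the same way. Again this is a simplified pure-Python model, not the Avro API: `read_with`, `event_writer` and `event_reader` are hypothetical, and the five-field "wide" schema stands in for one with hundreds of fields. The reader schema keeps two fields and adds a third with a default.

```python
from typing import Any


def read_with(datum: dict[str, Any], writer: dict, reader: dict) -> dict[str, Any]:
    """Hypothetical helper: return only the reader's fields, filling reader defaults."""
    writer_names = {field["name"] for field in writer["fields"]}
    out: dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            out[name] = datum[name]  # field kept by the reader schema
        elif "default" in field:
            out[name] = field["default"]  # field added by the reader schema
        else:
            raise ValueError(f"field {name!r} has no value and no default")
    return out


# Wide writer schema; imagine hundreds of fields instead of five.
event_writer = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "ts", "type": "long"},
        {"name": "page", "type": "string"},
        {"name": "referrer", "type": "string"},
        {"name": "user_agent", "type": "string"},
    ],
}

# Narrow reader schema: keeps two fields and adds one with a default.
event_reader = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "ts", "type": "long"},
        {"name": "source", "type": "string", "default": "web"},
    ],
}

row = {"user_id": 7, "ts": 1700000000, "page": "/home", "referrer": "", "user_agent": "curl"}
print(read_with(row, event_writer, event_reader))
# prints {'user_id': 7, 'ts': 1700000000, 'source': 'web'}
```

In a real Avro reader the dropped fields are still skipped over in each encoded record, since the format is row-oriented, but they are not returned to your code.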

Or you can just use one schema, which never changes, for both reading and writing. That's the simplest case.

CodeHunter
