How to append ORC file


ORC data files are subdivided into independent stripes; each stripe is created in a single atomic step. See the official documentation for details.

I don't believe you can directly append to an existing file on-the-fly. That would mean leaving a corrupt stripe (hence a corrupt file) in case of a job crash while writing.

But you can:

  • create a new ORC data file per reducer (each file will contain 1..N stripes, depending on the actual data volume vs. the orc.stripe.size property)
  • then "concatenate" these data files, together with the existing file(s), using Hive 0.14 or above
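
As a rough sketch of those two steps in HiveQL: the table, partition, and column names here (events_orc, events_staging, dt, id, payload) are hypothetical and only for illustration, and the sketch assumes the target table is stored as ORC. Each INSERT adds new ORC file(s) alongside the existing ones, and ALTER TABLE ... CONCATENATE (available for ORC since Hive 0.14) then merges the small files.

```sql
-- Hypothetical names throughout; assumes events_orc is STORED AS ORC
-- and partitioned by dt.

-- Step 1: "append" by writing new ORC file(s) next to the existing ones.
-- Existing files are left untouched.
INSERT INTO TABLE events_orc PARTITION (dt = '2015-01-01')
SELECT id, payload
FROM events_staging
WHERE dt = '2015-01-01';

-- Step 2: merge the small ORC files of that partition (Hive 0.14+).
ALTER TABLE events_orc PARTITION (dt = '2015-01-01') CONCATENATE;
```

For an unpartitioned table, the PARTITION clause would simply be dropped from both statements.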