Difference between Avro data file and SequenceFile with respect to Apache Sqoop


SequenceFiles are a binary format that stores individual records in custom record-specific data types. This format supports exact storage of all data in binary representations, and is appropriate for storing binary data (for example, VARBINARY columns), or data that will be principally manipulated by custom MapReduce programs. Reading from SequenceFiles is higher-performance than reading from text files, as records do not need to be parsed.
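In Sqoop 1 the output format of an import is selected with a command-line flag: `--as-sequencefile`, `--as-avrodatafile`, or `--as-textfile`. As a rough sketch, the helper below builds the argument list for such an import. The function name `build_sqoop_import`, the JDBC URL, the table name, and the target directory are all made up for illustration; only the flags themselves come from Sqoop.

```python
# Hypothetical helper (not part of Sqoop): builds the argument list for a
# `sqoop import` call using one of Sqoop 1's file-format flags.
FORMAT_FLAGS = {
    "text": "--as-textfile",
    "sequence": "--as-sequencefile",
    "avro": "--as-avrodatafile",
}


def build_sqoop_import(connect, table, target_dir, file_format="text"):
    """Return the command as a list suitable for subprocess.run()."""
    if file_format not in FORMAT_FLAGS:
        raise ValueError(f"unknown file format: {file_format!r}")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        FORMAT_FLAGS[file_format],
    ]


if __name__ == "__main__":
    # Placeholder connection string, table, and HDFS path.
    args = build_sqoop_import(
        "jdbc:mysql://db.example.com/shop", "orders", "/data/orders", "sequence"
    )
    print(" ".join(args))
```

Passing `"avro"` instead of `"sequence"` swaps in `--as-avrodatafile`; everything else in the command stays the same, so the choice between the two formats is a one-flag decision at import time.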

Avro data files are a compact, efficient binary format that provides interoperability with applications written in other programming languages. Avro also supports versioning, so that when, e.g., columns are added to or removed from a table, previously imported data files can be processed along with new ones.
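To make the versioning point concrete, here is a simplified model of how a reader can process old and new records together. It does not use the Avro library and is not Avro's actual implementation; it only mimics the basic rules of schema resolution: a field the reader expects but the record lacks is filled from a default, and a field the reader no longer knows about is ignored. The names `resolve`, `REQUIRED`, and `READER_FIELDS`, and the sample records, are invented for this sketch.

```python
# Simplified, library-free model of reader-side schema resolution.
REQUIRED = object()  # marker for a reader field that has no default


def resolve(record, reader_fields):
    """Project a record onto the reader's fields, applying defaults."""
    resolved = {}
    for name, default in reader_fields.items():
        if name in record:
            resolved[name] = record[name]
        elif default is not REQUIRED:
            resolved[name] = default  # column added after this record was written
        else:
            raise KeyError(f"field {name!r} is missing and has no default")
    return resolved  # fields unknown to the reader are dropped


# Reader schema after column "email" was added and column "fax" was removed.
READER_FIELDS = {"id": REQUIRED, "name": REQUIRED, "email": None}

old_record = {"id": 1, "name": "Ada", "fax": "555-0100"}  # earlier import
new_record = {"id": 2, "name": "Grace", "email": "grace@example.com"}

rows = [resolve(r, READER_FIELDS) for r in (old_record, new_record)]

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Both records come out with the same three fields, which is what lets a single job read files from before and after the table change. The real Avro resolution rules also cover things this sketch leaves out, such as type promotion and field aliases.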

Here's a comparison by Doug Cutting himself:

http://www.quora.com/What-are-the-advantages-of-Avros-object-container-file-format-over-the-SequenceFile-container-format