Advantages of a sequence file over an HDFS text file


  1. Sequence files are a natural fit when you want to store keys together with their corresponding values. You can do the same in a text file, but then you have to parse each line yourself.
  2. They can be compressed and still be splittable, which lets the work be spread across more parallel tasks. A compressed text file can't be split unless it uses a splittable compression format (such as bzip2).
  3. They are binary files, so they are usually more storage-efficient. In a text file a double is written as a string of characters, which typically takes more space than its 8-byte binary form.
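
To make the storage point concrete, here is a minimal sketch in plain Python (standard library only, not the Hadoop API) that compares the size of a few doubles written as text lines with the same doubles packed as 8-byte binary values. The sample values are made up for illustration; very short numbers such as `1.0` would actually be smaller as text, so the saving depends on the data.

```python
import struct

# Hypothetical sample values, chosen only for illustration.
values = [3.141592653589793, 2.718281828459045, 1234567.890123]

# Text encoding: one decimal string per line, as in a plain text file.
text_bytes = "".join(f"{v!r}\n" for v in values).encode("utf-8")

# Binary encoding: every double is a fixed 8 bytes (big-endian IEEE 754).
binary_bytes = b"".join(struct.pack(">d", v) for v in values)

print("text bytes:  ", len(text_bytes))
print("binary bytes:", len(binary_bytes))
```

The binary form is also fixed-width, so a reader does not have to parse digits to get the value back.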


Advantages of Hadoop sequence files (as per Siva's article on the hadooptutorial.info website)

  1. They are more compact than text files.
  2. They support compression at different levels: record or block.
  3. The files can be split and processed in parallel.
  4. They help with the "large number of small files" problem. Hadoop works best when MapReduce jobs process large files, and a sequence file can be used as a container for a large number of small files.
  5. Temporary output of a mapper can be stored in sequence files.
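
The "container for small files" idea can be sketched in a few lines. This is a toy illustration in plain Python, assuming a simple length-prefixed record layout; it is not the real SequenceFile on-disk format, and `write_records`, `read_records` and the file names are hypothetical. The key is the file name and the value is the file content.

```python
import io
import struct


def write_records(stream, records):
    """Append (key, value) byte pairs as length-prefixed records."""
    for key, value in records:
        # 8-byte header: key length and value length as unsigned 32-bit ints.
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_records(stream):
    """Yield (key, value) byte pairs until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        key_len, value_len = struct.unpack(">II", header)
        yield stream.read(key_len), stream.read(value_len)


# Hypothetical small files: name -> content.
small_files = {
    "logs/a.txt": b"first small file",
    "logs/b.txt": b"second small file",
    "logs/c.txt": b"third",
}

# Pack all small files into one in-memory container.
container = io.BytesIO()
write_records(
    container,
    ((name.encode("utf-8"), data) for name, data in small_files.items()),
)

# Read them back as key/value pairs without any line parsing.
container.seek(0)
restored = {key.decode("utf-8"): value for key, value in read_records(container)}
print(restored == small_files)
```

Writing only ever appends a record to the end of the stream, which mirrors how such a container is filled.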

Disadvantages:

  1. Sequence files are append-only.


Sequence files are commonly used for intermediate data in MapReduce processing. They are compressible and fast to process, so they suit output that is written by the mapper and then read by the reducer. There are APIs in both Hadoop and Spark to read and write sequence files.
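
As a rough illustration of why the compression level matters, the sketch below compares compressing each record on its own with compressing batches of records together. It uses plain Python with `zlib` rather than Hadoop's codecs, and the record contents and `block_size` are assumptions made up for the example.

```python
import zlib

# Hypothetical records: short, repetitive lines like many log formats.
records = [f"user{i}\tclicked\tpage{i % 5}".encode("utf-8") for i in range(1000)]

# Record-level: each record is compressed independently.
record_level = sum(len(zlib.compress(r)) for r in records)

# Block-level: records are batched and each batch is compressed as a whole.
block_size = 100  # hypothetical number of records per block
block_level = sum(
    len(zlib.compress(b"\n".join(records[i:i + block_size])))
    for i in range(0, len(records), block_size)
)

print("raw bytes:         ", sum(len(r) for r in records))
print("record-level bytes:", record_level)
print("block-level bytes: ", block_level)
```

Compressing in batches lets the compressor exploit repetition across records, which is why block-level compression usually gives the smaller result on data like this.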