What do sync and syncFs of SequenceFile.Writer mean? hadoop

What do the sync and syncFs methods of SequenceFile.Writer mean?


Yes, probably.

sync() creates a sync point. As stated in the book "Hadoop: The Definitive Guide" by Tom White (Cloudera):

a sync point is a point in the stream that can be used to resynchronize with a record boundary if the reader is "lost" - for example, after seeking to an arbitrary position in the stream.
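To make the idea of resynchronizing concrete, here is a minimal sketch in plain Java. It is a toy model and not Hadoop's actual on-disk format: the 4-byte MARKER, the one-byte length prefix, and the names SyncPointSketch, write, resync and readRecord are all assumptions made up for illustration (the real SequenceFile uses its own per-file sync marker). A reader that lands at an arbitrary offset scans forward to the next marker and knows that a record starts right after it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SyncPointSketch {
    // Hypothetical marker for this toy format; ASCII payloads never contain these bytes.
    static final byte[] MARKER = {(byte) 0xFF, (byte) 0xFE, (byte) 0xFD, (byte) 0xFC};

    /** Writes records as [1-byte length][payload], with MARKER before every interval-th record. */
    static byte[] write(String[] records, int interval) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records.length; i++) {
            if (i % interval == 0) {
                out.write(MARKER, 0, MARKER.length); // the "sync point"
            }
            byte[] payload = records[i].getBytes(StandardCharsets.US_ASCII);
            out.write(payload.length);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    /** Returns the offset of the first record after the next marker at or after pos, or -1. */
    static int resync(byte[] data, int pos) {
        for (int i = Math.max(pos, 0); i + MARKER.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + MARKER.length), MARKER)) {
                return i + MARKER.length;
            }
        }
        return -1;
    }

    /** Reads the record whose length byte sits at the given offset. */
    static String readRecord(byte[] data, int offset) {
        int len = data[offset] & 0xFF;
        return new String(data, offset + 1, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = write(new String[] {"alpha", "beta", "gamma", "delta"}, 2);
        // Offset 7 is in the middle of "alpha": the reader is "lost" here.
        int start = resync(data, 7);
        System.out.println(readRecord(data, start)); // prints "gamma"
    }
}
```

The records between the seek position and the next marker ("beta" here) are skipped, which is the trade-off of resynchronizing this way: the reader recovers a valid record boundary, not the record it landed in.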

Now the implementation of syncFS() is pretty simple:

    public void syncFs() throws IOException {
      if (out != null) {
        out.sync();                               // flush contents to file system
      }
    }

where out is an FSDataOutputStream. Again, the same book states:

HDFS provides a method for forcing all buffers to be synchronized to the datanodes via the sync() method on FSDataOutputStream. After a successful return from sync(), HDFS guarantees that the data written up to that point in the file is persisted and visible to all readers. In the event of a crash (of the client or HDFS), the data will not be lost.

But a footnote warns readers to look at bug HDFS-200, since the visibility guarantee mentioned above was not always honored.
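As a rough analogy for why an explicit call is needed to push buffered data out, here is a small sketch using only java.io. It does not use Hadoop and says nothing about datanode persistence: the ByteArrayOutputStream merely stands in for "the file system", and the class and method names are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class FlushVisibilitySketch {
    /** Returns {bytes visible in the sink before flush, bytes visible after flush}. */
    static int[] visibleBeforeAndAfterFlush(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();        // stand-in for the file system
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192); // client-side buffer
        out.write(payload);        // payloads smaller than the buffer stay in the client
        int before = sink.size();
        out.flush();               // analogous in spirit to pushing buffers downstream
        int after = sink.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = visibleBeforeAndAfterFlush(new byte[100]);
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints "0 -> 100"
    }
}
```

Until the flush, a reader of the sink sees nothing of what was written. The guarantee quoted above goes further than this analogy, since it also covers persistence on the datanodes.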