What do sync and syncFs of SequenceFile.Writer mean?
Yes, probably.
sync() creates a sync point. As stated in the book "Hadoop: The Definitive Guide" by Tom White (Cloudera):
A sync point is a point in the stream which can be used to resynchronize with a record boundary if the reader is "lost" - for example, after seeking to an arbitrary position in the stream.
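To make the idea of a sync point concrete, here is a minimal, self-contained sketch in plain Java. It is not Hadoop's implementation: the class name SyncMarkerSketch, the fixed 16-byte marker, the -1 escape value, and the record layout (4-byte length followed by the payload) are all assumptions chosen for illustration. It only shows how a reader dropped at an arbitrary byte offset can scan forward to the next marker and resume reading at a record boundary.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SyncMarkerSketch {
    // Hypothetical fixed marker for this sketch only.
    static final byte[] SYNC = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    // Hypothetical "length" value announcing that a marker follows instead of a record.
    static final int SYNC_ESCAPE = -1;

    /** Writes length-prefixed records, inserting a sync point before every syncEvery-th record. */
    static byte[] write(List<String> records, int syncEvery) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records.size(); i++) {
            if (i > 0 && i % syncEvery == 0) {
                writeInt(out, SYNC_ESCAPE);
                out.write(SYNC, 0, SYNC.length);
            }
            byte[] payload = records.get(i).getBytes(StandardCharsets.UTF_8);
            writeInt(out, payload.length);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    static void writeInt(ByteArrayOutputStream out, int value) {
        out.write(ByteBuffer.allocate(4).putInt(value).array(), 0, 4);
    }

    /** Scans forward from an arbitrary position; returns the offset just past the next marker. */
    static int resync(byte[] data, int pos) {
        for (int i = Math.max(pos, 0); i + SYNC.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + SYNC.length), SYNC)) {
                return i + SYNC.length; // a record boundary
            }
        }
        return data.length; // no further sync point: nothing left to read safely
    }

    /** Reads all records starting at a known record boundary, skipping later sync points. */
    static List<String> readFrom(byte[] data, int boundary) {
        List<String> result = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data, boundary, data.length - boundary);
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len == SYNC_ESCAPE) {
                buf.position(buf.position() + SYNC.length);
                continue;
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            result.add(new String(payload, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = write(Arrays.asList("a", "bb", "ccc", "dddd", "eeeee"), 2);
        // Offset 7 is in the middle of the second record, so the reader is "lost" there.
        System.out.println(readFrom(data, resync(data, 7))); // prints [ccc, dddd, eeeee]
    }
}
```

The sketch ignores the case where a payload happens to contain the marker bytes; it is only meant to show why more frequent sync points let a reader recover sooner after an arbitrary seek, at the cost of a few extra bytes per marker.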
Now the implementation of syncFs() is pretty simple:

    public void syncFs() throws IOException {
      if (out != null) {
        out.sync(); // flush contents to file system
      }
    }
where out is an FSDataOutputStream. Again, the same book states:
HDFS provides a method for forcing all buffers to be synchronized to the datanodes via the sync() method on FSDataOutputStream. After a successful return from sync(), HDFS guarantees that the data written up to that point in the file is persisted and visible to all readers. In the event of a crash (of the client or HDFS), the data will not be lost.
But a footnote warns the reader to look at bug HDFS-200, since the visibility mentioned above was not always honored.
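As a loose local-filesystem analogy for what syncFs() is about (not HDFS itself), the sketch below uses only the standard Java library. The class name FlushVsSyncSketch and its helper methods are hypothetical. It shows that bytes sitting in a client-side buffer are not visible to someone inspecting the file until they are flushed, and that a separate call (FileDescriptor.sync()) asks the operating system to push them to the storage device.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVsSyncSketch {
    /** Returns { file size seen before flushing, file size seen after flush + sync }. */
    static long[] writeAndObserve(Path file, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             OutputStream out = new BufferedOutputStream(fos, 64 * 1024)) {
            out.write(bytes);                 // small write: stays in the in-memory buffer
            long before = Files.size(file);   // what another reader would see right now
            out.flush();                      // hand the bytes to the operating system
            fos.getFD().sync();               // ask the OS to force them to the device
            long after = Files.size(file);
            return new long[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper that runs the experiment on a temporary file. */
    static long[] demo(String text) {
        try {
            Path file = Files.createTempFile("flush-vs-sync", ".bin");
            try {
                return writeAndObserve(file, text);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demo("hello");
        System.out.println("before flush: " + sizes[0] + ", after flush+sync: " + sizes[1]);
    }
}
```

The HDFS case differs in that the buffers span the client and the datanodes, which is why the guarantee quoted above (and its caveat in HDFS-200) matters there.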