
Differences between the hflush and hsync APIs in HDFS


In the current HDFS (0.23.3) implementation, hflush and hsync are the same: hsync simply invokes hflush. hflush guarantees that flushed data becomes visible to new readers. It does not guarantee that the data has been flushed to persistent storage on the datanode, so data written with hflush may be lost if datanodes fail. hsync is designed to guarantee that all data has been written to the disk device, but this is not implemented yet.

In the HDFS 2.0.* alpha releases, hsync is implemented correctly.
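
The distinction is similar to the one between flushing a buffer and forcing a file to disk on a local filesystem. The sketch below is only an analogy that uses the Java standard library, not HDFS code: the class FlushVsSync and its two methods are hypothetical names chosen for illustration. writeVisible plays the role of hflush (the bytes leave the process and new readers can see them), and writeDurable plays the role of hsync (the operating system is additionally asked to force the bytes to the storage device).

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class: a local-file analogy, not part of Hadoop.
class FlushVsSync {

    /** Analogue of hflush: make the bytes visible to new readers of the file. */
    public static long writeVisible(Path file, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(data);
            // Push the in-process buffer to the OS. The data may still sit in
            // the OS page cache, so a machine crash could lose it.
            out.flush();
            return Files.size(file); // size as observed through the filesystem
        }
    }

    /** Analogue of hsync: also force the bytes to the storage device. */
    public static long writeDurable(Path file, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(data);
            out.flush();
            // Comparable to fsync: ask the OS to write file content and
            // metadata through to the device before returning.
            fos.getChannel().force(true);
            return Files.size(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-vs-sync", ".log");
        try {
            byte[] record = "edit-1\n".getBytes(StandardCharsets.UTF_8);
            System.out.println("size after flush: " + writeVisible(file, record));
            System.out.println("size after force: " + writeDurable(file, record));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In an HDFS client the corresponding calls are hflush() and hsync() on the output stream returned when a file is created. As with the local analogy, the durable variant is expected to cost more per call, so it is usually reserved for data that must survive a machine failure.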

You can find more details in "HBase, HDFS and durable sync".