data block size in HDFS, why 64MB? data block size in HDFS, why 64MB? hadoop hadoop

data block size in HDFS, why 64MB?


What does 64MB block size mean?

The block size is the smallest unit of data that a file system can store. If you store a file that's 1k or 60Mb, it'll take up one block. Once you cross the 64Mb boundry, you need a second block.

If yes, what is the advantage of doing that?

HDFS is meant to handle large files. Lets say you have a 1000Mb file. With a 4k block size, you'd have to make 256,000 requests to get that file (1 request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the Name Node to figure out where that block can be found. That's a lot of traffic! If you use 64Mb blocks, the number of requests goes down to 16, greatly reducing the cost of overhead and load on the Name Node.


HDFS's design was originally inspired by the design of the Google File System (GFS). Here are the two reasons for large block sizes as stated in the original GFS paper (note 1 on GFS terminology vs HDFS terminology: chunk = block, chunkserver = datanode, master = namenode; note 2: bold formatting is mine):

A large chunk size offers several important advantages. First, it reduces clients’ need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our workloads because applications mostly read and write large files sequentially. [...] Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadatain memory, which in turn brings other advantages that we will discuss in Section 2.6.1.

Finally, I should point out that the current default size in Apache Hadoop is is 128 MB (see dfs.blocksize).


In HDFS the block size controls the level of replication declustering. The lower the block size your blocks are more evenly distributed across the DataNodes. The higher the block size your data are potentially less equally distributed in your cluster.

So what's the point then choosing a higher block size instead of some low value? While in theory equal distribution of data is a good thing, having a too low blocksize has some significant drawbacks. NameNode's capacity is limited, so having 4KB blocksize instead of 128MB means also having 32768 times more information to store. MapReduce could also profit from equally distributed data by launching more map tasks on more NodeManager and more CPU cores, but in practice theoretical benefits will be lost on not being able to perform sequential, buffered reads and because of the latency of each map task.