Block Replication Limits in HDFS Block Replication Limits in HDFS hadoop hadoop

Block Replication Limits in HDFS


The rate of replication work is throttled by HDFS to not interfere with cluster traffic when failures happen during regular cluster load.

The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4). The foremost controls the rate of work to be scheduled to a DN at every heartbeat that occurs, and the other two further limit the maximum parallel threaded network transfers done by a DataNode at a time. The values in () indicate their defaults. Some description of this is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

You can perhaps try to increase the set of values to (10, 50, 100) respectively to spruce up the network usage (requires a NameNode restart), but note that your DN memory usage may increase slightly as a result of more blocks information being propagated to it. A reasonable heap size for these values for the DN role would be about 4 GB.

P.s. These values were not tried by me on production systems personally. You will also not want to max out the re-replication workload such that it affects regular cluster work, as recovery of 1/3 replicas may be of lesser priority than missing job/query SLAs due to lack of network resources (unless you have a really fast network that's always under-utilised even under loaded periods). Try to tune it till you're satisfied with the results.