Block Replication Limits in HDFS

HDFS throttles the rate of replication work so that re-replicating blocks after a failure does not interfere with regular cluster traffic while the cluster is under load.

The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4); the values in parentheses are the defaults. The first controls how much replication work is scheduled to a DataNode at each heartbeat. The other two further limit the number of parallel replication transfers (streams) a DataNode performs at a time: max-streams caps replication streams other than highest-priority ones, while max-streams-hard-limit caps all replication streams. These properties are described at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
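To make the interaction between the three settings concrete, here is a simplified Python model. It assumes that one NameNode scheduling pass picks up at most `multiplier x live DataNodes` blocks, and that a DataNode is skipped as a replication source once it reaches the relevant stream limit. The function names are hypothetical, and the real NameNode logic has additional conditions not modelled here, so treat this as an illustration rather than a description of the actual implementation.

```python
# Defaults quoted above, from hdfs-default.xml.
DEFAULT_WORK_MULTIPLIER = 2   # dfs.namenode.replication.work.multiplier.per.iteration
DEFAULT_MAX_STREAMS = 2       # dfs.namenode.replication.max-streams
DEFAULT_HARD_LIMIT = 4        # dfs.namenode.replication.max-streams-hard-limit


def blocks_scheduled_per_iteration(live_datanodes: int,
                                   work_multiplier: int = DEFAULT_WORK_MULTIPLIER) -> int:
    """Upper bound on blocks picked for replication in one scheduling pass
    (simplified model: multiplier x number of live DataNodes)."""
    if live_datanodes < 0 or work_multiplier < 1:
        raise ValueError("need live_datanodes >= 0 and work_multiplier >= 1")
    return live_datanodes * work_multiplier


def can_start_replication_stream(active_streams: int,
                                 highest_priority: bool,
                                 max_streams: int = DEFAULT_MAX_STREAMS,
                                 hard_limit: int = DEFAULT_HARD_LIMIT) -> bool:
    """Simplified check: may a DataNode already running `active_streams`
    replication transfers be used as the source for one more?"""
    # The hard limit applies to every replication stream, whatever its priority.
    if active_streams >= hard_limit:
        return False
    # max-streams applies to everything except highest-priority work.
    if not highest_priority and active_streams >= max_streams:
        return False
    return True


if __name__ == "__main__":
    # A hypothetical 20-node cluster: default multiplier vs. a multiplier of 10.
    print(blocks_scheduled_per_iteration(20))       # 40
    print(blocks_scheduled_per_iteration(20, 10))   # 200
    # With defaults, a node already running 2 streams only accepts urgent work.
    print(can_start_replication_stream(2, highest_priority=False))  # False
    print(can_start_replication_stream(2, highest_priority=True))   # True
```

In this model, raising the multiplier increases how much work is queued per pass, but each DataNode still only runs as many transfers as the stream limits allow. That is why the stream limits are raised together with the multiplier in the suggestion that follows.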

You could try raising these values to (10, 50, 100) respectively to make better use of the network for re-replication (this requires a NameNode restart). Note that DataNode memory usage may increase slightly as a result, because more block information is sent to each DataNode. About 4 GB of heap for the DataNode role should be reasonable with these values.
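A minimal sketch for producing the corresponding hdfs-site.xml entries, assuming you manage the NameNode configuration as a file and paste the generated `<property>` elements into its `<configuration>` element. The helper name is hypothetical; the property names and suggested values come from the text above and, as noted below, have not been production-tested. As an extra sanity check (our own assumption, not something HDFS enforces here), it rejects a max-streams value above the hard limit, since the hard limit applies to every stream and a larger soft limit would never take effect.

```python
import xml.etree.ElementTree as ET

# Suggested values from the text above; not production-tested.
TUNED = {
    "dfs.namenode.replication.work.multiplier.per.iteration": 10,
    "dfs.namenode.replication.max-streams": 50,
    "dfs.namenode.replication.max-streams-hard-limit": 100,
}


def replication_properties_xml(values: dict) -> str:
    """Render Hadoop-style <property> entries inside a <configuration> element."""
    # Fall back to the documented defaults (2 and 4) when a key is absent.
    soft = values.get("dfs.namenode.replication.max-streams", 2)
    hard = values.get("dfs.namenode.replication.max-streams-hard-limit", 4)
    if soft > hard:
        raise ValueError(f"max-streams ({soft}) exceeds the hard limit ({hard})")

    root = ET.Element("configuration")
    for name, value in values.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(replication_properties_xml(TUNED))
```

Running the script prints a single `<configuration>` element; copy its `<property>` children into the NameNode's hdfs-site.xml and restart the NameNode for the change to take effect.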

P.S. I have not personally tried these values on production systems. You also will not want to push the re-replication workload so high that it affects regular cluster work: restoring one missing replica out of three may be a lower priority than missing job or query SLAs because of a lack of network resources (unless you have a really fast network that stays under-utilised even during busy periods). Tune the values until you're satisfied with the results.
