Threads in Hadoop

java multithreading hadoop cpu

1) I have 2 CPUs. As I understand it, my system can serve 2 threads at a time, so what is the use of setting the datanode / namenode handler count to a higher value like '10'?

Most of the time, these threads are blocked (asleep) waiting for IO operations. Assume that on average a thread is asleep 99.9% of the time; it then consumes only 0.1% of a CPU, so you can easily run 1000 threads at the same time. In production, the thread configuration should be based on the cluster setup (physical cores per node, disk throughput, network throughput, workload, etc.). If you are not sure, just use the default values.
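To make the point about blocked threads concrete, here is a minimal sketch in plain Java (no Hadoop involved). It assumes that `Thread.sleep` is a fair stand-in for a thread waiting on IO; the class and method names are made up for illustration. It starts far more threads than a 2-CPU machine has cores and measures how long they take to finish together.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
final class BlockedThreadsDemo {

    /**
     * Starts threadCount threads that each sleep for sleepMillis
     * (standing in for blocking IO) and returns the wall-clock time
     * in milliseconds until all of them have finished.
     */
    static long runBlockedThreads(int threadCount, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        CountDownLatch done = new CountDownLatch(threadCount);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < threadCount; i++) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(sleepMillis); // blocked: needs no CPU while waiting
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await(); // wait until every simulated IO call has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long elapsed = runBlockedThreads(200, 200);
        System.out.println("cores=" + cores
                + ", 200 threads x 200 ms of waiting took " + elapsed + " ms");
    }
}
```

If the threads had to run one after another, 200 waits of 200 ms would take about 40 seconds. Because they are only waiting, the total should stay close to a single wait, regardless of the core count.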

2) What is the difference between the handler count and the maximum transfer threads, given that both are used for processing?

dfs.datanode.handler.count is the number of handler threads for ClientDatanodeProtocol, the client/DataNode RPC protocol that carries information such as block recovery metadata. The messages are small and each call finishes quickly, so a handler is idle most of the time and can easily be reused for the next request. That is why few handlers are needed, and why the default value of 10 is much smaller than the default for dfs.datanode.max.transfer.threads.
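The reuse of idle handlers can be sketched with an ordinary fixed-size thread pool. This is only an analogy in plain Java, not Hadoop's RPC server code; `HandlerPoolDemo` and `distinctHandlersUsed` are hypothetical names, and the tiny task stands in for a short RPC call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
final class HandlerPoolDemo {

    /**
     * Pushes requestCount short tasks through a pool of handlerCount
     * threads and returns how many distinct threads did the work.
     */
    static int distinctHandlersUsed(int handlerCount, int requestCount) {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        Set<String> handlerNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < requestCount; i++) {
            handlers.submit(() -> {
                // A short "RPC": just record which handler thread served it.
                handlerNames.add(Thread.currentThread().getName());
            });
        }
        handlers.shutdown(); // accept no new tasks, finish the queued ones
        try {
            if (!handlers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("handlers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return handlerNames.size();
    }

    public static void main(String[] args) {
        int used = distinctHandlersUsed(10, 1000);
        System.out.println("1000 short requests were served by " + used + " handler threads");
    }
}
```

However many short requests arrive, the number of threads doing the work never exceeds the pool size, which is the sense in which a small handler count is enough for fast, small messages.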

dfs.datanode.max.transfer.threads is the maximum number of DataXceiver threads, which transfer blocks via the data transfer protocol (DTP). Block data is large and a transfer takes some time. One thread serves one block read, and it can be reused only after the whole block has been transferred. If many clients request blocks at the same time, more threads are needed. Each write connection uses 2 threads, so this number should be larger for write-bound applications.
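The thread counts described above (one thread per block read, two per write connection) can be turned into a rough back-of-the-envelope estimate. The sketch below is only that rule of thumb written as Java; it is not a formula from Hadoop, and the class name, method name, and example load are assumptions for illustration.

```java
// Hypothetical helper, not part of Hadoop.
final class XceiverEstimate {

    /**
     * Rough number of DataXceiver threads busy on one DataNode,
     * assuming 1 thread per concurrent block read and
     * 2 threads per concurrent write connection.
     */
    static int estimateXceiverThreads(int concurrentBlockReads, int concurrentBlockWrites) {
        if (concurrentBlockReads < 0 || concurrentBlockWrites < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        int writeThreads = Math.multiplyExact(2, concurrentBlockWrites);
        return Math.addExact(concurrentBlockReads, writeThreads);
    }

    public static void main(String[] args) {
        // Example load (made up): 300 concurrent reads and 100 concurrent writes.
        int needed = estimateXceiverThreads(300, 100);
        System.out.println("estimated DataXceiver threads in use: " + needed); // prints 500
    }
}
```

Comparing such an estimate for the expected peak load with the configured dfs.datanode.max.transfer.threads gives a first indication of whether the limit leaves enough headroom.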

In practice, DataXceiver threads spend most of their time blocked, waiting to read from disk or to send data through the network interface. They therefore consume little CPU, apart from computing data checksums.
