
Target Replicas is 10 but found 3 replica(s)


The replication count for files submitted as part of your job (jars, etc.) is controlled by the parameter mapreduce.client.submit.file.replication (or mapred.submit.replication on pre-2.4 clusters) in mapred-site.xml. You can lower it on clusters with fewer than 10 nodes, or just ignore the message from fsck.
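
If you would rather not change mapred-site.xml cluster-wide, the same property can also be set on the job's Configuration at submit time, since the client reads it from the job configuration when it uploads resources. A minimal sketch, assuming a small (e.g. 3-node) cluster where a replication factor of 3 is acceptable for job resources; adjust the value to your cluster size:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithLowerReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Replication factor used for job resources (job.jar, -files, -libjars).
    // The default is 10; lower it to match a small cluster.
    conf.setInt("mapreduce.client.submit.file.replication", 3);

    Job job = Job.getInstance(conf, "submit-replication-example");
    // ... set mapper/reducer and input/output paths, then job.waitForCompletion(true)
  }
}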

FWIW, there is a JIRA for this, but I doubt it will ever get worked on.


You can ignore /tmp/hadoop-yarn/staging/ubuntu/.staging/job_1450038005671_0025/job.jar; it is a job resource, and dfs.replication has no impact on job resources.

  1. Job resources such as jar files and files passed with -files (the distributed cache) are copied to HDFS with a replication factor of 10.
  2. While the job runs, these job resources (the code) are copied down to the containers/tasks that process the data.
  3. Once the job completes, these resources are automatically cleaned up based on retention thresholds.

This is what enables data locality (the code goes to the data) while the job processes its input.
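
To see this on your own cluster, here is a small sketch using the standard FileSystem API. The staging path is the job.jar from this question; the data-file path is a hypothetical example, so substitute a real file of your own.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Staged job resource: written with mapreduce.client.submit.file.replication (default 10)
    Path jobJar = new Path("/tmp/hadoop-yarn/staging/ubuntu/.staging/job_1450038005671_0025/job.jar");
    // Ordinary data file: written with dfs.replication (e.g. 3); hypothetical path
    Path dataFile = new Path("/user/ubuntu/input/part-00000");

    System.out.println("job.jar replication:   " + fs.getFileStatus(jobJar).getReplication());
    System.out.println("data file replication: " + fs.getFileStatus(dataFile).getReplication());
  }
}

On a 3-node cluster with default settings, the first line will typically still print 10, which is exactly the mismatch fsck is warning about.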


The HDFS configuration file hdfs-site.xml should contain the dfs.replication property, which sets the block replication factor:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

The default location of hdfs-site.xml is /etc/hadoop/hdfs-site.xml.