
Hadoop: specify yarn queue for distcp


You have made a mistake in how the parameter is specified.

Do not use ":" to separate the key from the value. Use "=" instead.

The command should be:

 hadoop distcp -Dmapred.job.queue.name=root.default .......


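On Hadoop 2 and later (YARN), mapred.job.queue.name is a deprecated name for this property. The current name is mapreduce.job.queuename:

-Dmapreduce.job.queuename=root.default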
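As a minimal sketch of how the pieces fit together, the snippet below assembles a full distcp command line with the queue option. The source and destination URIs are hypothetical placeholders, not values from the question, and the queue name is the one used above. The -D option is a generic option, so it is placed directly after the subcommand and before the paths.

```shell
#!/bin/sh
# Queue name taken from the answers above.
QUEUE="root.default"

# Hypothetical source and destination, for illustration only.
SRC="hdfs://nn1.example.com/data/src"
DST="hdfs://nn2.example.com/data/dst"

# Generic -D options go right after the subcommand, before the paths.
# Note the "=" between the property name and its value.
CMD="hadoop distcp -Dmapreduce.job.queuename=${QUEUE} ${SRC} ${DST}"

# Print the command for review; run it with: eval "$CMD"
echo "$CMD"
```

Printing the command first makes it easy to check the key=value syntax before submitting the job to the cluster.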


Similarly, hadoop archive can be told to submit its job to a custom queue:

hadoop archive -Dmapreduce.job.queuename=<leaf.queue.name> ...

I'll take the opportunity to add a tip for hadoop archive: it runs one map task per part file it creates, and by default each part file is 2 GB. Archiving terabytes of data can therefore launch thousands of map tasks.

The size of the part-* files in a Hadoop archive is controlled by the undocumented property har.partfile.size. To get fewer, larger part files (and so fewer map tasks), set it to a value in bytes above 2 GiB with -Dhar.partfile.size=<value in bytes>
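To get a feel for the numbers, here is a rough sketch that estimates the map count from the part file size. It relies on the one-map-per-part-file behaviour described above; the 10 TiB input size and the 16 GiB part size are hypothetical values chosen for illustration, and the real count on a cluster may differ.

```shell
#!/bin/sh
# Estimate the number of map tasks as ceil(total_bytes / part_bytes),
# assuming one map task per part file as described above.
estimate_maps() {
  total_bytes=$1
  part_bytes=$2
  echo $(( (total_bytes + part_bytes - 1) / part_bytes ))
}

TOTAL=$(( 10 * 1024 * 1024 * 1024 * 1024 ))   # hypothetical 10 TiB of input
DEFAULT_PART=$(( 2 * 1024 * 1024 * 1024 ))    # 2 GiB default part size
BIGGER_PART=$(( 16 * 1024 * 1024 * 1024 ))    # hypothetical 16 GiB part size

MAPS_DEFAULT=$(estimate_maps "$TOTAL" "$DEFAULT_PART")
MAPS_BIGGER=$(estimate_maps "$TOTAL" "$BIGGER_PART")

echo "2 GiB parts:  $MAPS_DEFAULT map tasks"
echo "16 GiB parts: $MAPS_BIGGER map tasks"

# Option to pass to hadoop archive for the larger part size.
echo "-Dhar.partfile.size=$BIGGER_PART"
```

The last line prints the option in the form hadoop archive expects, with the size expressed in bytes.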