
Running a standalone Hadoop application on multiple CPU cores


I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers executing at the same time.

Anyway, to set the maximum number of concurrently running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. By default both are set to 2, so I might be right.
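For example, on a machine with four cores the corresponding snippet in mapred-site.xml could look like this (the value 4 is only an example, and as the next answer points out, these settings are read by the tasktracker daemon, so they only take effect once you run in (pseudo-)distributed mode):

    <configuration>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
      </property>
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>4</value>
      </property>
    </configuration>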

Finally, if you want to be prepared for a multi-node cluster, go straight to running this in the fully-distributed way, but have all the daemons (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine (the so-called pseudo-distributed setup).
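A minimal sketch of that single-machine pseudo-distributed configuration, with hostnames and ports taken from the usual single-node setup guide and meant only as an example:

    <!-- core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: replication of 1, since there is only one datanode -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

    <!-- mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>

After that you format the namenode once (bin/hadoop namenode -format) and bring the daemons up with bin/start-all.sh (or start-dfs.sh plus start-mapred.sh).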


Just for clarification: if Hadoop runs in local mode you don't have parallel execution on the task level (unless you're running >= Hadoop 0.21, see MAPREDUCE-1367). You can, however, submit multiple jobs at once, and those then get executed in parallel.
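To illustrate that job-level parallelism, here is a rough sketch of my own (not from the original poster): it submits two jobs with the non-blocking Job.submit() instead of waitForCompletion(), so both are in flight at the same time even under the LocalJobRunner. The input/output paths are placeholders, and the identity Mapper/Reducer are used only to keep it self-contained.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ParallelJobs {

      // Builds a trivial pass-through job; the paths are placeholders.
      private static Job buildJob(Configuration conf, String name,
                                  String in, String out) throws Exception {
        Job job = new Job(conf, name);       // Job.getInstance(...) on newer releases
        job.setJarByClass(ParallelJobs.class);
        job.setMapperClass(Mapper.class);    // identity mapper
        job.setReducerClass(Reducer.class);  // identity reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job;
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job first  = buildJob(conf, "job-1", "input/part1", "output/part1");
        Job second = buildJob(conf, "job-2", "input/part2", "output/part2");

        // submit() returns immediately (unlike waitForCompletion()),
        // so both jobs now run at the same time.
        first.submit();
        second.submit();

        // Wait until both have finished.
        while (!first.isComplete() || !second.isComplete()) {
          Thread.sleep(1000);
        }
      }
    }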

All those

mapred.tasktracker.{map|reduce}.tasks.maximum

properties only apply to Hadoop running in distributed mode!

HTH, Johannes


According to this thread on the hadoop.core-user email list, you'll want to change the mapred.tasktracker.tasks.maximum setting to the max number of tasks you would like your machine to handle (which would be the number of cores).

This (and other properties you may want to configure) is also documented in the main documentation on how to set up your cluster/daemons.