What is the difference between single node & pseudo-distributed mode in Hadoop?


My 2 cents.

Single node setup (standalone setup)

By default, Hadoop is configured to run in a non-distributed or standalone mode, as a single Java process. There are no daemons running and everything runs in a single JVM instance. HDFS is not used.

You don't need to do any configuration apart from setting JAVA_HOME. Just download the tarball, unzip it, and you are good to go.
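For instance, a fresh standalone install can run one of the bundled example jobs straight against the local filesystem. A minimal sketch, assuming a Hadoop 1.x layout (the example jar name varies with the release):

    $ mkdir input
    $ cp conf/*.xml input
    $ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
    $ cat output/*

No daemons are started here; the job simply reads from and writes to plain local directories.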

Pseudo-distributed mode

The Hadoop daemons run on a local machine, thus simulating a cluster on a small scale. Different Hadoop daemons run in different JVM instances, but on a single machine. HDFS is used instead of local FS.
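Assuming the daemons have been started (e.g. with bin/start-all.sh on a Hadoop 1.x install), jps shows each daemon in its own JVM, roughly along these lines (the PIDs are of course illustrative and the daemon list depends on the version):

    $ jps
    4821 NameNode
    4936 DataNode
    5072 SecondaryNameNode
    5150 JobTracker
    5261 TaskTracker
    5388 Jps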

As far as a pseudo-distributed setup is concerned, you need to set at least the following two properties, along with JAVA_HOME (a sample configuration is sketched after the list):

  1. fs.default.name in core-site.xml.

  2. mapred.job.tracker in mapred-site.xml.

You could have multiple datanodes and tasktrackers, but that doesn't make much sense on a single machine.
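For reference, here is a minimal sketch of those two files using the old-style (Hadoop 1.x) property names; hdfs://localhost:9000 and localhost:9001 are just the conventional single-machine values, adjust as needed:

    <!-- conf/core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- conf/mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>

Many setups also set dfs.replication to 1 in hdfs-site.xml, since there is only a single datanode to hold each block.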

HTH


A single node setup is one where you have (presumably) one datanode and one tasktracker on a single machine.

A pseudo-distributed setup is one where you have multiple datanodes and (presumably) tasktrackers on a single machine, i.e. multiple instances of the datanode service running on one box to emulate a multi-node cluster.