MapR Architecture Vs Cloudera Architecture


Good information by @JamCon in his reply, but there are some things worth clarifying:

The comment regarding patches is not accurate. MapR packages a broad range of Hadoop ecosystem projects in its distribution, so you don't have to compile anything separately. MapR also exposes the same APIs as any other distro, so its packaged versions are not compatibility patches; they simply carry bug fixes and enhancements from the community. There's typically no extra work required to get Hadoop ecosystem projects running on MapR. And, as far as I can tell, they release ecosystem updates at least once a month to keep current with new enhancements.

Regarding the inclusion of YARN, we've been running MapR on YARN across large clusters since July 2014! I believe MapR has its own vetting process for ecosystem projects, and it graduates the MapR-packaged versions to GA once it determines a project is ready for enterprise support.


MapR deviates a bit from the vanilla Hadoop and CDH distributions. It keeps most of the services and structure (JobTracker, DataNodes, HBase Master and RegionServers, MapReduce, etc.), but there are some significant differences.

One of the defining features of MapR's distribution is that it doesn't use HDFS. It has its own custom file system, which provides HA and operates without NameNodes (the metadata is distributed across the cluster). The custom file system also allowed MapR to offer NFS access and snapshots years ahead of the other Hadoop distros.
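To make the "no NameNode" point concrete, here is a toy contrast, not MapR's actual design. In HDFS every metadata lookup goes to one NameNode; with distributed metadata, a lookup can be routed to whichever node owns that slice of the namespace. The hash-partitioning scheme and the names `MetadataNode`, `DistributedNamespace`, and `owner_of` are hypothetical illustrations; MapR's real implementation is more involved and is not modeled here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataNode:
    """A cluster node storing metadata for the paths it owns (hypothetical)."""
    name: str
    entries: dict = field(default_factory=dict)  # path -> metadata record


class DistributedNamespace:
    """Toy namespace whose metadata is spread across all nodes by hashing the path.

    There is no single master: the owner of any path is found by hashing, so
    losing one node affects only the paths it owns (replication is omitted to
    keep the sketch short).
    """

    def __init__(self, node_names):
        self.nodes = [MetadataNode(n) for n in node_names]

    def owner_of(self, path: str) -> MetadataNode:
        # Stable hash (unlike built-in hash(), which is salted per process).
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def create(self, path: str, size: int) -> None:
        self.owner_of(path).entries[path] = {"size": size}

    def stat(self, path: str) -> dict:
        return self.owner_of(path).entries[path]


ns = DistributedNamespace(["node1", "node2", "node3"])
for i in range(9):
    ns.create(f"/data/file{i}", size=i * 100)

print(ns.stat("/data/file4"))  # {'size': 400}
print({n.name: len(n.entries) for n in ns.nodes})  # metadata entries held by each node
```

The trade-off the sketch hints at: a single NameNode is simple but is a bottleneck and single point of failure, while spreading metadata removes that master at the cost of routing and replication logic on every node.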

The custom file system does complicate their distribution a bit, though. For example, when you want to run other products or services, you often need to install MapR-specific patches: to run Mahout, you need to compile it with the MapR patches from https://github.com/mapr/mahout. But the custom file system also gives MapR an opportunity to build better security in at the file-system level, as seen in its "Access Control Expressions" and cluster/job/volume ACLs.
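Access Control Expressions combine users, groups, and roles with boolean operators rather than listing principals one by one. The evaluator below is a simplified sketch that assumes a subset of that syntax (`u:<user>`, `g:<group>`, `r:<role>`, `p` for public, with `!`, `&`, `|` and parentheses, binding in that order); check the MapR documentation for the exact grammar. The `User` type and `check_access` function are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical principal: a user name plus group and role memberships."""
    name: str
    groups: set = field(default_factory=set)
    roles: set = field(default_factory=set)


# Assumed token set: parentheses, operators, "u:/g:/r:" + name, or "p" (public).
TOKEN_RE = re.compile(r"\s*(\(|\)|&|\||!|[ugr]:[\w.\-]+|p)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def check_access(expr, user):
    """Recursive-descent evaluation with precedence ! > & > |."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "p":
            return True  # public: everyone matches
        kind, _, name = tok.partition(":")
        if kind == "u":
            return user.name == name
        if kind == "g":
            return name in user.groups
        if kind == "r":
            return name in user.roles
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


alice = User("alice", groups={"analysts"})
bob = User("bob", groups={"admins"})
expr = "u:alice | (g:admins & !u:mallory)"
print(check_access(expr, alice))                # True
print(check_access(expr, bob))                  # True
print(check_access("g:admins & !u:bob", bob))   # False
```

A single expression such as `g:admins & !u:bob` expresses "all admins except bob", which a plain allow-list cannot.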

Overall, it's a well-structured product. My biggest concern is that they've deviated so far from the norm that they are slow to adopt new innovations, because each one has to be incorporated into their highly modified environment. YARN is a perfect example: they haven't released it yet, even though their competitors have.


From an architectural standpoint, MapR has no master nodes. The functions that master nodes provide in a typical Hadoop architecture are instead distributed across, and performed within, MapR's "data nodes".

https://www.mapr.com/why-hadoop/why-mapr/architecture-matters