MapR Architecture Vs Cloudera Architecture

hadoop architecture cloudera infrastructure mapr

Good information by @JamCon in his reply, but there are some things worth clarifying:

The comment regarding patches is not accurate. MapR packages a broad range of Hadoop projects in its distribution so you don't have to separately compile anything. And MapR has the same APIs as any other distro, meaning their packages are not about compatibility but are simply bug fixes / enhancements from the community. There's typically no extra work required to get Hadoop ecosystem projects to run on MapR. And they release ecosystem updates at least once a month, as far as I can tell, to keep current with new enhancements.

Regarding the inclusion of YARN, we've been running MapR on YARN across large clusters since July '14! I believe MapR has their own ecosystem project vetting process, and they graduate MapR packaged versions to GA once they determine a project is ready for enterprise support.

hadoop architecture cloudera infrastructure mapr

MapR deviates from the vanilla Hadoop & CDH distributions a bit. It keeps most of the services and structure (Job Tracker, Data Nodes, HBase Master & Region, MR, etc), but there are some significant differences.

One of the defining items about MapR's distribution is that it doesn't use HDFS. It has its own custom FS, which features HA and operates without Name Nodes (via distributed metadata). It also allowed them to enable NFS access years ahead of the rest of the Hadoop distros, as well as snap shotting.

The custom FS does complicate their distribution a bit, though ... for example, when you want to run products or services, you often need to install the MapR specific patches. When you want to run mahout, you need to compile it with the MapR patches from https://github.com/mapr/mahout. But it also gives them an opportunity to incorporate better security at the FS level, as seen by the implementation of "Access Control Expressions" and Cluster/Job/Volume ACLs.

Overall, it's a well structured product. My biggest concern is they've deviated so far from the norm that when new innovations are adopted, they're slow to adapt, because it has to be incorporated into their highly modified environment. YARN is a perfect example ... they haven't released it yet, even though their competitors have.

hadoop architecture cloudera infrastructure mapr

From an architecture stand point with MapR there are no master nodes. The functions that the master nodes provide in a typical Hadoop architecture are instead distributed and performed within the "data nodes" of MapR.

https://www.mapr.com/why-hadoop/why-mapr/architecture-matters

CodeHunter

MapR Architecture Vs Cloudera Architecture

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last