MapReduce and SQL GROUP BY

mongodb hadoop group-by mapreduce

What you gain by using MR is fast reads, not fast aggregation. GROUP BY is a slow operation in SQL, and MR is even slower in MongoDB. The point is that MR writes its results to new collections, and you can then iterate over those in real time. This works very well when you have large amounts of data and want to be able to iterate over the results in real time.

In the project I'm working on, a Python script runs in the background (as a cron job) and performs several map/reduces once per day. Instead of iterating over large tables with SQL GROUP BY, we iterate once with MR and then iterate quickly over the newly created collections.
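To make the "aggregate once, read many times" idea concrete, here is a minimal sketch in plain Python. It does not use MongoDB at all: `map_order`, `reduce_amounts`, `map_reduce` and the `orders` data are hypothetical names made up for illustration, and the resulting dict only stands in for the collection that a real map/reduce job would write out. It is roughly what `SELECT customer, SUM(amount) FROM orders GROUP BY customer` would compute.

```python
from collections import defaultdict


def map_order(order):
    """Map step: emit one (key, value) pair per input document."""
    yield order["customer"], order["amount"]


def reduce_amounts(key, values):
    """Reduce step: collapse all values emitted for one key into a single value."""
    return sum(values)


def map_reduce(documents, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group (in-memory stand-in for MR)."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical source data; in the real setup this would be a large collection.
orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
]

# The slow pass: run once (for example from the daily cron job).
totals_by_customer = map_reduce(orders, map_order, reduce_amounts)

# The fast part: later reads hit the precomputed result, not the raw data.
print(totals_by_customer["alice"])
```

The expensive grouping happens only in `map_reduce`; everything after that is a cheap lookup, which is the trade-off described above.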

I have no experience with Hadoop, so I'm sorry I can't fill you in there.

Tutorial: http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/

EDIT:

Here is a complete translation of an SQL GROUP BY query to MongoDB Map/Reduce, taken from: http://rickosborne.org/download/SQL-to-MongoDB.pdf

A lot of folks use MongoDB for data storage and Hadoop for processing, as there's a connector between the two. Each MongoDB node can handle multiple Hadoop nodes reading from it. As a note, I'd recommend separating the MongoDB and Hadoop nodes for memory reasons.

In case you don't have them, here are some documents for you.

One other thing that might be worth looking at is the new aggregation framework coming out in MongoDB 2.2. Here's a chart equating the operations in SQL with those in the MongoDB aggregation framework.
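As a rough illustration of how a GROUP BY maps onto the aggregation framework, the sketch below builds a pipeline equivalent to `SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY customer`. The `$group`, `$sum` and `$sort` operators are part of the aggregation framework; the helper `group_sum_pipeline` and the field, database and collection names are assumptions made up for this example.

```python
def group_sum_pipeline(group_field, sum_field, out_field="total"):
    """Build an aggregation pipeline that groups by one field and sums another."""
    return [
        # GROUP BY <group_field>, SUM(<sum_field>) AS <out_field>
        {"$group": {"_id": "$" + group_field, out_field: {"$sum": "$" + sum_field}}},
        # ORDER BY the grouping key, ascending
        {"$sort": {"_id": 1}},
    ]


pipeline = group_sum_pipeline("customer", "amount")
print(pipeline)

# With a running server and pymongo installed, the pipeline could be executed
# like this (the "shop" database and "orders" collection are hypothetical):
#   from pymongo import MongoClient
#   results = MongoClient().shop.orders.aggregate(pipeline)
```

Unlike the map/reduce approach, nothing has to be written in JavaScript here; the grouping is described declaratively as a list of stages.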
