
Hadoop vs. Teradata: what is the difference?


I think the article titled 'MapReduce and Parallel DBMSs: Friends or Foes?' does quite a good job of describing the situations where each technology works best. In a nutshell, Hadoop is excellent for storing unstructured data and running parallel transformations to 'sanitize' incoming data, whereas DBMSs excel at executing complex queries quickly.
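To make the 'sanitize' use case concrete, here is a minimal sketch of a map-only cleaning step in plain Python. It is not Hadoop code: the record format (`user_id,timestamp,amount`), the function names and the cleaning rules are all hypothetical. On a cluster, a function like `sanitize_record` would run as the map step over many input splits in parallel (for example via Hadoop Streaming), with the framework handling distribution.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical structured record: (user_id, unix_timestamp, amount_in_cents)
Record = Tuple[str, int, int]


def sanitize_record(line: str) -> Optional[Record]:
    """Parse one raw, possibly messy line; return None if it cannot be cleaned."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3 or not parts[0]:
        return None
    user_id, ts, amount = parts
    try:
        timestamp = int(ts)
        # Normalize amounts like "12.5" or "$12.50" to integer cents.
        cents = round(float(amount.lstrip("$")) * 100)
    except (ValueError, OverflowError):
        return None
    return (user_id.lower(), timestamp, cents)


def map_phase(lines: Iterable[str]) -> Iterator[Record]:
    """Map-only job: each line is cleaned independently, so splits can run in parallel."""
    for line in lines:
        record = sanitize_record(line)
        if record is not None:
            yield record


raw = [
    "Alice, 1700000000, $12.50",
    "bob,1700000060,3",
    "garbage line",
    "carol,not-a-time,4.00",
]
print(list(map_phase(raw)))  # → [('alice', 1700000000, 1250), ('bob', 1700000060, 300)]
```

Because each line is handled on its own, there is no shared state between records, which is what lets this kind of cleanup scale out so easily on Hadoop.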


Hadoop, Hadoop with Extensions, RDBMS Feature/Property Comparison

I am not an expert in this area, but the coursera.com course Introduction to Data Science has a lecture titled "Comparing MapReduce and Databases", as well as a lecture on parallel databases, within the MapReduce section of the course.

Here is a summary of these lectures' comparison of MapReduce and RDBMSs (not necessarily parallel RDBMSs). One point to remember is that the comparison changes if you include extensions to Hadoop such as Pig, Hive, etc. In parentheses, I list the MapReduce extensions that add some of this functionality.

Functionality/properties that RDBMSs have but native MapReduce lacks:

  • Declarative query languages (Pig, Hive)
  • Schemas (Hive, Pig, DryadLINQ, Hadapt)
  • Logical Data Independence
  • Indexing (HBase)
  • Algebraic Optimization (Pig, Dryad, Hive)
  • Caching/Materialized Views
  • ACID/Transactions
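The first bullet, declarative query languages, is the easiest to see in code. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an RDBMS and answers the same question twice: once declaratively in SQL, where the engine decides how to execute the query, and once as hand-written map, group and reduce steps, which is roughly what you would write against native MapReduce without Hive or Pig. The table name, columns and data are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales data: (region, amount)
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 2)]

# Declarative: state *what* you want; the database decides *how* to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_result = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
conn.close()


# Procedural, MapReduce style: you spell out the map, grouping and reduce steps yourself.
def mapper(row):
    region, amount = row
    yield region, amount


def reducer(key, values):
    return key, sum(values)


groups = defaultdict(list)
for row in rows:
    for key, value in mapper(row):
        groups[key].append(value)  # the "shuffle": collect values by key
mr_result = dict(reducer(k, vs) for k, vs in groups.items())

print(sql_result == mr_result)    # → True
print(sorted(mr_result.items()))  # → [('east', 17), ('north', 1), ('west', 7)]
```

Extensions like Hive and Pig close this gap by letting you write something close to the SQL version and compiling it into MapReduce jobs for you.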

Properties MapReduce has (relative to a regular RDBMS, not necessarily a parallel RDBMS):

  • High Scalability
  • Fault-tolerance
  • “One-person deployment”
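The fault-tolerance point rests on map and reduce tasks being deterministic functions of input that is stored durably: if a worker dies, the framework can simply schedule the same task again, possibly on another node. Here is a minimal sketch of that retry idea, with an in-process function standing in for a remote task. All names are hypothetical, and the limit of 4 attempts only mirrors Hadoop's default for `mapreduce.map.maxattempts`.

```python
from typing import Callable, List, Set

MAX_ATTEMPTS = 4  # mirrors Hadoop's default mapreduce.map.maxattempts


def run_with_retries(task: Callable[[int], List[str]], split_id: int,
                     max_attempts: int = MAX_ATTEMPTS) -> List[str]:
    """Re-run a deterministic task on the same input split until it succeeds."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(split_id)
        except RuntimeError as err:  # stands in for a crashed worker or lost node
            last_error = err
    raise RuntimeError(f"split {split_id} failed {max_attempts} times") from last_error


# Hypothetical flaky worker: the first attempt on split 1 "crashes".
failed_once: Set[int] = set()


def flaky_map_task(split_id: int) -> List[str]:
    if split_id == 1 and split_id not in failed_once:
        failed_once.add(split_id)
        raise RuntimeError("worker lost")
    return [f"output-of-split-{split_id}"]


results = [run_with_retries(flaky_map_task, s) for s in range(3)]
print(results)  # → [['output-of-split-0'], ['output-of-split-1'], ['output-of-split-2']]
```

Only the failed split is redone; the work already finished for the other splits is kept, which is why long MapReduce jobs can survive individual machine failures.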


To begin with, vanilla Apache Hadoop is 100% open source. If you need commercial support and consulting, there are companies such as Cloudera, MapR, and Hortonworks.

Hadoop is backed by a growing community that fixes bugs and makes improvements on a regular basis. Hadoop's storage layer, HDFS, is based on the architecture of Google's GFS, which has been proven to handle large quantities of data. Hadoop's processing model, MapReduce, is likewise based on Google's MapReduce model.
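To illustrate the GFS-style storage model, here is a toy Python sketch of how HDFS lays out a file: the file is cut into large fixed-size blocks, and each block is stored on several DataNodes, so losing one machine does not lose data. The 128 MB block size and replication factor of 3 are HDFS's defaults in Hadoop 2.x and later; the function name, the node names and the round-robin placement are hypothetical simplifications, since real HDFS uses a rack-aware placement policy decided by the NameNode.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x+ (dfs.blocksize)
REPLICATION = 3                 # HDFS default replication factor (dfs.replication)


def plan_blocks(file_size: int, datanodes: List[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Split a file into fixed-size blocks and assign each block to distinct nodes.

    Simplification: round-robin placement, NOT HDFS's real rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    num_blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# A hypothetical 300 MB file on a hypothetical 4-node cluster: 3 blocks, 3 replicas each.
plan = plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in plan.items():
    print(block_id, nodes)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```

Spreading the blocks this way is also what lets MapReduce schedule map tasks on the nodes that already hold the data, instead of shipping the data to the computation.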

Hadoop is used by tech giants such as Facebook, Yahoo, Twitter, and eBay to store and analyze their high volumes of data, both in real time and in offline batch jobs.

Regarding the ETL part of your question, read these slides.

OK, now: why Hadoop?

  1. Open Source
  2. Proven Storage and Analysis model for Large Quantities of data
  3. Minimal Hardware Requirements to Set Up and Run

OK, now: why TD (Teradata)?

  1. Commercial Support