What's the difference between Flume and Sqoop?


From http://flume.apache.org/

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

Flume collects data from a variety of sources, such as log files, JMS queues, and spooling directories.
Multiple Flume agents can be configured to collect high volumes of data, so it scales horizontally.
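A Flume agent is built from a source, a channel, and a sink. The Python sketch below models that pipeline to show what "event-driven" means in practice: the sink handles each event as soon as the source hands it over, instead of copying a finished dataset. This is a conceptual model only, not Flume's API. Real agents are Java processes configured through a properties file, and the names here (`source`, `sink`, `run_agent`, a `queue.Queue` as the channel, a Python list standing in for HDFS) are hypothetical.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel: tells the sink no more events are coming


def source(lines, channel):
    """Push each incoming log line into the channel as a separate event.

    The dict loosely mirrors a Flume event (headers plus a body)."""
    for line in lines:
        channel.put({"headers": {}, "body": line})  # blocks while the channel is full
    channel.put(_STOP)


def sink(channel, destination):
    """Take events off the channel as they arrive and write them out."""
    while True:
        event = channel.get()
        if event is _STOP:
            break
        destination.append(event["body"])  # stands in for a write to HDFS


def run_agent(lines, capacity=100):
    """Wire one source and one sink through a bounded in-memory channel."""
    channel = queue.Queue(maxsize=capacity)
    destination = []
    consumer = threading.Thread(target=sink, args=(channel, destination))
    consumer.start()
    source(lines, channel)
    consumer.join()
    return destination


if __name__ == "__main__":
    log = ["GET /index.html 200", "GET /missing 404", "POST /login 302"]
    print(run_agent(log))
```

The bounded channel is the design point worth noticing: if the sink falls behind, the source blocks instead of exhausting memory. Scaling horizontally, in this model, amounts to running more agents side by side.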

From http://sqoop.apache.org/

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Sqoop moves data between Hadoop and other databases, and it can transfer data in parallel for better performance.
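Sqoop gets its parallelism by splitting a table on a column (the primary key by default, or the column given with `--split-by`). It reads the column's minimum and maximum values, divides that range evenly, and gives each map task one slice; `-m`/`--num-mappers` sets the number of tasks, which defaults to 4. The Python sketch below imitates that idea against an in-memory SQLite table. It is an illustration under simplifying assumptions (an integer split column, queries run one after another rather than in parallel), not Sqoop's splitter code, and `compute_splits`, `import_table`, and the `orders` table are hypothetical.

```python
import sqlite3


def compute_splits(low, high, num_mappers):
    """Split the inclusive integer range [low, high] into evenly sized,
    contiguous, non-overlapping inclusive ranges, one per mapper."""
    if high < low:
        return []
    total = high - low + 1
    num_mappers = max(1, min(num_mappers, total))  # never more splits than values
    size, extra = divmod(total, num_mappers)
    splits, start = [], low
    for i in range(num_mappers):
        end = start + size - 1 + (1 if i < extra else 0)  # spread the remainder
        splits.append((start, end))
        start = end + 1
    return splits


def import_table(conn, table, split_col, num_mappers=4):
    """Fetch every row with one bounded query per split, the way each
    map task would (sequentially here, for simplicity).

    Table and column names are interpolated directly: trusted input only."""
    low, high = conn.execute(
        f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}"
    ).fetchone()
    if low is None:  # empty table: nothing to split
        return []
    rows = []
    for lo, hi in compute_splits(low, high, num_mappers):
        query = f"SELECT * FROM {table} WHERE {split_col} BETWEEN ? AND ?"
        rows.extend(conn.execute(query, (lo, hi)).fetchall())
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i * 1.5) for i in range(1, 11)])
    print(compute_splits(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
    print(len(import_table(conn, "orders", "id")))  # 10
```

Because the split is pure arithmetic over [MIN, MAX], skewed key values can leave some slices with far more rows than others; that is where picking a different split column can help.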


Both Sqoop and Flume pull data from a source and push it to a sink. The main difference is that Flume is event-driven (it moves each event as it arrives), while Sqoop is not: it transfers a bulk dataset in a batch job.


Flume:

  Flume is a framework for populating Hadoop with data. Agents are populated throughout one's IT infrastructure – inside web servers, application servers and mobile devices, for example – to collect data and integrate it into Hadoop.

Sqoop:

  Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop. It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target.

You can see the full post.