What is --direct mode in sqoop?


Just read the Sqoop documentation!

  • General principles are located here for imports and there for exports

Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)


Some databases provide a direct mode for exports as well (...)

Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25

Bottom line: "direct mode" means different things for different databases.
For MySQL or PostgreSQL it relates to the bulk loader/unloader utilities (i.e. completely bypassing JDBC), while for Oracle it relates to "direct path INSERT", i.e. still over JDBC but in a non-transactional mode (so you'd better use a temp table, or you might end up with duplicates in a PK and a corrupt table).
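
To make the MySQL case concrete, here is a minimal sketch of a direct-mode export. The host, table and HDFS directory are made-up placeholders, and the script only prints the command (a dry run) so you can review it before running it against a real cluster.

```shell
#!/bin/sh
# Hypothetical connection details; replace with your own.
CONNECT="jdbc:mysql://db.foo.com/corp"
TABLE="EMPLOYEES"
EXPORT_DIR="/user/hypothetical/employees"   # assumed HDFS directory

# Assemble the export command. With --direct, the MySQL connector
# relies on mysqlimport instead of issuing INSERT statements over JDBC.
EXPORT_CMD="sqoop export --connect $CONNECT --table $TABLE --export-dir $EXPORT_DIR --direct"

# Dry run: print the command for review instead of executing it.
echo "$EXPORT_CMD"
```

Once the printed command looks right, paste it into a shell on a node where Sqoop and the MySQL client tools are installed.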


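To be short and precise, it's the mode for fast import that bypasses JDBC and hands the data transfer to the database's native utilities (mysqldump for MySQL, for example). It still runs as a map-only job; no reducers are involved.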

sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct

Notes:

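  1. --direct was originally supported only for MySQL and PostgreSQL; later Sqoop 1.4.x releases also document direct connectors for other databases such as Oracle and Netezza.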
  2. Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns.
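
The notes above can be sketched as a small helper that adds --direct only when the table has no large-object columns. The function name, the DOCUMENTS table and the yes/no flag are hypothetical; the trailing `-- --default-character-set=latin1` follows the Sqoop documentation's pattern for passing extra arguments to the underlying dump tool. As a dry run, it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: build an import command for one table.
#   $1  table name
#   $2  "yes" if the table has BLOB, CLOB, or LONGVARBINARY columns
build_import_cmd() {
    cmd="sqoop import --connect jdbc:mysql://db.foo.com/corp --table $1"
    if [ "$2" != "yes" ]; then
        # No large-object columns: direct mode is an option.
        # Arguments after the bare "--" are passed to the dump tool.
        cmd="$cmd --direct -- --default-character-set=latin1"
    fi
    # Otherwise leave --direct off so the default JDBC path is used.
    echo "$cmd"
}

# Dry run: print the commands instead of executing them.
build_import_cmd EMPLOYEES no
build_import_cmd DOCUMENTS yes
```

Whether a table has such columns is something you would check in the database schema yourself; the sketch takes it as an input rather than detecting it.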


From Managing Big Data in Clusters and Cloud Storage

By default, Sqoop uses JDBC to connect to the database. However, depending on the database, there may be a faster, database-specific connector available, which you can use by using the --direct option.

So, you use the --direct option when you want Sqoop to use a faster, database-specific connector instead of the default JDBC one.