What is --direct mode in sqoop?
Just read the Sqoop documentation!
Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)
Some databases provides a direct mode for exports as well (...)
Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25
- Section 25 under MySQL
- Section 25 under Oracle data connector for Hadoop
- etc.
Bottom line: "direct mode" means different things for different databases.
For MySQL or PostgreSQL it relates to bulk loader/unloader utilities (i.e. completely bypassing JDBC), while for Oracle it relates to "direct path INSERT", i.e. still over JDBC but in a non-transactional mode (so you'd better use a temp table, or you might end up with duplicates in a PK and a corrupt table).
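As a rough sketch of what "bypassing JDBC" looks like for MySQL: the Sqoop user guide documents that, in direct mode, arguments placed after a bare `--` are handed to the underlying tool (mysqldump for MySQL imports). The host, database, and table names below are placeholders, and the script only prints the command instead of running it.

```shell
# Placeholder connection details (assumptions); substitute your own.
CONNECT="jdbc:mysql://server.foo.com/db"
TABLE="bar"

# Everything after the bare "--" is passed through to mysqldump,
# the utility that Sqoop's MySQL direct mode uses for imports.
CMD="sqoop import --connect $CONNECT --table $TABLE --direct -- --default-character-set=latin1"

# Print the command for review rather than executing it here.
echo "$CMD"
```

To actually run it, paste the printed command into a shell on a machine where both sqoop and mysqldump are installed.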
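In short, it's a fast-import mode that moves the data with the database's own dump utility instead of JDBC. Sqoop still launches map tasks, but each one runs the native tool (mysqldump for MySQL) rather than issuing JDBC queries.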
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct
Notes:
- --direct is only supported in MySQL and PostgreSQL.
- Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns.
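The two notes above can be turned into a small pre-flight check. This is only a sketch: `direct_flag` is a hypothetical helper, not part of Sqoop, and it assumes you supply the table's column types yourself. It prints `--direct` only when the JDBC URL points at MySQL or PostgreSQL and none of the listed types is BLOB, CLOB, or LONGVARBINARY.

```shell
# Hypothetical helper (not part of Sqoop).
# $1 = JDBC URL; remaining arguments = column types of the table.
direct_flag() {
  url=$1
  shift
  case "$url" in
    jdbc:mysql:*|jdbc:postgresql:*) ;;  # databases named in the notes
    *) return 0 ;;                      # anything else: stay on plain JDBC
  esac
  for col_type in "$@"; do
    case "$col_type" in
      BLOB|CLOB|LONGVARBINARY) return 0 ;;  # not importable in direct mode
    esac
  done
  echo "--direct"
}

# Example with placeholder column types; prints the command instead of running it.
FLAG=$(direct_flag "jdbc:mysql://db.foo.com/corp" INTEGER VARCHAR)
echo "sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES $FLAG"
```

When the check fails the flag is simply left out, so the same import falls back to the default JDBC path.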
From Managing Big Data in Clusters and Cloud Storage
By default, Sqoop uses JDBC to connect to the database. However, depending on the database, there may be a faster, database-specific connector available, which you can use by using the --direct option.
So, you go with --direct option when you want to use a different database connector than the default.