Can Sqoop be used to perform joins on the IMPORT?


It is possible to do joins in Sqoop imports.

From an architecture point of view, it depends on your use case. Sqoop is mainly a utility for fast imports and exports; the ETL itself can be done with Spark, Pig, Hive, or Impala.

Although it is doable, I would recommend against it: it will increase your job's run time, and it puts load on your source database, which has to compute the joins/aggregations. Sqoop was primarily designed to be an ingestion tool for structured sources.


It depends on the infrastructure of your data pipeline. If you are already using Spark for some other purpose, it is better to use the same Spark for importing the data as well. Sqoop supports joins and will be sufficient if you only need to import data and nothing else. Hope this answers your query.


You can use:

  • a view defined in the DBMS that Sqoop reads from, optionally running sqoop eval first to set parameters in the database
  • a free-form SQL query for Sqoop in which the JOIN is defined
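
For the free-form option, a minimal sketch might look like the following. The JDBC URL, user, table names (orders, customers), column names, and target directory are all made up for illustration, and RUN_IMPORT is just a hypothetical guard variable so that sourcing the script does not start a job. The parts that come from Sqoop itself are the --query, --split-by, and --target-dir options and the $CONDITIONS token that a free-form query must contain.

```shell
#!/bin/sh
# Hypothetical connection details and table names, for illustration only.
JDBC_URL="jdbc:mysql://db.example.com/sales"

# Single quotes stop the shell from expanding $CONDITIONS; Sqoop replaces the
# token with a per-mapper range predicate on the --split-by column.
JOIN_QUERY='SELECT o.id, o.amount, c.name FROM orders o JOIN customers c ON (o.customer_id = c.id) WHERE $CONDITIONS'

# Import the joined result set. With --query, --target-dir is required, and
# --split-by is needed unless the import runs with a single mapper.
import_joined_orders() {
  sqoop import \
    --connect "$JDBC_URL" \
    --username etl_user -P \
    --query "$JOIN_QUERY" \
    --split-by o.id \
    --target-dir /user/etl/orders_with_customers
}

# Only run the import when explicitly asked to (hypothetical guard variable).
if [ "${RUN_IMPORT:-0}" = "1" ]; then
  import_joined_orders
fi
```

Run it as RUN_IMPORT=1 sh script.sh on a host where Sqoop is installed. The join itself is executed by the source database, once per mapper range.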

However, views with JOINs cannot be used for incremental imports.
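
A plain full import of such a view could be sketched like this. Everything named here is an assumption for the example: the view orders_with_customers_v (assumed to already exist in the DBMS with the JOIN inside it), the etl_params table the view is imagined to read from, the order_id column, and the connection details. The sqoop eval step is optional and only shows the idea of setting a parameter in the database before the import.

```shell
#!/bin/sh
# Hypothetical names throughout: JDBC URL, user, etl_params table, and the
# orders_with_customers_v view (assumed to be defined in the DBMS with the JOIN).
JDBC_URL="jdbc:mysql://db.example.com/sales"
VIEW_NAME="orders_with_customers_v"

# Optional step: use sqoop eval to set a parameter row that the view reads.
set_view_params() {
  sqoop eval \
    --connect "$JDBC_URL" \
    --username etl_user -P \
    --query "UPDATE etl_params SET load_date = '2020-01-01'"
}

# Full import of the view via --table. A view has no primary key, so a split
# column is named explicitly (--num-mappers 1 would be the alternative).
import_view() {
  sqoop import \
    --connect "$JDBC_URL" \
    --username etl_user -P \
    --table "$VIEW_NAME" \
    --split-by order_id \
    --target-dir /user/etl/orders_with_customers_v
}
```

Calling set_view_params and then import_view gives the two-step flow; nothing runs until the functions are called.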

The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.
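
One way to stay inside those limits with a join is to give every projected column an explicit, unique alias, so that two tables that both have an id column do not produce duplicate column names. This is only a sketch: the tables, columns, and connection details are hypothetical.

```shell
#!/bin/sh
# Hypothetical tables: orders and customers both have an "id" column.
JDBC_URL="jdbc:mysql://db.example.com/sales"

# Ambiguous projection: two result columns would both be called "id".
AMBIGUOUS_QUERY='SELECT o.id, c.id, c.name FROM orders o JOIN customers c ON (o.customer_id = c.id) WHERE $CONDITIONS'

# Unambiguous projection: every output column gets its own alias.
ALIASED_QUERY='SELECT o.id AS order_id, c.id AS customer_id, c.name AS customer_name FROM orders o JOIN customers c ON (o.customer_id = c.id) WHERE $CONDITIONS'

# Import using the aliased query only; the split column is still referenced
# by its table-qualified name.
import_aliased_join() {
  sqoop import \
    --connect "$JDBC_URL" \
    --username etl_user -P \
    --query "$ALIASED_QUERY" \
    --split-by o.id \
    --target-dir /user/etl/orders_customers_aliased
}
```

AMBIGUOUS_QUERY is kept only for comparison and is never passed to Sqoop.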