Can Sqoop be used to perform joins on the IMPORT?

apache-spark hadoop import bigdata sqoop

It is possible to do joins in Sqoop imports.

From an architecture point of view, it depends on your use case. Sqoop is mainly a utility for fast imports/exports; all the ETL can be done afterwards through Spark/Pig/Hive/Impala.

Although it is doable, I would recommend against it: it will increase your job's run time, and it will put extra load on your source database, which has to compute the joins/aggregations. Sqoop was primarily designed to be an ingestion tool for structured sources.


It depends on the infrastructure of your data pipeline. If you are already using Spark for some other purpose, it is better to use the same Spark for importing the data as well. Sqoop supports joins and will be sufficient if you only need to import data and nothing else. Hope this answers your query.


You can use:

A view in the DBMS that Sqoop reads from, optionally running sqoop eval beforehand to set parameters in the database.
A free-form SQL query for Sqoop in which the JOIN is defined.

However, views with JOINs cannot be used for incremental imports.

The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.
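
As a rough illustration of the free-form query option, here is a minimal sketch of a join import. The JDBC URL, user, password file, table names (orders, customers), column names and target directory are all hypothetical placeholders, not taken from the question. The columns are listed explicitly to avoid the ambiguous projections mentioned above, and the query must contain the $CONDITIONS token so Sqoop can split the work across mappers. The script only prints the command unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Hypothetical connection details and table names; adjust to your environment.
JDBC_URL="jdbc:mysql://db.example.com/sales"   # assumed source database
TARGET_DIR="/user/etl/orders_with_customers"   # assumed HDFS output directory

# Single quotes keep the shell from expanding $CONDITIONS; Sqoop replaces the
# token with a per-mapper range predicate at run time.
QUERY='SELECT o.id, o.amount, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE $CONDITIONS'

# Free-form query imports need --target-dir, and --split-by when more than
# one mapper is used.
SQOOP_CMD=(sqoop import
  --connect "$JDBC_URL"
  --username etl_user
  --password-file /user/etl/.db_password
  --query "$QUERY"
  --split-by o.id
  --target-dir "$TARGET_DIR"
  --num-mappers 4)

# Print the command for review; set RUN=1 to actually execute it.
printf '%q ' "${SQOOP_CMD[@]}"; echo
if [ "${RUN:-0}" = "1" ]; then
  "${SQOOP_CMD[@]}"
fi
```

Keeping the join this simple (one join, explicit column list, no OR in the WHERE clause) stays within the limits quoted above. Anything heavier is probably better done after ingestion in Hive or Spark.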
