Import data on HDFS to SQL Server or export data on HDFS to SQL Server

Both of your options use the same mechanism: Apache Sqoop's export tool. Using the licensed Microsoft SQL Server connector/driver jar should generally yield better performance for the task than the generic JDBC connector bundled with Apache Sqoop.
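A minimal sketch of such an export follows. All hostnames, database names, credentials, table names, and paths are hypothetical placeholders, and it assumes the SQL Server connector jar has already been dropped into Sqoop's lib directory:

```shell
# Push HDFS data into a SQL Server table via Sqoop's export tool.
# Everything below (host, db, user, table, paths) is a placeholder.
sqoop export \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username sqoop_user \
  --password-file /user/sqoop/db.password \
  --table sales_fact \
  --export-dir /data/warehouse/sales_fact
```

The `--export-dir` points at the HDFS directory holding the records to write, and `--password-file` keeps the credential out of the process listing.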

In terms of maintenance, there should be little to none once you have it working. As long as the driver jar supports the SQL Server version in use, the export should continue to work as expected.

In terms of configuration, you may initially have to tune the -m value manually to set the parallelism of the export MapReduce job launched by the export tool. Too high a value can overload the database with concurrent writers, while too low a value leaves throughput on the table. Some trial and error is required to arrive at the right -m value, along with knowledge of your database's load periods, in order to set the parallelism appropriately.
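For illustration, the same hypothetical export with the mapper count set explicitly. The value 4 here is an arbitrary starting point for the trial-and-error tuning described above, not a recommendation:

```shell
# -m controls how many parallel map tasks (and thus DB connections) the
# export uses. Start low, watch DB load, and raise it gradually.
sqoop export \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username sqoop_user \
  --password-file /user/sqoop/db.password \
  --table sales_fact \
  --export-dir /data/warehouse/sales_fact \
  -m 4
```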

The Apache Sqoop (v1) documentation for the export tool also lists a set of common reasons for export job failures; you may want to review those here.

On the MapReduce side, you may also want to dedicate a scheduler pool or queue to such external-writing jobs, since they may be business critical. Schedulers such as the FairScheduler and the CapacityScheduler let you define SLA-style guarantees per pool or queue, so that these jobs get adequate resources whenever they are launched.
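Once such a queue exists, the export can be directed into it at submission time. A sketch, assuming a CapacityScheduler queue with the hypothetical name etl_exports (the -D generic argument must come immediately after the tool name):

```shell
# Submit the export into a dedicated queue so the scheduler's capacity
# guarantees apply to it. Queue name and all connection details are placeholders.
sqoop export \
  -D mapreduce.job.queuename=etl_exports \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username sqoop_user \
  --password-file /user/sqoop/db.password \
  --table sales_fact \
  --export-dir /data/warehouse/sales_fact
```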