A way to export the results from Pig to a database

While keeping in mind what orangeoctopus said (beware of DDoS...), have you had a look at DBStorage (from Piggybank)?

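REGISTER piggybank.jar;              -- placeholder jar paths
REGISTER mysql-connector-java.jar;
data = LOAD '...' AS (...);
...
-- STORE needs a location string; the rows themselves go to the database through DBStorage
STORE data INTO 'dummy' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host/db', 'INSERT ...');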


The main problem I see is that every reducer will be inserting into the database at roughly the same time, so the database sees as many concurrent writers as there are reduce tasks.

If you don't think this will be an issue, I suggest writing a custom store function (a StoreFunc) that uses JDBC (or something similar) to insert into the database directly and writes nothing out to HDFS.

If you are afraid of DDoSing your own database, collecting the data on HDFS and then doing a separate bulk load into MySQL would probably be better.
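
A minimal sketch of the bulk-load step, in Java to match the JDBC-based approaches in this thread. It assumes the Pig script stored its results with the default PigStorage() (tab-separated fields, one record per line), that the output directory was merged into one local file (for example with hadoop fs -getmerge <output-dir> /tmp/results.tsv), and that the table's columns are in the same order as the Pig fields. BulkLoader, the file path and the table name are hypothetical.

```java
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for the "store on HDFS, then bulk load" route.
// Assumes the Pig output was written with the default PigStorage() (tab-separated
// fields, one record per line) and has already been copied to a local file.
final class BulkLoader {

    private BulkLoader() {}

    // Builds a MySQL LOAD DATA statement matching that layout. Assumes the path
    // contains no quotes or backslashes and the table name comes from trusted config.
    static String loadDataSql(Path localFile, String table) {
        return "LOAD DATA LOCAL INFILE '" + localFile.toAbsolutePath() + "'"
                + " INTO TABLE " + table
                + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
    }

    // Runs the whole load as one statement; returns the row count the driver reports.
    static int load(Connection conn, Path localFile, String table) throws SQLException {
        try (Statement st = conn.createStatement()) {
            return st.executeUpdate(loadDataSql(localFile, table));
        }
    }
}
```

LOCAL INFILE has to be allowed on both sides: the server's local_infile setting, and on the client recent MySQL Connector/J versions also need allowLoadLocalInfile=true in the JDBC URL. Note too that PigStorage writes a null as an empty field, while LOAD DATA only reads \N as NULL, so nullable columns are worth checking after the load.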


I'm currently experimenting with an embedded Pig application that loads results into MySQL via PigServer.openIterator and a JDBC connection. It has worked very well in testing, but I haven't tried it at scale yet. This is similar to the custom storage method already suggested, but it runs from a single point, so there is no accidental DDoS. You effectively pay the network transfer cost twice (cluster -> staging machine, then staging machine -> DB server) unless you run the load on the DB server itself (I personally prefer to run nothing but the database on the DB server), but that is no different from the "write the file out and bulk load it" option.
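
A minimal sketch of the single-point load in Java. SinglePointLoader and its parameters are hypothetical, and it takes any Iterator<List<Object>> so that it compiles without Pig on the classpath. In an embedded application the rows would come from PigServer.openIterator(alias), which returns an Iterator<Tuple>, with each tuple turned into a list via Tuple.getAll(). Rows go through one PreparedStatement in batches and are committed once at the end.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: pushes rows to the database from one process through a
// single JDBC connection, using batched prepared-statement inserts.
final class SinglePointLoader {

    private SinglePointLoader() {}

    // insertSql needs one '?' placeholder per field, e.g. "INSERT INTO results VALUES (?, ?)".
    // Everything is written in one transaction and committed at the end.
    // Returns the number of rows sent.
    static long insertAll(Connection conn, String insertSql,
                          Iterator<List<Object>> rows, int batchSize) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        conn.setAutoCommit(false);
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            while (rows.hasNext()) {
                List<Object> row = rows.next();
                for (int i = 0; i < row.size(); i++) {
                    ps.setObject(i + 1, row.get(i)); // JDBC parameter indexes start at 1
                }
                ps.addBatch();
                count++;
                if (count % batchSize == 0) {
                    ps.executeBatch(); // send a full batch in one round trip
                }
            }
            if (count % batchSize != 0) {
                ps.executeBatch(); // flush the last, partial batch
            }
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback(); // undo the partial load before propagating the error
            throw e;
        }
        return count;
    }
}
```

Pig chararray, int, long and double fields arrive as String, Integer, Long and Double, which setObject handles directly; other Pig types (bytearray, maps, bags) would need converting first. Committing once keeps the load all-or-nothing; for very large results, committing after each batch is the usual trade-off. With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL can let the driver send each batch as multi-row INSERTs.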

CodeHunter
