Hbase CopyTable inside Java


Question: Do you think there is a better way to get this copy done than using CopyTable from hbase-server? Do you see any drawbacks in using CopyTable?

First of all, a snapshot is a better option than CopyTable.

  • HBase snapshots allow you to take a snapshot of a table without much impact on the Region Servers. Snapshot, clone and restore operations don't involve data copying. Also, exporting the snapshot to another cluster has no impact on the Region Servers.

Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the HFiles in HDFS after disabling the table. The disadvantages of these methods are that you can degrade Region Server performance (CopyTable/ExportTable), or that you need to disable the table, which means no reads or writes, and this is usually unacceptable.

Also, see: Snapshots and Repeatable reads for HBase Tables

Snapshot Internals


Another MapReduce approach, as an alternative to CopyTable:

You can implement something like the code below. It is written for a standalone program, whereas in your case you would have to write a MapReduce job that inserts multiple Put records as a batch (maybe 100000 at a time).

This improved performance for standalone inserts through the HBase client; you can try the same approach in a MapReduce job.

    public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
        if (puts == null) {
            return; // nothing to insert
        }
        // HTable is Closeable, so try-with-resources releases it once the batch has been sent
        try (HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName))) {
            table.put(puts); // one batched call instead of one call per Put
            LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
        } catch (final Throwable e) {
            LOG.error("Batch insert into table " + tableName + " failed", e);
        } finally {
            LOG.info("Processed ---> " + puts.size());
            puts.clear();
        }
    }
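To keep each batch bounded (for example around 100000 records, as suggested above), the input can be split into fixed-size chunks before each call. The following is only a minimal sketch in plain Java: `BatchPartitioner` and `partition` are hypothetical names, not part of HBase, and integers stand in for Put objects so that the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a list into fixed-size batches.
final class BatchPartitioner {

    private BatchPartitioner() {
    }

    /** Returns consecutive batches holding at most batchSize elements each. */
    static <T> List<ArrayList<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<ArrayList<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice: clearing a batch later must not modify the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Integers stand in for Put objects in this sketch.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250_000; i++) {
            rows.add(i);
        }
        for (ArrayList<Integer> batch : partition(rows, 100_000)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Each returned batch is an independent ArrayList, so it could be handed to a method like addMultipleRecordsAtaShot, which clears its argument after the insert, without affecting the remaining batches.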

Along with that, you can also consider the following.

Set the write buffer to a larger value than the default:

1) table.setAutoFlush(false)

2) Set the buffer size, either in the client configuration:

    <property>
        <name>hbase.client.write.buffer</name>
        <!-- you can double this for better performance: 2 x 20971520 = 41943040 -->
        <value>20971520</value>
    </property>

or in code:

    void setWriteBufferSize(long writeBufferSize) throws IOException

The buffer is flushed only on two occasions:

Explicit flush
Call flushCommits() to send the buffered data to the servers for permanent storage.

Implicit flush
This is triggered when you call put() or setWriteBufferSize(). Both calls compare the current buffer usage with the configured limit and, if the limit is exceeded, invoke flushCommits().

If the write buffer is disabled altogether by calling setAutoFlush(true), the client flushes on every invocation of put().
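The flush rules above can be illustrated with a small toy model. This is not the HBase client code: `WriteBufferModel` and its counters are hypothetical, and a "flush" here only counts records instead of sending them to a server. It is a sketch of the described behaviour under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer (hypothetical, NOT the HBase implementation).
final class WriteBufferModel {
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long writeBufferSize;
    private boolean autoFlush = true; // buffering is off until auto-flush is disabled
    private int flushCount = 0;
    private int sentRecords = 0;

    WriteBufferModel(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    void setWriteBufferSize(long newSize) {
        this.writeBufferSize = newSize;
        // Implicit flush: the new limit may already be exceeded.
        if (bufferedBytes > writeBufferSize) {
            flushCommits();
        }
    }

    void put(byte[] record) {
        buffer.add(record);
        bufferedBytes += record.length;
        // Implicit flush: auto-flush is on, or the buffer grew past its limit.
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flushCommits();
        }
    }

    // Explicit flush: "sends" everything that is currently buffered.
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        sentRecords += buffer.size();
        flushCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    int flushCount() {
        return flushCount;
    }

    int sentRecords() {
        return sentRecords;
    }
}
```

In this model, with auto-flush disabled and a limit of 10 bytes, two 4-byte records stay buffered and the third one pushes usage past the limit, which triggers a single flush of all three records.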