Multiple rows insertion in HBase using MapReduce


I prefer the second option, where batching comes naturally with MapReduce (no need to build a List<Put> yourself). For a deeper look, see my second point below.

1) Your first option, List<Put>, is generally used with a standalone HBase Java client. Internally the buffering is controlled by hbase.client.write.buffer, set like below in one of your config XMLs:

<property>
  <name>hbase.client.write.buffer</name>
  <value>20971520</value> <!-- 20 MB in this example; the default is 2097152 (2 MB) -->
</property>

Once the buffer is filled, all buffered Puts are flushed and actually inserted into your table. This is the same mechanism BufferedMutator uses, as explained in #2.
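For illustration, here is a minimal sketch of that standalone-client pattern; the table name, column family, and values are placeholders, not taken from the question:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StandalonePutClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"))) {
      List<Put> puts = new ArrayList<>();
      for (int i = 0; i < 1000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
        puts.add(put);
      }
      // The whole list is handed over in one call; the client groups the
      // Puts by region server and sends them in batches.
      table.put(puts);
    }
  }
}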

2) Regarding the second option, look at the TableOutputFormat documentation:

org.apache.hadoop.hbase.mapreduce
Class TableOutputFormat<KEY>

java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation>
    org.apache.hadoop.hbase.mapreduce.TableOutputFormat<KEY>

All Implemented Interfaces: org.apache.hadoop.conf.Configurable

@InterfaceAudience.Public
@InterfaceStability.Stable
public class TableOutputFormat<KEY>
extends org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation>
implements org.apache.hadoop.conf.Configurable

Convert Map/Reduce output and write it to an HBase table. The KEY is ignored while the output value must be either a Put or a Delete instance.

Another way of seeing this is through the code itself:

/**
 * Writes a key/value pair into the table.
 *
 * @param key  The key.
 * @param value  The value.
 * @throws IOException When writing fails.
 * @see RecordWriter#write(Object, Object)
 */
@Override
public void write(KEY key, Mutation value) throws IOException {
  if (!(value instanceof Put) && !(value instanceof Delete)) {
    throw new IOException("Pass a Delete or a Put");
  }
  mutator.mutate(value);
}

Conclusion: context.write(rowkey, putList) is not possible with this API; each write call takes a single Put or Delete.
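The usual pattern is therefore to call context.write(...) once per Put and let the output format do the batching. A minimal mapper sketch, assuming text input lines of the form "rowkey,value" and placeholder column names:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HBaseInsertMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Placeholder input format: each line is "rowkey,value".
    String[] parts = value.toString().split(",", 2);
    byte[] rowKey = Bytes.toBytes(parts[0]);

    Put put = new Put(rowKey);
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(parts[1]));

    // One Put per write; TableOutputFormat's RecordWriter feeds each one
    // into a BufferedMutator, which batches the actual RPCs.
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}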

However, the BufferedMutator behind mutator.mutate(...) in the write() method shown earlier says:

Map/reduce jobs benefit from batching, but have no natural flush point. {@code BufferedMutator} receives the puts from the M/R job and will batch puts based on some heuristic, such as the accumulated size of the puts, and submit batches of puts asynchronously so that the M/R logic can continue without interruption.

So, with BufferedMutator, your batching is natural, as mentioned at the beginning.
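For completeness, here is a sketch of a map-only driver wiring the mapper above to TableOutputFormat (the table name and input path are placeholders); the framework's RecordWriter then hands every Put to the BufferedMutator described above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class HBaseInsertJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Placeholder table name; TableOutputFormat reads it from this property.
    conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table");

    Job job = Job.getInstance(conf, "hbase-multi-row-insert");
    job.setJarByClass(HBaseInsertJob.class);
    job.setMapperClass(HBaseInsertMapper.class);
    job.setNumReduceTasks(0); // map-only: Puts go straight to the output format

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0])); // placeholder input path

    job.setOutputFormatClass(TableOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}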