HBase regions automatic splitting using hbase.hregion.max.filesize

hadoop split hbase region

@mpiffaretti, what you are seeing is expected behaviour. I also got a bit of a shock the first time I saw the region sizes after an automatic split.

In HBase 0.94+, the default split policy is IncreasingToUpperBoundRegionSplitPolicy. The size at which a region splits is determined by the algorithm described below.

Split size is the number of regions that are on this server that all are of the same table, cubed, times 2x the region flush size OR the maximum region split size, whichever is smaller. For example, if the flush size is 128M, then after two flushes (256MB) we will split which will make two regions that will split when their size is 2^3 * 128M*2 = 2048M. If one of these regions splits, then there are three regions and now the split size is 3^3 * 128M*2 = 6912M, and so on until we reach the configured maximum filesize and then from there on out, we'll use that.
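To make the arithmetic above concrete, here is a minimal sketch of the rule in Python. It is not the HBase implementation: the function name `split_size` is made up for illustration, the 128M flush size comes from the example above, and the 10GB maximum is assumed to be the value of hbase.hregion.max.filesize.

```python
MB = 1024 * 1024

def split_size(regions_on_server: int,
               flush_size: int = 128 * MB,          # flush size from the example above
               max_filesize: int = 10 * 1024 * MB   # assumed hbase.hregion.max.filesize (10GB)
               ) -> int:
    """Return the size in bytes at which a region would split.

    regions_on_server is the number of regions of the same table
    hosted on this region server.
    """
    if regions_on_server < 1:
        raise ValueError("regions_on_server must be at least 1")
    # (number of regions)^3 * 2 * flush size, capped at the maximum file size
    return min(max_filesize, regions_on_server ** 3 * 2 * flush_size)

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "region(s):", split_size(n) // MB, "MB")
```

With these assumed values the thresholds are 256MB, 2048MB and 6912MB for one, two and three regions, and from four regions onward the 10GB cap applies.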

This is a sensible strategy, since regions start to spread over the region servers early on, without having to wait until each one reaches the 10GB limit set by hbase.hregion.max.filesize.

That said, you would be better off pre-splitting your tables, since you want to make sure you are getting the most out of the processing power of your cluster. If a table has a single region, all requests go to the one region server to which that region is assigned. Pre-splitting puts control in your hands over how the regions are split across the row-key space.
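As a rough illustration of choosing split points up front, the sketch below computes evenly spaced boundaries over a hexadecimal row-key space. It assumes row keys are fixed-width, lowercase hex strings that are roughly uniformly distributed (for example, hashed keys); the function name `hex_split_points` and the width of 8 are hypothetical choices, not anything HBase prescribes.

```python
def hex_split_points(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 evenly spaced split keys.

    Keys are zero-padded lowercase hex strings of the given width,
    dividing the key space [0, 16**width) into num_regions equal parts.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** width
    return [
        format((i * key_space) // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]

if __name__ == "__main__":
    # Four regions need three boundaries.
    print(hex_split_points(4))
```

The resulting keys could then be supplied when the table is created, for instance through the SPLITS option of the HBase shell's create command. If your row keys are not uniformly distributed, pick boundaries from the actual key distribution instead.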

Pre-splitting is the better option. Hopefully your data is not being inserted continuously into a single region, which then splits or compacts each time it reaches the region size limit.

In that situation, writes are not uniformly distributed, and compaction of the table becomes a bottleneck for the modules doing the writing.

The number of requests on the active region will be high.
