HBase regions automatic splitting using hbase.hregion.max.filesize


@mpiffaretti, what you are seeing is perfectly valid. I also got a bit of a shock the first time I saw the region sizes after an automatic split.

In HBase 0.94+, the default split policy is IncreasingToUpperBoundRegionSplitPolicy. The size at which a region splits is decided by the algorithm described below.

Split size is the number of regions on this server that belong to the same table, cubed, times twice the region flush size, OR the maximum region split size, whichever is smaller. For example, if the flush size is 128 MB, the first region splits after two flushes (256 MB), producing two regions that will each split when their size reaches 2^3 * 128 MB * 2 = 2048 MB. If one of these regions splits, there are three regions and the split size becomes 3^3 * 128 MB * 2 = 6912 MB, and so on until the configured maximum file size is reached; from then on, that maximum is used.
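To make the arithmetic concrete, here is a minimal Python sketch of the rule as described above. It is not HBase's own code: the function name and defaults are made up for illustration, with a 128 MB flush size and a 10 GB maximum assumed, and it ignores any special cases the real policy may handle.

```python
MB = 1024 * 1024

def split_size_bytes(regions_on_server, flush_size=128 * MB, max_filesize=10 * 1024 * MB):
    """Size at which a region splits, following the rule described above.

    regions_on_server: regions of the same table hosted on this region server.
    flush_size and max_filesize are assumed values, not read from any cluster.
    """
    if regions_on_server < 1:
        raise ValueError("expected at least one region of the table on the server")
    # count cubed, times twice the flush size, capped at the configured maximum
    return min(max_filesize, regions_on_server ** 3 * 2 * flush_size)

for n in range(1, 5):
    print(n, "region(s):", split_size_bytes(n) // MB, "MB")
```

With these assumed values the threshold is 256 MB for one region, 2048 MB for two, 6912 MB for three, and from four regions onward it is capped at the 10 GB maximum.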

This is quite a nice strategy, since you get a good spread of regions over the region servers early on, without having to wait until each region reaches the 10GB limit (hbase.hregion.max.filesize).

Alternatively, you would be better off pre-splitting your tables, since you want to make sure you are getting the most out of the processing power of your cluster: if a table has a single region, all requests go to the one region server to which that region is assigned. Pre-splitting puts control of how the regions are split over the row-key space into your hands.
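As a rough illustration of choosing split points up front, the sketch below computes evenly spaced boundaries. It assumes row keys start with a fixed-width lowercase hex prefix, which may not match your key design; `hex_split_points` is a hypothetical helper, not part of HBase.

```python
def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced split keys as fixed-width hex strings.

    Assumes row keys begin with a lowercase hex prefix of `width` characters.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** width
    return [format(i * key_space // num_regions, f"0{width}x")
            for i in range(1, num_regions)]

splits = hex_split_points(4, width=2)
print(splits)  # ['40', '80', 'c0']
```

A list like this can then be supplied when the table is created, for example through the SPLITS option of the HBase shell's create command.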


Pre-splitting is the better option. Hopefully your data is not being inserted continuously into a single region, which then has to split or compact every time it reaches the region size limit.

In that situation writes are not uniformly distributed, and compaction of the table becomes a bottleneck for the writing modules.

The number of requests on the active region will be high.
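The effect can be seen with a toy model. The sketch below assumes four regions with made-up boundaries and two-character hex row keys; `region_index` is a hypothetical stand-in for the lookup a client does, not an HBase call.

```python
from bisect import bisect_right
from collections import Counter

def region_index(row_key, split_points):
    """Index of the region whose key range holds row_key (split_points must be sorted)."""
    return bisect_right(split_points, row_key)

split_points = ["40", "80", "c0"]  # hypothetical boundaries of four regions

# Monotonically increasing keys, as produced by a counter- or timestamp-like prefix.
sequential_keys = [format(0xC0 + i, "02x") for i in range(32)]
# Keys spread evenly over the whole key space.
spread_keys = [format(i * 8, "02x") for i in range(32)]

print(Counter(region_index(k, split_points) for k in sequential_keys))
print(Counter(region_index(k, split_points) for k in spread_keys))
```

In this model all 32 sequential keys land in the last region, while the spread keys are shared out at 8 per region.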