HBase MemStore and Garbage Collection

memory-management hadoop hbase

You are right about the HBase memstore. In general, when something is written to HBase, it first goes to an in-memory store (the memstore). Once this memstore reaches a certain size*, it is flushed to disk into a store file. Every write is also appended immediately to a log file for durability.

*From a global perspective, HBase by default uses 40% of the heap (see the property hbase.regionserver.global.memstore.upperLimit) for all memstores of all regions of all column families of all tables. If this limit is reached, it starts flushing some memstores until the memory used by memstores drops below 35% of the heap (the lowerLimit property). Both limits are adjustable, but work out the sizing carefully before changing them.
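To make the two thresholds concrete, here is a rough sketch of the arithmetic in Java. It is only an illustration: the class and method names are hypothetical and not part of HBase, the 40% and 35% values are the defaults quoted above, and the 10 GiB heap is an assumed example.

```java
// Illustrative arithmetic only; this class is hypothetical and not part of HBase.
final class MemstoreLimits {
    // Defaults quoted above: 40% upper limit, 35% lower limit (as whole percents).
    static final long UPPER_LIMIT_PERCENT = 40;
    static final long LOWER_LIMIT_PERCENT = 35;

    /** Total memstore size at which forced flushing starts. */
    static long upperBytes(long heapBytes) {
        return heapBytes * UPPER_LIMIT_PERCENT / 100;
    }

    /** Total memstore size that flushing tries to get back under. */
    static long lowerBytes(long heapBytes) {
        return heapBytes * LOWER_LIMIT_PERCENT / 100;
    }

    /** Roughly how many bytes must be flushed once the upper limit is reached. */
    static long bytesToFlush(long heapBytes, long memstoreBytes) {
        if (memstoreBytes < upperBytes(heapBytes)) {
            return 0L; // upper limit not reached, no forced flush
        }
        return memstoreBytes - lowerBytes(heapBytes);
    }

    public static void main(String[] args) {
        long heap = 10L * 1024 * 1024 * 1024; // assumed 10 GiB region server heap
        System.out.println("upper limit bytes: " + upperBytes(heap));
        System.out.println("lower limit bytes: " + lowerBytes(heap));
        System.out.println("to flush at upper limit: " + bytesToFlush(heap, upperBytes(heap)));
    }
}
```

With a 10 GiB heap this puts the upper limit at 4 GiB and the lower limit at 3.5 GiB, so hitting the upper limit means flushing about 0.5 GiB of memstore data.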

Yes, GC does have an impact on the memstore, and you can change this behavior by using the MemStore-Local Allocation Buffer (MSLAB). I suggest reading the three-part article "Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers", starting with part 1: http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
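The general idea behind a local allocation buffer can be sketched in a few lines of Java. This is a simplified illustration under my own assumptions, not HBase's actual implementation: the class name, the Slice type and the chunk size passed in are all hypothetical. Incoming values are copied into large fixed-size chunks, so the heap holds a few big arrays per memstore instead of many small scattered objects, and a flush can drop whole chunks at once.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the allocation-buffer idea; NOT HBase's real code.
final class SlabAllocatorSketch {
    /** Points at a value that was copied into one of the chunks. */
    static final class Slice {
        final byte[] chunk;
        final int offset;
        final int length;

        Slice(byte[] chunk, int offset, int length) {
            this.chunk = chunk;
            this.offset = offset;
            this.length = length;
        }
    }

    private final int chunkSize;                 // hypothetical, chosen by the caller
    private final List<byte[]> chunks = new ArrayList<>();
    private byte[] current;                      // chunk currently being filled
    private int nextFree;                        // next free offset in 'current'

    SlabAllocatorSketch(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Copies data into the current chunk, starting a new chunk when it is full. */
    Slice copyIn(byte[] data) {
        if (data.length > chunkSize) {
            // This sketch simply rejects values that cannot fit in one chunk.
            throw new IllegalArgumentException("value larger than chunk");
        }
        if (current == null || nextFree + data.length > chunkSize) {
            current = new byte[chunkSize];
            chunks.add(current);
            nextFree = 0;
        }
        System.arraycopy(data, 0, current, nextFree, data.length);
        Slice slice = new Slice(current, nextFree, data.length);
        nextFree += data.length;
        return slice;
    }

    int chunkCount() {
        return chunks.size();
    }

    /** After a flush, every chunk becomes garbage at once, as large contiguous blocks. */
    void releaseAll() {
        chunks.clear();
        current = null;
        nextFree = 0;
    }
}
```

The point of the design is that freed memory comes back in large, equally sized pieces, which leaves the old generation far less fragmented than freeing many small objects of different sizes.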

The problem is that Java, as a technology, has trouble handling a server that creates and deletes a lot of objects and, at the same time, must respond to all requests in a timely manner. The root cause is the garbage collector, which sometimes has to "stop the world" to clean up memory. With large heaps this can cause delays of several seconds.
Now let's see why this happens to HBase and why it has to respond in a timely fashion.
The MemStore is a cache of the region data. If the data is highly volatile, a lot of objects are created and deleted. As a result there is a lot of GC (garbage collector) pressure.
HBase, like any real-time system working with big data sets, tends to cache as much as possible, and therefore its MemStores are big.
HBase Region Servers have to communicate with ZooKeeper in a timely fashion to let it know they are alive and to avoid having their regions migrated. A long GC pause can prevent that.
What Cloudera did was implement its own memory management mechanism specifically for the MemStore to avoid GC pauses. Lars George, in his book, describes how to tune the GC to make it work better with the Region Server.
http://books.google.co.il/books?id=Ytbs4fLHDakC&pg=PA419&lpg=PA419&dq=MemStore+garbage+collector+HBASE&source=bl&ots=b-Sk-HV22E&sig=tFddqrJtlE_nIUI3VDMEyHdgx6o&hl=iw&sa=X&ei=79CyT82BIM_48QO_26ykCQ&ved=0CHUQ6AEwCQ#v=onepage&q=MemStore%20garbage%20collector%20HBASE&f=false
