HBase checkAndPut atomicity clarification

hadoop concurrency hbase

Before trying to understand how checkAndPut behaves in the case of a non-existent row, you should first understand how mutations work in HBase.

Mutations in HBase

A mutation in HBase is any write operation, e.g. a Put or a Delete. Since HBase is a strongly consistent system that provides atomicity guarantees for a single row (across column families), all mutations for a particular row have to go through the same server: the regionserver currently hosting the region that contains that row. You should read more about regions and regionservers in the HBase documentation to understand how HBase divides responsibility for serving non-overlapping partitions of the row key space across a set of servers.
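To make the "same row, same server" point concrete, here is a minimal Python sketch of how a row key maps to exactly one region when regions partition the sorted key space into non-overlapping [start key, end key) ranges. The region boundaries, server names and helper functions are made up for illustration; real HBase clients perform this lookup against the hbase:meta table and cache the results.

```python
import bisect

# Hypothetical layout: region i owns row keys in
# [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]); the last region is open-ended.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]  # must stay sorted
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # hypothetical host per region


def region_for(row_key: bytes) -> int:
    """Index of the single region whose key range contains row_key."""
    # bisect_right returns the position of the first start key > row_key,
    # so the owning region is the one just before it.
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


def server_for(row_key: bytes) -> str:
    """Every mutation for row_key is routed to this one server."""
    return REGION_SERVERS[region_for(row_key)]
```

Because the ranges do not overlap, any row key, whether or not it has been written yet, resolves to exactly one region and therefore to one regionserver. A single server can host several regions, as rs1 does here.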

Whenever a regionserver receives a mutation for a particular row, it acquires an in-memory write lock on the value of that row key. This has four consequences:

Since a row is served by only one regionserver at a time, there can never be more than one server trying to write to, and acquire a lock for, the same row.
Since the lock is in memory, if the server crashes immediately after acquiring the lock, the lock is automatically released. The region will then be reassigned to another server, but your operation will have failed (not accounting for automatic retries on the client).
Since the write lock covers the whole row, a mutation to column x will block concurrent operations on column y of the same row until the lock is released.
Since the lock is on the value of the row key (the regionserver keeps an in-memory list of currently locked rows), the row does not have to exist beforehand.
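The four points above can be modeled with a small, hypothetical lock table: a set of locked row keys guarded by a condition variable. This is a Python sketch of the idea under that assumption, not HBase's actual row-lock implementation, and the RowLockTable class is invented for illustration. Because the key is just a value, locking a row that has never been written works, and two writers touching different columns of the same row still serialize on the same key.

```python
import threading
from contextlib import contextmanager


class RowLockTable:
    """Hypothetical per-regionserver table of currently locked row keys.

    Locks are keyed on the row key value alone, so a key can be locked
    whether or not any data exists for it. The table lives only in memory.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._locked = set()  # row keys currently held by some writer

    @contextmanager
    def lock_row(self, row_key: bytes):
        """Hold an exclusive lock on row_key for the duration of the with-block."""
        with self._cond:
            # Wait while another writer holds this key, whatever column it touches.
            while row_key in self._locked:
                self._cond.wait()
            self._locked.add(row_key)
        try:
            yield
        finally:
            with self._cond:
                self._locked.discard(row_key)
                self._cond.notify_all()  # wake writers waiting on any key

    def is_locked(self, row_key: bytes) -> bool:
        with self._cond:
            return row_key in self._locked
```

Everything lives in process memory, so if the process dies the lock table disappears with it, which matches the crash behavior described in the second point.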

CheckAndPut is no different from a regular Put in terms of locking semantics. The only difference is that, after locking the row key, it performs an extra Get to verify the existing value of a column for that row key (that value can be null, since the row might not exist yet). This is also why the Put must target the same row key as the Get: otherwise, the in-memory row lock could not provide any consistency guarantee. This fits well with HBase's other ACID guarantees, which are also provided only at the level of a single row.
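Putting the pieces together, checkAndPut can be sketched as "lock the row key, read the current cell, compare, then write". The ToyRegion class and its methods below are hypothetical and exist only for illustration. Passing expected=None means "the cell must not exist yet", mirroring the HBase client convention where a null expected value checks for the absence of the column.

```python
import threading
from collections import defaultdict


class ToyRegion:
    """Hypothetical in-memory model of one region: row -> {column: value}.

    Illustrates the locking described above; it is not HBase's code.
    """

    def __init__(self):
        self._rows = {}  # a row that was never written has no entry at all
        # One lock per row key *value*, created on first use, so rows that
        # do not exist yet can still be locked. (This sketch never evicts
        # lock entries.)
        self._row_locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects _row_locks itself

    def _lock_for(self, row):
        with self._locks_guard:
            return self._row_locks[row]

    def get(self, row, column):
        """Return the cell value, or None if the row or column is absent."""
        return self._rows.get(row, {}).get(column)

    def put(self, row, column, value):
        """A plain Put: take the row lock, write, release."""
        with self._lock_for(row):
            self._rows.setdefault(row, {})[column] = value

    def check_and_put(self, row, column, expected, put_row, put_column, put_value):
        """Write put_column=put_value only if row/column currently equals expected.

        expected=None means the checked cell must not exist yet.
        Returns True if the put was applied, False otherwise.
        """
        if put_row != row:
            # The lock below covers only `row`; a put to another row could
            # not be made atomic with the check.
            raise ValueError("the put must target the checked row")
        with self._lock_for(row):
            current = self.get(row, column)  # the extra Get, done under the lock
            if current != expected:
                return False
            # Write directly: calling self.put here would try to re-acquire
            # the same non-reentrant lock and deadlock.
            self._rows.setdefault(row, {})[put_column] = put_value
            return True
```

For example, if twenty threads call check_and_put(b"new", "c", None, b"new", "c", i) at the same time on an empty ToyRegion, exactly one returns True: whichever thread acquires the row lock first creates the cell, and every later thread sees a non-null value during its check.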
