
HBase checkAndPut atomicity clarification


Before looking at how checkAndPut behaves for a non-existent row, you should first understand how mutations work in HBase.

Mutations in HBase

A mutation in HBase is any write operation, such as a Put or a Delete. Since HBase is a strongly consistent system and provides atomicity guarantees for a single row (across column families), all mutations for a particular row have to go through the same server. You should read more about regions and regionservers in the HBase documentation to understand how HBase divides the responsibility for serving non-overlapping partitions of the row key space across a bunch of servers.

Whenever a regionserver gets a mutation for a particular row, it acquires an in-memory write lock on the value of that row key. This essentially means four things:

  1. Since a row can be written by only one regionserver, there can never be more than one server trying to write to, and acquire a lock for, the same row.
  2. Since the lock is in memory, if the server crashes immediately after the lock acquisition, the lock is automatically released. Responsibility for the region will then gracefully move to a new server, but your operation will have failed (not accounting for automatic retries on the client).
  3. Since the write lock is for the whole row, a mutation to column x will cause operations to column y of the same row to get blocked.
  4. Since the lock is on the value of the row key (the regionserver maintains a list of currently locked rows in memory), the row does not necessarily have to exist beforehand.
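
Point 4 can be illustrated with a minimal sketch of an in-memory lock table keyed by the row key's value. This is not HBase's actual implementation: `RowLockTable`, `lockRow` and `isLocked` are hypothetical names, row keys are simplified to `String`, and lock entries are never cleaned up. It only shows why a lock can be taken for a row that has never been written.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; these are not HBase classes.
class RowLockTable {
    // Locks are keyed by the row key's value, not by any stored row,
    // so an entry can be created for a row that does not exist yet.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Blocks until the calling thread holds the lock for this row key.
    ReentrantLock lockRow(String rowKey) {
        ReentrantLock lock = locks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        return lock;
    }

    // True if some thread currently holds the lock for this row key.
    boolean isLocked(String rowKey) {
        ReentrantLock lock = locks.get(rowKey);
        return lock != null && lock.isLocked();
    }
}

class RowLockDemo {
    public static void main(String[] args) {
        RowLockTable table = new RowLockTable();
        // No data has ever been written for this row key.
        ReentrantLock lock = table.lockRow("brand-new-row");
        try {
            System.out.println(table.isLocked("brand-new-row")); // true
            System.out.println(table.isLocked("other-row"));     // false
        } finally {
            lock.unlock();
        }
        System.out.println(table.isLocked("brand-new-row"));     // false
    }
}
```

Because the lock lives only in the server's memory, nothing has to be persisted to take it, which is also why a crash releases it implicitly.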

checkAndPut is no different from a regular Put in terms of locking semantics. The only difference is that it does an extra Get operation after locking the row key to verify the existing value of a column for that row key (the value can be null, since the row might not exist at all yet). This is also the reason the row key for which the Put is generated has to be the same as the row key for which the Get operation is generated. Otherwise, the in-memory locking semantics would not be able to provide consistency guarantees. This works well with HBase's other ACID guarantees, which are also provided only at the level of a single row.
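
The lock, Get, compare, Put sequence described above can be sketched as a toy single-server store. This is an assumption-laden illustration, not HBase code: `TinyRowStore` and its methods are hypothetical, keys and values are plain `String`s, and there is no write-ahead log or region handling. Passing `null` as the expected value plays the role of "the column must not exist yet".

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-server store; not HBase's implementation.
class TinyRowStore {
    private final ConcurrentHashMap<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    private ReentrantLock lockRow(String rowKey) {
        ReentrantLock lock = rowLocks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        return lock;
    }

    // Returns null when the row or the column does not exist.
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // A plain Put: lock the row key, write, unlock.
    void put(String rowKey, String column, String value) {
        ReentrantLock lock = lockRow(rowKey);
        try {
            rows.computeIfAbsent(rowKey, k -> new ConcurrentHashMap<>()).put(column, value);
        } finally {
            lock.unlock();
        }
    }

    // Same locking as put, plus a Get and a comparison while the lock is held.
    // The check and the write share one row key, so one lock covers both.
    boolean checkAndPut(String rowKey, String checkColumn, String expected,
                        String putColumn, String newValue) {
        ReentrantLock lock = lockRow(rowKey);
        try {
            String current = get(rowKey, checkColumn);
            if (!Objects.equals(current, expected)) {
                return false; // condition failed, nothing is written
            }
            rows.computeIfAbsent(rowKey, k -> new ConcurrentHashMap<>()).put(putColumn, newValue);
            return true;
        } finally {
            lock.unlock();
        }
    }
}

class CheckAndPutDemo {
    public static void main(String[] args) {
        TinyRowStore store = new TinyRowStore();
        // Row "user1" does not exist, so the current value is null.
        System.out.println(store.checkAndPut("user1", "status", null, "status", "created"));      // true
        // The same call again fails: the column now holds "created".
        System.out.println(store.checkAndPut("user1", "status", null, "status", "created"));      // false
        System.out.println(store.checkAndPut("user1", "status", "created", "status", "active")); // true
        System.out.println(store.get("user1", "status"));                                         // active
    }
}
```

If the check and the write targeted different row keys, the single lock taken here would not cover both, which is the reason for the same-row restriction.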