HBase checkAndPut atomicity clarification
Before trying to understand how checkAndPut
behaves in case of a non-existing row, you should first understand how mutations
work in HBase.
Mutations in HBase
A mutation in HBase is any write operation, e.g. Put, Delete, etc. Since HBase is a strongly consistent system that provides atomicity guarantees for a single row (across column families), all mutations for a particular row have to go through the same server. You should read more about regions and regionservers in the HBase documentation to understand how HBase divides the responsibility of serving non-overlapping partitions of the row key space across a set of servers.
Whenever a regionserver gets a mutation for a particular row, it acquires an in-memory write lock on the value of that row key. This essentially means four things:
- Since a row can be written by only one regionserver, there can never be more than one server trying to acquire the lock for, and write to, the same row.
- Since the lock is in memory, if the server crashes immediately after the lock acquisition, the lock is automatically released. The region's responsibility will then gracefully move to a new server, but your operation would have failed (not accounting for automatic retries on the client).
- Since the write lock is for the whole row, a mutation to column x will block concurrent mutations to column y of the same row.
- Since the lock is on the value of the row key (the regionserver maintains a list of currently locked rows in memory), the row does not necessarily have to exist beforehand.
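
The last two points can be illustrated with a toy model. The sketch below is not HBase code: the class RowLockSketch and its methods are hypothetical names, and it only assumes that a server keeps an in-memory map from row key to lock. It shows that a lock can be taken on a row key that has never been written, and that a second writer is blocked on that row but not on a different one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of per-row write locks keyed on the row key value. Not HBase code. */
class RowLockSketch {
    // Row key -> lock. Entries are created on demand, so a row key can be
    // locked even if no data has ever been stored under it.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Blocks until the calling thread holds the lock for this row key. */
    void lockRow(String rowKey) {
        locks.computeIfAbsent(rowKey, k -> new ReentrantLock()).lock();
    }

    /** Tries to take the row lock without waiting; returns false if another thread holds it. */
    boolean tryLockRow(String rowKey) {
        return locks.computeIfAbsent(rowKey, k -> new ReentrantLock()).tryLock();
    }

    /** Releases a lock previously taken by the calling thread. */
    void unlockRow(String rowKey) {
        locks.get(rowKey).unlock();
    }

    /** Returns {second writer got the same row, second writer got a different row}. */
    static boolean[] demo() {
        RowLockSketch table = new RowLockSketch();
        table.lockRow("new-row"); // the row does not exist anywhere yet

        boolean[] got = new boolean[2];
        Thread other = new Thread(() -> {
            got[0] = table.tryLockRow("new-row");   // same row: held by the first writer
            got[1] = table.tryLockRow("other-row"); // different row: free
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        table.unlockRow("new-row");
        return got;
    }

    public static void main(String[] args) {
        boolean[] got = demo();
        System.out.println("second writer locked the same row: " + got[0]);  // false
        System.out.println("second writer locked another row: " + got[1]);   // true
    }
}
```

Because the map lives only in the process's memory, losing the process loses every lock with it, which is the behaviour described in the second bullet.
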
checkAndPut is no different from a regular Put in terms of locking semantics. The only difference is that it performs an extra Get operation after locking the row key, to verify the existing value of a column for that row key (the value can be null, because the row might not exist at all yet). This is also why the row key of the Put has to be the same as the row key used for the Get. Otherwise, the in-memory locking semantics would not be able to provide consistency guarantees. This works well with HBase's other ACID guarantees, which are also provided only at the level of a single row.
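
As a rough illustration of the lock, Get, compare, Put sequence described above, here is a minimal sketch in plain Java. It is a toy single-column table, not the HBase implementation; the class CheckAndPutSketch and its methods are hypothetical, and the only assumption is that the check and the write happen while the same row lock is held. A null expected value stands for "the cell does not exist yet".

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Toy single-column table showing check-then-put under one row lock. Not HBase code. */
class CheckAndPutSketch {
    private final Map<String, String> cells = new ConcurrentHashMap<>();    // row key -> column value
    private final Map<String, Object> rowLocks = new ConcurrentHashMap<>(); // row key -> lock object

    /**
     * Writes newValue (must be non-null) only if the current value equals expected.
     * Passing expected == null means "only write if the row has no value yet".
     */
    boolean checkAndPut(String rowKey, String expected, String newValue) {
        // 1. Lock on the row key value; the row itself may not exist.
        Object lock = rowLocks.computeIfAbsent(rowKey, k -> new Object());
        synchronized (lock) {
            // 2. Get the current value while holding the lock (null if absent).
            String current = cells.get(rowKey);
            // 3. Compare with the expected value.
            if (!Objects.equals(current, expected)) {
                return false;
            }
            // 4. Put under the same lock, so no other writer can slip in between.
            cells.put(rowKey, newValue);
            return true;
        }
    }

    String get(String rowKey) {
        return cells.get(rowKey);
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        System.out.println(table.checkAndPut("row1", null, "a")); // true: row did not exist
        System.out.println(table.checkAndPut("row1", null, "b")); // false: row now has a value
        System.out.println(table.checkAndPut("row1", "a", "b"));  // true: expected value matches
        System.out.println(table.get("row1"));                    // b
    }
}
```

If the check used one row key and the write used another, steps 2 and 4 would be guarded by different locks, and a concurrent writer could change the checked row between them. That is the reason for the same-row restriction.
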