Skip to content
Paweł Dziepak edited this page Oct 19, 2016 · 3 revisions

Counters are special kinds of cells which value can only be incremented, decremented, read and (with some limitations) deleted. In particular, once deleted, that counter cannot be used again. For example:

> UPDATE cf SET my_counter = my_counter + 6 WHERE pk = 0
> SELECT * FROM cf;
 pk | my_counter
----+------------
  0 |          6

(1 rows)
> UPDATE cf SET my_counter = my_counter - 1 WHERE pk = 0
> SELECT * FROM cf;
 pk | my_counter
----+------------
  0 |          5

(1 rows)
> DELETE my_counter FROM cf WHERE pk = 0;
> SELECT * FROM cf;
 pk | my_counter
----+------------

(0 rows)
> UPDATE cf SET my_counter = my_counter + 3 WHERE pk = 0
> SELECT * FROM cf;
 pk | my_counter
----+------------

(0 rows)

Counters representation

Counters are represented as sets of, so called, shards which are triples containing:

  • node id – uuid identifying the replica owning that shard
  • logical clock – incremented each time owning node modifies the shard value
  • current value – sum of increments and decrements done by the owning node

During each write operation one of the replicas is chosen as a leader. The leader reads its shard, increments logical clock, updates current value and then sends the new version of its shard to the other replicas.

Shards owned by the same node are merged (see below) so that each counter cell contains only one shard for each replica. Reading the actual counter value requires summing values of each node shards.

The number of shards in a counter cell is at most the number of nodes that at any point were responsible for the partition range to which that counter belong.

Legacy shards

Shards as described above are called global shards and are used by Cassandra 2.1 and newer. There are two older types of shards that were used previously and are now supported only for compatibility reasons.

  • remote shard – a shard that represents the value of a counter on another node, meaning of fields is the same as for global shards
  • local shard – a shard that represents a delta of counter value, the last field instead of being a counter value is a counter delta

Merging and reconciliation

Reconciliation of two counters requires merging all shards belonging to the same node. The rules of doing that are as follows:

  • global + global – shard with highest logical clock wins
  • global + local or remote – global shard wins
  • local + local – sum logical clock and shard values
  • local + remote – local shard wins
  • remote + remote – shard with highest logical clock wins

Since, support of deleting counters is limited so that once deleted they cannot be used again, during reconciliation tombstones win with live counter cell regardless of their timestamps.

Digest

Computing a digest of counter cells needs to be done so that types of the shards are not taken into account since they may vary on each replica if there are still local or remote shards in the cluster.

Writes

  1. Counter update starts with a client sending counter delta as a long (CQL3 bigint) to the coordinator.
  2. CQL3 or Thrift create a CounterMutation containing CounterUpdateCell which is just a delta.
  3. Coordinator chooses the leader of the counter update and sends it the mutation. The leader is always one of the replicas owning the partition modified counter belongs to.
  4. Now, the leader needs to transform counter deltas (CounterUpdateCell) into shards (stored in CounterCell). To do that it, for each counter the current value of the shards it owns is read and a global shard with that value modified by the delta is created.
  5. The mutation with newly created global shards in CounterCell is both used to update memtable on the leader as well as sent to the other nodes for replication.

Choosing leader

Choosing a replica which becomes a leader for a counter update is completely at the coordinator discretion. It is not a static role in any way and any concurrent update could be forwarded to a different leader. This means that all problems related to leader election are avoided.

The coordinator chooses the leader using the following algorithm:

  1. If the coordinator can be a leader it chooses itself.
  2. Otherwise, a random replica from the local DC is chosen.
  3. If there is no egligible node available in the local DC the replica closest to the coordinator (according to the snitch) is chosen.

Counter cache

Since updating a counter requires reading its local value beforehand Cassandra uses a counter cache which stores just local (i.e. owned by this node) logical clocks and counter values.

Reads

Querying counter values is much simpler than updating it. First part of the read operation is performed as for all other cell types. When counter cells from different sources are being reconciled their shards are merged. Once the final counter cell value is know and CounterCell is serialised, current values of all shards are summed up and the output of serialisation is a long integer.

Links

Clone this wiki locally