Skip to content
vladzcloudius edited this page Apr 20, 2017 · 16 revisions

Abstract

Hinted Handoff is a feature that allows replaying failed writes. The mutation and the destination replica are saved in a log and replayed later according to the feature configuration.

Hinted Handoff Configuration Parameters

  • hinted_handoff_enabled: Enabled or disables the Hinted Handoff feature completely.
  • hinted_handoff_disabled_datacenters: Disable the feature for the given DCs.
  • max_hint_window_in_ms: Don't generate hints if the destination Node has been down for more than this value. The hints should resume once the Node is seen up (and went down).
  • hinted_handoff_throttle_in_kb: Maximum throttle in KB per shard.
  • hints_directory: Directory where scylla will store hints. By default $SCYLLA_HOME/data/hints
  • hints_flush_period_in_ms: How often hints are flushed from internal buffers to disk.
  • max_hints_file_size_in_mb: Maximum hints file size.
  • hints_compression: Compression to apply to hints files. By default, hints files are stored uncompressed.
  • max_hints_in_progress: maximum amount of pending hint writes per shard. By default 128 (as in Origin). Origin also limits the maximum amount of pending hint writes to a specific destination by a single request.

Hints generation

Once the unreachable destination Node is detected we create a hints_queue for this Node. The queue is specified by a hints_descriptor:

  • Destination Node UUID
  • Source shard ID
  • Timestamp
  • Messaging service version

Hints are appended to the hints_queue until:

  • The destination Node goes UP:
    • The hints_queue is flushed to the disk and destroyed.
    • The hints dispatcher context starts sending hints collected so far to the destination Node.
  • The current Node goes DOWN for some reason (shutdown, decommission, etc.).
  • hints_flush_period_in_ms has passed:
    • The current content of the hints_queue is stored to the disk:
      • If there isn't any hints file for the current destination Node - create one.
      • If there is a hints file already and it hasn't reached the max_hints_file_size_in_mb - append the new data to it.
        • If hints file reaches the max_hints_file_size_in_mb size - close it and create a new hint file for this destination.

The info from the hints_descriptor is going to be encoded into the hint file's name:

  • The destination Node UUID
  • Source shard ID
  • Timestamp
  • Messaging service version

Hints sending

  • Hints are sent using a new RPC verb:
    • Each mutation is sent in a separate message.
    • Each message has ID which is returned back to the sender in a reply (see below).
    • Message also contains the mutation and the destination Node UUID.
    • In case of success the reply is sent back to the sender (similarly to the regular WRITE_WITH_TIMEOUT verb).
    • In case of a failure nothing is sent back letting the sender to timeout.
  • Hints sending is triggered by the following events:
    • Timer: every X seconds (Origin does it every 10s).
    • Local Node is decommissioned - in this case we forward hints to some other Node so that it would send them to the destination later on.

Hint message handling (on a destination Node)

  • If message destination Node encoded in the message is the current Node:
    • If the mutation may be applied locally - apply the mutation.
    • Otherwise (the topology must have changed and the current Node isn't a valid replica for the mutation any more) redirect this hint to all valid replicas for the mutation.
  • If the destination Node is not the local Node (the message must have been sent by the decommissioning Node) store the hint locally for later delivery.
Clone this wiki locally