Hinted Handoff
vladzcloudius edited this page Apr 20, 2017
·
16 revisions
Hinted Handoff is a feature that allows replaying failed writes. The mutation and the destination replica are saved in a log and replayed later according to the feature configuration.
- hinted_handoff_enabled: Enabled or disables the Hinted Handoff feature completely.
- hinted_handoff_disabled_datacenters: Disable the feature for the given DCs.
- max_hint_window_in_ms: Don't generate hints if the destination Node has been down for more than this value. The hints should resume once the Node is seen up (and went down).
- hinted_handoff_throttle_in_kb: Maximum throttle in KB per shard.
-
hints_directory: Directory where scylla will store hints. By default
$SCYLLA_HOME/data/hints
- hints_flush_period_in_ms: How often hints are flushed from internal buffers to disk.
- max_hints_file_size_in_mb: Maximum hints file size.
- hints_compression: Compression to apply to hints files. By default, hints files are stored uncompressed.
- max_hints_in_progress: maximum amount of pending hint writes per shard. By default 128 (as in Origin). Origin also limits the maximum amount of pending hint writes to a specific destination by a single request.
Once the unreachable destination Node is detected we create a hints_queue for this Node. The queue is specified by a hints_descriptor:
- Destination Node UUID
- Source shard ID
- Timestamp
- Messaging service version
Hints are appended to the hints_queue until:
- The destination Node goes UP:
- The hints_queue is flushed to the disk and destroyed.
- The hints dispatcher context starts sending hints collected so far to the destination Node.
- The current Node goes DOWN for some reason (shutdown, decommission, etc.).
-
hints_flush_period_in_ms has passed:
- The current content of the hints_queue is stored to the disk:
- If there isn't any hints file for the current destination Node - create one.
- If there is a hints file already and it hasn't reached the max_hints_file_size_in_mb - append the new data to it.
- If hints file reaches the max_hints_file_size_in_mb size - close it and create a new hint file for this destination.
- The current content of the hints_queue is stored to the disk:
The info from the hints_descriptor is going to be encoded into the hint file's name:
- The destination Node UUID
- Source shard ID
- Timestamp
- Messaging service version
- Hints are sent using a new RPC verb:
- Each mutation is sent in a separate message.
- Each message has ID which is returned back to the sender in a reply (see below).
- Message also contains the mutation and the destination Node UUID.
- In case of success the reply is sent back to the sender (similarly to the regular WRITE_WITH_TIMEOUT verb).
- In case of a failure nothing is sent back letting the sender to timeout.
- Hints sending is triggered by the following events:
- Timer: every X seconds (Origin does it every 10s).
- Local Node is decommissioned - in this case we forward hints to some other Node so that it would send them to the destination later on.
- If message destination Node encoded in the message is the current Node:
- If the mutation may be applied locally - apply the mutation.
- Otherwise (the topology must have changed and the current Node isn't a valid replica for the mutation any more) redirect this hint to all valid replicas for the mutation.
- If the destination Node is not the local Node (the message must have been sent by the decommissioning Node) store the hint locally for later delivery.