Low level DocDB key encoding format

Mikhail Bautin edited this page Nov 5, 2019 · 5 revisions

This document describes the low-level format of key encoding in DocDB. By "key encoding" we mean the way logical keys are turned into the byte sequences used as keys in RocksDB. An encoded key can include primary key columns (hash-based and range-based), the computed hash value for the hash-based columns, and an internal column id.

Value types

The "value type" is an enum that defines the types of individual components of an encoded key. It is defined in value_type.h.

Primitive values

A primitive value (primitive_value.h) represents a single component of a primary key, or one of several other simple data types that can appear in an encoded key, e.g. an integer, a string, an internal 16-bit hash value, or a UUID. An encoded primitive value always starts with a value type byte, followed by a data-type-specific encoding that is binary-sortable: comparing the encoded byte sequences lexicographically gives the same order as comparing the logical values.
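To make "binary-sortable" concrete, here is a minimal sketch for a signed 32-bit integer. It illustrates the general technique (flip the sign bit, then store big-endian) and is not taken from primitive_value.h; the type byte `TYPE_INT32` and the function name are assumptions made up for this example, and the real value type bytes are the ones defined in value_type.h.

```python
import struct

# Hypothetical value type byte, chosen only for this sketch.
TYPE_INT32 = b"I"


def encode_int32(value: int) -> bytes:
    """Encode a signed 32-bit integer so that byte order matches numeric order."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a signed 32-bit integer")
    # Flipping the sign bit maps [-2^31, 2^31) onto [0, 2^32) monotonically.
    # Storing the result big-endian makes lexicographic byte comparison
    # agree with numeric comparison.
    biased = (value & 0xFFFFFFFF) ^ 0x80000000
    return TYPE_INT32 + struct.pack(">I", biased)


values = [7, -1, 0, 2**31 - 1, -5, -2**31, 1]
# Sorting by the encoded bytes gives the same order as sorting the integers.
sorted_by_bytes = sorted(values, key=encode_int32)
```

A plain two's-complement big-endian encoding would not work here, because negative numbers would then compare greater than positive ones as byte strings.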

DocKey vs SubDocKey

A DocKey corresponds to the primary key in a SQL table, including some internal information such as the computed hash value corresponding to the hash-based components of that primary key. Also, if the schema of a certain table has range components, e.g. (h1, h2, r1, r2), then prefixes obtained by trimming the list of range components could logically be considered valid DocKeys as well -- in this example (h1, h2) and (h1, h2, r1).

A SubDocKey consists of a DocKey followed by a sequence of "subkeys". These subkeys represent a path from the "root" of a particular row in a table to the individual piece of data being represented. The first subkey is typically a column id, and it could be followed by e.g. a key in a map (if the column's data type is a map).
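The relationship between the two kinds of key can be sketched as a pair of small data classes. This is only a logical model for illustration: the class layout, the `range_prefixes` helper, the column id `10`, and the map key `"k"` are assumptions made up for this example and do not mirror the actual C++ classes.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class DocKey:
    """Logical (unencoded) document key: hash and range components."""
    hash_components: Tuple[Any, ...] = ()
    range_components: Tuple[Any, ...] = ()

    def range_prefixes(self) -> List["DocKey"]:
        """DocKeys obtained by trimming one or more trailing range components."""
        return [
            DocKey(self.hash_components, self.range_components[:n])
            for n in range(len(self.range_components))
        ]


@dataclass(frozen=True)
class SubDocKey:
    """A DocKey plus a path of subkeys leading to one piece of data."""
    doc_key: DocKey
    subkeys: Tuple[Any, ...] = ()


row = DocKey(("h1", "h2"), ("r1", "r2"))
# Yields the keys (h1, h2) and (h1, h2, r1) from the example above.
prefixes = row.range_prefixes()
# Path: row -> column 10 -> entry "k" of a map stored in that column.
cell = SubDocKey(row, (10, "k"))
```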

DocKey encoding

A DocKey is encoded slightly differently based on whether it has hash components and whether it is for a YSQL system table or not.

First, for YSQL system tables (which are stored in the master Raft group) the key starts with the following components:

  • kTableId byte (y)
  • YSQL system table UUID (16 bytes)

For all other tables the above part is not present.

Then, if there are hash components in the primary key:

  • kUInt16Hash byte (G)
  • 2 bytes (big-endian) with the computed hash value of the hash components
  • Hash components, encoded as primitive values
  • kGroupEnd (!) to indicate the end of hash components. This is not present if there are no hash components.

Then, for range components:

  • Range components encoded as primitive values
  • kGroupEnd (!) to indicate the end of range components. This is present even if there are no range components.
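The layout described above can be sketched as a small encoder. The marker bytes `y`, `G`, and `!` come from this page; everything else is an assumption for illustration: the function name and signature are hypothetical, the components are expected to be already encoded as primitive values (the placeholders like `b"<h1>"` are not real encodings), the hash value is supplied by the caller rather than computed, and the table UUID is written in Python's standard `UUID.bytes` order, which may differ from the byte order DocDB uses.

```python
import struct
import uuid
from typing import Optional, Sequence

K_TABLE_ID = b"y"
K_UINT16_HASH = b"G"
K_GROUP_END = b"!"


def encode_doc_key(
    hash_components: Sequence[bytes] = (),
    range_components: Sequence[bytes] = (),
    hash_value: Optional[int] = None,
    table_uuid: Optional[uuid.UUID] = None,
) -> bytes:
    """Concatenate the DocKey sections in the order described above."""
    out = bytearray()
    if table_uuid is not None:
        # Only for YSQL system tables: marker byte plus 16-byte UUID.
        out += K_TABLE_ID + table_uuid.bytes
    if hash_components:
        if hash_value is None:
            raise ValueError("hash_value is required with hash components")
        # Marker byte, 16-bit big-endian hash, components, group end.
        out += K_UINT16_HASH + struct.pack(">H", hash_value)
        for component in hash_components:
            out += component
        out += K_GROUP_END
    for component in range_components:
        out += component
    # The range group end is always written, even with no range components.
    out += K_GROUP_END
    return bytes(out)


hash_and_range = encode_doc_key([b"<h1>"], [b"<r1>"], hash_value=0x1234)
range_only = encode_doc_key(range_components=[b"<r1>", b"<r2>"])
system_table = encode_doc_key(table_uuid=uuid.UUID(int=1))
```

Note the asymmetry between the two group ends: a key with no hash components has a single `!`, while a key with hash components but no range components ends in `!!`.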

Examples

Empty key

An empty key is encoded as one byte, ! (kGroupEnd). There is no table id, no hash part, and the kGroupEnd indicates the end of the empty range component.
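Reading a key back starts by checking which optional sections are present. The sketch below peels off only the table id and the hash value and returns the remaining bytes untouched, since decoding the components themselves requires the primitive value formats that this page does not spell out. The function name is hypothetical, and the sketch assumes that `y` and `G` never appear as the value type byte of a leading range component and that the UUID uses Python's standard byte order.

```python
import struct
import uuid
from typing import Optional, Tuple

K_TABLE_ID = ord("y")
K_UINT16_HASH = ord("G")


def split_doc_key_header(
    key: bytes,
) -> Tuple[Optional[uuid.UUID], Optional[int], bytes]:
    """Return (table UUID or None, hash value or None, remaining bytes)."""
    pos = 0
    table_uuid = None
    hash_value = None
    if pos < len(key) and key[pos] == K_TABLE_ID:
        if len(key) < pos + 17:
            raise ValueError("truncated table id")
        table_uuid = uuid.UUID(bytes=key[pos + 1:pos + 17])
        pos += 17
    if pos < len(key) and key[pos] == K_UINT16_HASH:
        if len(key) < pos + 3:
            raise ValueError("truncated hash value")
        (hash_value,) = struct.unpack(">H", key[pos + 1:pos + 3])
        pos += 3
    return table_uuid, hash_value, key[pos:]


# The empty key has no table id and no hash part; only the group end remains.
empty = split_doc_key_header(b"!")
hashed = split_doc_key_header(b"G\x12\x34<h1>!!")
```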