
Major data structures

Botond Dénes edited this page Nov 3, 2017 · 2 revisions

Cluster

Although not really a data structure, a cluster is a set of cooperating Scylla nodes. Logically, a cluster contains all of the data. Physically, the data is spread across the shards of its nodes.

Cluster components

  1. Nodes
  2. Keyspaces: each cluster manages a number of keyspaces (which are themselves containers of tables)

Node

A node has a 1:1 correspondence with a server. It has one IP address by which it is known. It is a unit of failure: a node can fail and recover, or be taken offline and put back online. A node contains shards and physical storage.

Node components

  1. Shards: each core in a node is represented by a shard
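The containment described so far (a cluster holds nodes and keyspaces, a node holds one shard per core) can be sketched as plain data classes. This is only an illustrative model, not Scylla's actual code: the class names, field names, and the `with_cores` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shard:
    core_id: int  # hypothetical field: the processor core this shard maps to


@dataclass
class Node:
    ip_address: str  # the one address by which the node is known
    shards: List[Shard] = field(default_factory=list)

    @classmethod
    def with_cores(cls, ip_address: str, core_count: int) -> "Node":
        # One shard per core, mirroring the 1:1 shard-to-core correspondence.
        return cls(ip_address, [Shard(core_id) for core_id in range(core_count)])


@dataclass
class Keyspace:
    name: str
    tables: List[str] = field(default_factory=list)  # table names only, for brevity


@dataclass
class Cluster:
    nodes: Dict[str, Node] = field(default_factory=dict)          # keyed by IP address
    keyspaces: Dict[str, Keyspace] = field(default_factory=dict)  # keyed by name

    def add_node(self, node: Node) -> None:
        self.nodes[node.ip_address] = node

    def shard_count(self) -> int:
        # Physically, data is spread across every shard of every node.
        return sum(len(node.shards) for node in self.nodes.values())
```

For example, a cluster with a 4-core node and a 2-core node would report six shards from `shard_count()`.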

Shard

A shard has a 1:1 correspondence with a processor core. Each shard is responsible for a number of vnodes, each representing a fraction of the possible keys stored in the cluster.
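To make the vnode-to-shard relationship concrete, here is a minimal sketch of looking up the owning shard for a key. It rests on several assumptions that are not stated on this page: tokens are integers, each vnode is identified by its end token and covers the range from the previous vnode's end token (exclusive) to its own (inclusive), and keys are hashed with MD5 purely for illustration. `VnodeMap` and `token_for` are hypothetical names, and Scylla's real partitioner and shard assignment may differ.

```python
import bisect
import hashlib
from typing import Dict


def token_for(key: str) -> int:
    # Hypothetical hash: first 4 bytes of MD5, giving a token in [0, 2**32).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class VnodeMap:
    """Maps each vnode (identified by its end token) to the shard that owns it."""

    def __init__(self, owners: Dict[int, int]) -> None:
        self._owners = dict(owners)          # end token -> shard id
        self._tokens = sorted(self._owners)  # sorted end tokens for binary search

    def shard_for_token(self, token: int) -> int:
        # Find the first vnode whose end token is >= the given token.
        index = bisect.bisect_left(self._tokens, token)
        if index == len(self._tokens):
            index = 0  # past the last vnode: wrap around to the first one
        return self._owners[self._tokens[index]]

    def shard_for_key(self, key: str) -> int:
        return self.shard_for_token(token_for(key))
```

With three vnodes ending at tokens 100, 200, and 300 and owned by shards 0, 1, and 2, token 150 resolves to shard 1, and token 301 wraps around to shard 0.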

Shard components

  1. Thrift server: listens for connections on the thrift protocol
  2. Native server: listens for connections on the native protocol
  3. Metadata: replicas of all metadata in the system about keyspaces and tables (their names and structures)
  4. Memtables: pending dirty data in the portion of the key space (set of vnodes) belonging to the shard
  5. Ring: description of the cluster topology: list of nodes, vnode<->node relationship, vnode<->key space relationship
  6. Commitlog: disk files containing a log of data in memtables
  7. SSTables: disk files holding data that has been flushed from memtables
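How the commitlog, memtables, and sstables relate on a single shard can be illustrated with a toy write path. This is a sketch of general log-structured storage behavior, not Scylla's implementation: in-memory lists and dicts stand in for disk files, there is a single table and a single memtable, and `ShardStorage` with its methods is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ShardStorage:
    commitlog: List[Tuple[str, str]] = field(default_factory=list)  # stands in for log files
    memtable: Dict[str, str] = field(default_factory=dict)          # pending dirty data
    sstables: List[Dict[str, str]] = field(default_factory=list)    # stands in for data files

    def write(self, key: str, value: str) -> None:
        # Log the mutation first so the memtable contents can be rebuilt after a crash.
        self.commitlog.append((key, value))
        self.memtable[key] = value

    def flush(self) -> None:
        # Persist the memtable as a new sorted sstable; the logged entries
        # are then no longer needed for recovery.
        if not self.memtable:
            return
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commitlog.clear()

    def read(self, key: str) -> Optional[str]:
        # Newest data wins: check the memtable, then sstables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None
```

The point of the sketch is the ordering: every write reaches the commitlog before the memtable, and a flush moves the data into an sstable, after which the corresponding log entries can be discarded.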

Memtables
