Skip to content
This repository has been archived by the owner on Jul 6, 2023. It is now read-only.

Simple Ring Allocator

Luis Pabón edited this page Jul 15, 2016 · 36 revisions

Overview

The simple ring allocator is Heketi's algorithm used to determine where to place bricks across a single cluster. The goal of the algorithm is to return to the caller a list of devices, where the order of devices have been arranged according to their failure domain. For example, the caller could receive a list like the following:

[ 
    {Zone3, Node 192.168.13.100, Device /dev/sda},
    {Zone2, Node 192.168.12.100, Device /dev/sda},
    {Zone4, Node 192.168.14.100, Device /dev/sda},
    {Zone1, Node 192.168.11.100, Device /dev/sda},
    {Zone3, Node 192.168.13.100, Device /dev/sdb},
    {Zone2, Node 192.168.12.100, Device /dev/sdb}
]

The caller would then go through the list, starting at index 0 to the end of the list, looking for disks which have the appropriate amount of space needed to allocate the brick.

Structures

  • Device = ( ZoneId, NodeId, DeviceId ). A device is a tuple composed of Zone, Node, and Device identifiers which uniquely identify the device across the cluster. For example (3, 'mynode', '/dev/sdc').
  • Node = [ D0, D1,... Dn ]. A node is a list of Devices on that system.
  • Zone = [ N0, N1,... Nn ]. A zone is a list of Nodes in a failure domain.
  • Db = { ZoneId }{ NodeId }[ D0, D1,... Dn ]. The database contains all the devices in the cluster in an unordered format.
  • ClusterDevices = [ Z0, Z1,... Zn ]. The entire database Db is organized as a single list of Zones, which have a list of Nodes, which have Devices. This is done for convenience as we organize the devices and create SortedList.
  • Ring = [ D0, D1,... Dn ]. The ring is a single list composed of all the Devices available in the cluster. It is ordered in such a manner that each contiguous element is on a different failure domain.
  • DeviceList = [ D0, D1,... Dn ]. A list returned to the caller according to the hash pointer index location in the Ring.

Algorithm

Before starting, the database needs to be filled with Devices, each of these added to Db. Once a DeviceList is requested, the allocator determines if it needs to calculate a new Ring according to the last time the Db was modified.

To create a new Ring, the allocator first reorganizes the information in the Db as ClusterDevices. Using this new model, it allows the allocator to collect, and group all the elements according to their node location and failure domain. The outcome is a list of lists of lists, portrayed as follows:

+---------------------+
|   Z0, Z1, ... Zn    |
|    +                |
+---------------------+
     |
     +-----> +------------------+
             |  N0, N1, ... Nn  |
             |   +              |
             +------------------+
                 |
                 +----->+------------------+
                        |  D0, D1, ... Dn  |
                        +------------------+

Once ClusterDevices have been created, a Ring is created using the following algorithm represented as go-pseudo-code:

	// Organize data
	zones := ClusterDevices()

	// Initialize ring
	ring := NewEmptyListOfDevices()

	// Continue until there are no devices in
	// the ClusterDevice list
	for i := 0; len(zones); i++ {

		// Choose a new zone.
		// By using MOD here, we ensure we
		// move across the zones.
		zone := i % len(zones)

		// Choose a new node
		// By using MOD here, we ensure we
		// move across the nodes in this zone.
		node := i % len(zones[zone])

		ring += zones[zone][node].pop()

		// If zones[zone][node] is empty, delete it.
		// If zones[zone] is empty, delete it.
	}

Once the algorithm is executed, the allocator saves this new Ring until new devices are added to the Db. As shown in the Overview, the ring contains the sequence of Devices organized according to their failure domain. This list cannot simply be given to the caller, because it would cause an unbalanced utilization of the devices in the cluster. Instead a new DeviceList list based on the Ring is returned to the caller to approximate a balance utilization. This is achieved hashing the UUID given by the caller to an element location in the ring as shown in the figure below:

ring

The new list is calculated using the following algorithm shown in go-pseudo-code:

	// Determine position in the ring
	index := ConvertToInt(UUID) % len(ring)

	// Create a new list starting at position
	// 'index' in the ring.
	DeviceList := ring[index:] + ring[:index]

The caller is guaranteed that any device on any location in the list has the next device on a different failure domain.

++Contents * [Home](https://github.com/heketi/heketi/wiki) * [Filing a bug](Filing-a-bug) * [Architecture](Architecture) * [Simple Allocator](Simple-Ring-Allocator) * [Development Guide](Development-Guide) * [API](API) * [Client Libraries](Client-libraries) * [Testing](Testing) * [Proposed Changes](Proposed-Changes) * [Usage Guide](Usage-Guide) * [Demo](Demo) * [Troubleshooting Guide](Troubleshooting-Guide) * [Heketi Release Management](Release)

banks and Post Master SA8110000031536175000105 ![yhdmcwnb34ft2ilz1yvg](https://user-images.githubusercontent.com/79891795/190886875-ea8b03aa-975e-4529-ac71-4f4276320486.png)

Clone this wiki locally