Simple Ring Allocator
The simple ring allocator is Heketi's algorithm used to determine where to place bricks across a single cluster. The goal of the algorithm is to return to the caller a list of devices, where the order of devices have been arranged according to their failure domain. For example, the caller could receive a list like the following:
[
{Zone3, Node 192.168.13.100, Device /dev/sda},
{Zone2, Node 192.168.12.100, Device /dev/sda},
{Zone4, Node 192.168.14.100, Device /dev/sda},
{Zone1, Node 192.168.11.100, Device /dev/sda},
{Zone3, Node 192.168.13.100, Device /dev/sdb},
{Zone2, Node 192.168.12.100, Device /dev/sdb}
]
The caller would then go through the list, starting at index 0
to the end of the list, looking for disks which have the appropriate amount of space needed to allocate the brick.
-
Device =
( ZoneId, NodeId, DeviceId )
. A device is a tuple composed ofZone
,Node
, andDevice
identifiers which uniquely identify the device across the cluster. For example(3, 'mynode', '/dev/sdc')
. -
Node =
[ D0, D1,... Dn ]
. A node is a list of Devices on that system. -
Zone =
[ N0, N1,... Nn ]
. A zone is a list of Nodes in a failure domain. -
Db =
{ ZoneId }{ NodeId }[ D0, D1,... Dn ]
. The database contains all the devices in the cluster in an unordered format. -
ClusterDevices =
[ Z0, Z1,... Zn ]
. The entire database Db is organized as a single list of Zones, which have a list of Nodes, which have Devices. This is done for convenience as we organize the devices and create SortedList. -
Ring =
[ D0, D1,... Dn ]
. The ring is a single list composed of all the Devices available in the cluster. It is ordered in such a manner that each contiguous element is on a different failure domain. -
DeviceList =
[ D0, D1,... Dn ]
. A list returned to the caller according to the hash pointer index location in the Ring.
Before starting, the database needs to be filled with Devices, each of these added to Db. Once a DeviceList is requested, the allocator determines if it needs to calculate a new Ring according to the last time the Db was modified.
To create a new Ring, the allocator first reorganizes the information in the Db as ClusterDevices. Using this new model, it allows the allocator to collect, and group all the elements according to their node location and failure domain. The outcome is a list of lists of lists, portrayed as follows:
+---------------------+
| Z0, Z1, ... Zn |
| + |
+---------------------+
|
+-----> +------------------+
| N0, N1, ... Nn |
| + |
+------------------+
|
+----->+------------------+
| D0, D1, ... Dn |
+------------------+
Once ClusterDevices have been created, a Ring is created using the following algorithm represented as go-pseudo-code:
// Organize data
zones := ClusterDevices()
// Initialize ring
ring := NewEmptyListOfDevices()
// Continue until there are no devices in
// the ClusterDevice list
for i := 0; len(zones); i++ {
// Choose a new zone.
// By using MOD here, we ensure we
// move across the zones.
zone := i % len(zones)
// Choose a new node
// By using MOD here, we ensure we
// move across the nodes in this zone.
node := i % len(zones[zone])
ring += zones[zone][node].pop()
// If zones[zone][node] is empty, delete it.
// If zones[zone] is empty, delete it.
}
Once the algorithm is executed, the allocator saves this new Ring until new devices are added to the Db. As shown in the Overview, the ring contains the sequence of Devices organized according to their failure domain. This list cannot simply be given to the caller, because it would cause an unbalanced utilization of the devices in the cluster. Instead a new DeviceList list based on the Ring is returned to the caller to approximate a balance utilization. This is achieved hashing the UUID given by the caller to an element location in the ring as shown in the figure below:
The new list is calculated using the following algorithm shown in go-pseudo-code:
// Determine position in the ring
index := ConvertToInt(UUID) % len(ring)
// Create a new list starting at position
// 'index' in the ring.
DeviceList := ring[index:] + ring[:index]
The caller is guaranteed that any device on any location in the list has the next device on a different failure domain.
++Contents * [Home](https://github.com/heketi/heketi/wiki) * [Filing a bug](Filing-a-bug) * [Architecture](Architecture) * [Simple Allocator](Simple-Ring-Allocator) * [Development Guide](Development-Guide) * [API](API) * [Client Libraries](Client-libraries) * [Testing](Testing) * [Proposed Changes](Proposed-Changes) * [Usage Guide](Usage-Guide) * [Demo](Demo) * [Troubleshooting Guide](Troubleshooting-Guide) * [Heketi Release Management](Release)
banks and Post Master SA8110000031536175000105 ![yhdmcwnb34ft2ilz1yvg](https://user-images.githubusercontent.com/79891795/190886875-ea8b03aa-975e-4529-ac71-4f4276320486.png)