Availability Zones: standardized levels of independence. #539

josephineSei · 2024-03-25T07:55:23Z

Availability Zones are a concept in OpenStack.

As a user of an SCS-compliant cloud, I want to know what I can expect from AZs overall and what depends on the CSP.

Definition of Done:

Please refer to scs-0001-v1 for details.

Proposal has been written with name of the form scs-xxxx-v1-slug.md (only substitute slug)
Proposal has the fields status, type, track set
Proposal has been voted upon in the corresponding team
Status has been changed into Draft, file renamed: xxxx replaced by document number
If applicable: test script has been written (this item may be moved into a separate issue so long as the state is Draft)

josephineSei · 2024-03-25T14:16:17Z

Why are we talking about AZs: AZs focus on redundancy and failure safety at the IaaS level.

Redundancy at the lowest level could be as simple as replication in the storage backend, so that no data is lost when hardware fails. At the other end, the requirements can be as strict as having a remote mirror of all data.

AI: We should at least document what the different levels of redundancy mean and what failure safety different deployments can provide (@anjastrunk maybe the latter would be something for Gaia-X self-descriptions?)

Pre-Requirements

To also allow small deployments or edge deployments, which usually have only a single AZ, we must not require a certain number of AZs.
In that case, redundancy and failure safety should be handled by the user at the next higher level (PaaS, CaaS, workload...).

We should rather define and check when AZs can be defined and used.

What can AZs be defined by / What can they separate

AZs are implemented in various ways for different resources, as those resources belong to different OpenStack services.
There are AZs for Nova (Compute), Cinder (Block Storage) and Neutron (Networking).

AZs are logical separations that may, but need not, correspond to a physical separation.
AZs can be defined:

for nodes with different power supplies
for nodes in different fire zones, separated by strong fire safety mechanisms (thinking of a whole deployment burning down)
per rack (with the top-of-rack switch or power supply as the single point of failure)
to allow planned maintenance (e.g. upgrading one AZ after another and telling customers to "just" switch AZs) -> with rolling upgrades this may no longer be common
to distinguish between backends (KVM vs. another hypervisor). For storage there is always the option of using different volume types, so this mainly applies to Compute
for security reasons: either one AZ per customer or one AZ per physical node, for tenant separation on hypervisors (there are other and maybe better ways to accomplish this)
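To make the difference between these definitions concrete, here is a minimal sketch in plain Python. It does not use any OpenStack API; the Host class, the host names and the failure-domain values are hypothetical and only serve as illustration. It groups the same hosts into AZs by different criteria and shows how many AZs a single fire-zone failure would hit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical compute host with the failure domains it sits in."""
    name: str
    power_feed: str
    fire_zone: str
    rack: str


def group_into_azs(hosts, az_domain):
    """Build one AZ per distinct value of the chosen failure domain."""
    azs = defaultdict(list)
    for host in hosts:
        azs[f"az-{getattr(host, az_domain)}"].append(host.name)
    return dict(azs)


def azs_affected(hosts, az_domain, failed_domain, failed_value):
    """Return the AZs that lose at least one host when a failure domain goes down."""
    return sorted(
        {
            f"az-{getattr(host, az_domain)}"
            for host in hosts
            if getattr(host, failed_domain) == failed_value
        }
    )


# Hypothetical deployment: two fire zones, each with two racks.
HOSTS = [
    Host("cmp1", power_feed="A", fire_zone="fz1", rack="r1"),
    Host("cmp2", power_feed="A", fire_zone="fz1", rack="r2"),
    Host("cmp3", power_feed="B", fire_zone="fz2", rack="r3"),
    Host("cmp4", power_feed="B", fire_zone="fz2", rack="r4"),
]

if __name__ == "__main__":
    # Rack-based AZs: a fire in fz1 takes down two AZs at once.
    print(azs_affected(HOSTS, "rack", "fire_zone", "fz1"))
    # Fire-zone-based AZs: the same fire takes down exactly one AZ.
    print(azs_affected(HOSTS, "fire_zone", "fire_zone", "fz1"))
```

With rack-based AZs, a user who spreads a workload over az-r1 and az-r2 would still lose everything in that fire, which is exactly the kind of information users currently cannot see.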

Problems:

Users need to understand the difference between two AZs, but they have no knowledge of the underlying infrastructure, the capacity of the AZs, or how full (of VMs or volumes) an AZ currently is. So in many cases users might just guess or take a default.

Restrictions:

in Nova a physical host (or maybe rather the compute service running on it) can only be mapped to one AZ
the Nova config option cross_az_attach allows or disallows attaching volumes from other AZs
in Cinder, volume services are mapped to AZs
most Cinder backends already have built-in redundancy, which makes Cinder AZs dispensable
in Cinder and Neutron, AZs are set statically in the config (https://docs.openstack.org/neutron/latest/admin/config-az.html#availability-zone-of-agents and https://docs.openstack.org/cinder/latest/configuration/block-storage/config-options.html); if no AZ is configured, Cinder defaults to nova
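As a rough illustration of these restrictions, the following sketch parses three sample config fragments with Python's configparser and applies the cross_az_attach rule. The option names and sections ([cinder] cross_az_attach in nova.conf, storage_availability_zone in cinder.conf, [AGENT] availability_zone in the Neutron agent config) and the fallback values are my reading of the OpenStack documentation and should be checked against the release in use. The AZ names and the can_attach helper are hypothetical.

```python
import configparser

# Sample fragments; the AZ name "az1" is made up for illustration.
NOVA_CONF = """
[cinder]
cross_az_attach = False
"""

CINDER_CONF = """
[DEFAULT]
storage_availability_zone = az1
"""

NEUTRON_AGENT_INI = """
[AGENT]
availability_zone = az1
"""


def parse(text):
    """Parse an INI-style config fragment from a string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def az_settings(nova_text, cinder_text, neutron_text):
    """Collect the AZ-related settings, using the assumed upstream defaults as fallbacks."""
    nova, cinder, neutron = parse(nova_text), parse(cinder_text), parse(neutron_text)
    return {
        "cross_az_attach": nova.getboolean("cinder", "cross_az_attach", fallback=True),
        "cinder_az": cinder.get("DEFAULT", "storage_availability_zone", fallback="nova"),
        "neutron_agent_az": neutron.get("AGENT", "availability_zone", fallback="nova"),
    }


def can_attach(server_az, volume_az, cross_az_attach):
    """Simplified model: with cross_az_attach disabled, both AZs must match."""
    return cross_az_attach or server_az == volume_az


if __name__ == "__main__":
    settings = az_settings(NOVA_CONF, CINDER_CONF, NEUTRON_AGENT_INI)
    print(settings)
    print(can_attach("az1", "az2", settings["cross_az_attach"]))
```

One consequence worth noting: if cross_az_attach is disabled while Cinder stays on its default AZ, the Nova AZ names have to match that default, otherwise attachments fail in this model.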

A good but slightly outdated overview was presented at the Summit in 2018 (https://www.youtube.com/watch?v=a5332_Ew9JA)

Proposal:

The SCS should not require any AZs
The SCS should gather information from all CSPs about their usage of AZs
A DR should be written to define what types of AZ are recognized by the SCS (What do we want to achieve when having AZs)
The DR should also state for which OpenStack services AZs are recognized (e.g. there is little value in having volume AZs when most backends already provide redundancy)
Levels of redundancy / failure safety should be defined from a very high-level point of view for IaaS and maybe other layers too

josephineSei · 2024-03-26T06:52:06Z

I created a hedgedoc for CSPs to talk about their AZ usage:
https://input.scs.community/Availability-Zone-Usage#

josephineSei · 2024-04-08T08:35:18Z

So far there has not been much input, so I put it on the agenda again for the next IaaS call.

josephineSei · 2024-05-22T09:39:07Z

A few CSPs answered the questions in the hedgedoc, so we can go on with the work on AZs.
There was also a proposal in the hedgedoc as to what to use.

The problem here is that deployments which use AZs differently might not be changed, because changing the AZ architecture is quite fundamental. So all other deployments would automatically be rendered SCS-incompatible.

Another option is to use the failsafe levels that will be defined in #527. This would be more vague; we should discuss whether we want this or not.

josephineSei self-assigned this Mar 25, 2024

josephineSei mentioned this issue Mar 25, 2024

[EPIC] IaaS standards #285

Open

60 tasks

anjastrunk added standards Issues / ADR / pull requests relevant for standardization & certification SCS-VP10 Related to tender lot SCS-VP10 labels Apr 15, 2024
