Availability Zones: standardized levels of independencies. #539

Open
5 tasks
josephineSei opened this issue Mar 25, 2024 · 4 comments
Labels
SCS-VP10 Related to tender lot SCS-VP10 standards Issues / ADR / pull requests relevant for standardization & certification

Comments

@josephineSei
Contributor

Availability Zones are a concept in OpenStack.

As a user of an SCS-conformant cloud, I want to know what I can expect from AZs overall and what depends on the CSP.

Definition of Done:

Please refer to scs-0001-v1 for details.

  • Proposal has been written with name of the form scs-xxxx-v1-slug.md (only substitute slug)
  • Proposal has the fields status, type, track set
  • Proposal has been voted upon in the corresponding team
  • Status has been changed into Draft, file renamed: xxxx replaced by document number
  • If applicable: test script has been written (this item may be moved into a separate issue so long as the state is Draft)
@josephineSei self-assigned this Mar 25, 2024
@josephineSei
Contributor Author

Why are we talking about AZs: AZs focus on redundancy and failure safety at the IaaS level.

Redundancy at the lowest level could be as simple as replication in the storage backend, so that no data is lost when hardware fails. At the other end, the requirements can be as strict as keeping a remote mirror of all data.

  1. AI: We should at least document what the different levels of redundancy mean and what failure safety different deployments can provide (@anjastrunk maybe the latter would be something for gaia-x self-descriptions?)

Pre-Requirements

To also allow small deployments or edge deployments, which usually have only a single AZ, we must not require a certain number of AZs.
In that case, redundancy and failure safety should be handled by the user on the next higher level (PaaS, CaaS, workload...).

Instead, we should define and check the conditions under which AZs can be defined and used.

What can AZs be defined by / What can they separate

  • AZs are implemented in various ways for different resources, as those resources belong to different OpenStack services.
  • There are AZs for Nova (Compute), Cinder (Block Storage) and Neutron (Networking)
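To see how the three services expose their AZs on a given cloud, a minimal sketch along the following lines could be used. It assumes an openstacksdk-style connection object (for example one obtained from `openstack.connect(cloud="mycloud")`, where the cloud name is a placeholder); the helper names `list_availability_zones` and `az_names_differ` are hypothetical and not part of any SCS tooling.

```python
def list_availability_zones(conn):
    """Collect AZ names per service from an openstacksdk-style connection.

    `conn` is assumed to offer compute, block_storage and network proxies,
    each with an availability_zones() call yielding objects with a .name.
    """
    return {
        # Sets remove duplicates: Neutron may list one AZ once per resource type.
        "compute": sorted({az.name for az in conn.compute.availability_zones()}),
        "block_storage": sorted(
            {az.name for az in conn.block_storage.availability_zones()}
        ),
        "network": sorted({az.name for az in conn.network.availability_zones()}),
    }


def az_names_differ(zones_by_service):
    """Return True if the services do not all expose the same set of AZ names."""
    name_sets = {frozenset(names) for names in zones_by_service.values()}
    return len(name_sets) > 1
```

If `az_names_differ` returns True, a user cannot assume that an AZ name chosen for a server also exists for volumes or networks, which is one of the things a standard would have to address.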

AZs are logical separations that may, but need not, correspond to a physical separation.
AZs can be defined:

  • for nodes with different power supplies
  • for nodes in different fire zones, separated by strong fire safety mechanisms (thinking of a whole deployment burning down)
  • per rack (with the top-of-rack switch or power supply as the single point of failure)
  • to allow planned maintenance (e.g. upgrading one AZ after another and telling customers to "just" switch AZs); with rolling upgrades this may no longer be common
  • to distinguish between backends (KVM vs. another hypervisor). For storage there is always the possibility to use different volume types, so this is mainly applicable to Compute
  • for security reasons: either one AZ per customer or one AZ per physical node, for tenant separation on hypervisors (there are other options and maybe better ways to accomplish this)
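The separation criteria listed above could be written down per AZ in a small data model, so that two AZs can be compared for shared failure domains. The following is only a sketch under assumptions: the class `AZDescription`, its fields and the function `shared_failure_domains` are hypothetical illustrations, not an existing SCS or OpenStack schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AZDescription:
    """Hypothetical record of what a CSP documents about one AZ."""
    name: str
    power_supply: Optional[str] = None  # None means "not documented"
    fire_zone: Optional[str] = None
    rack: Optional[str] = None
    hypervisor: Optional[str] = None    # backend distinction, e.g. "KVM"


# Fields that describe physical separation (assumed selection).
SEPARATION_FIELDS = ("power_supply", "fire_zone", "rack")


def shared_failure_domains(a: AZDescription, b: AZDescription) -> List[str]:
    """List the physical domains in which two AZs are not known to be separated."""
    shared = []
    for field_name in SEPARATION_FIELDS:
        value_a = getattr(a, field_name)
        value_b = getattr(b, field_name)
        # Undocumented values count as shared: no separation can be assumed.
        if value_a is None or value_b is None or value_a == value_b:
            shared.append(field_name)
    return shared
```

For example, two AZs with different power feeds and racks but the same fire zone would report only `fire_zone` as shared. Treating undocumented values as shared is a deliberately conservative design choice in this sketch.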

Problems:

  • Users need to understand the difference between two AZs, but they have no knowledge of the underlying infrastructure, the capacity of the AZs, or how full (of VMs or volumes) an AZ currently is. So in many cases users might just guess or take some default.

Restrictions:

A good, though somewhat outdated, overview was presented at the Summit in 2018 (https://www.youtube.com/watch?v=a5332_Ew9JA)

Proposal:

  • The SCS should not require any AZs
  • The SCS should gather information from all CSPs about their usage of AZs
  • A DR should be written to define which types of AZ are recognized by the SCS (what do we want to achieve by having AZs?)
  • The DR should also state for which OpenStack services AZs are recognized (e.g. there is little value in having Volume AZs when most backends already provide redundancy)
  • Levels of redundancy / failure safety should be defined from a very high-level point of view for IaaS and maybe for other layers too

@josephineSei
Contributor Author

I created a hedgedoc for CSPs to talk about their AZ usage:
https://input.scs.community/Availability-Zone-Usage#

@josephineSei
Contributor Author

So far there has not been much input, so I put it on the agenda again for the next IaaS call.

@anjastrunk added the standards and SCS-VP10 labels Apr 15, 2024
@josephineSei
Contributor Author

A few CSPs answered the questions in the hedgedoc, so we can go on with the work on AZs.
The hedgedoc also contains a proposal for what AZs should be used for.

The problem here is that deployments which use AZs differently might not be able to change, because changing the AZ architecture is quite fundamental. So all those other deployments would automatically be rendered SCS-incompatible.

Another option is to use the failsafe levels that will be defined in #527. This would be more vague, so we should discuss whether we want this or not.

Projects
Status: Backlog