Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to Node distribution test (#489) #558

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

cah-hbaum
Copy link
Contributor

@cah-hbaum cah-hbaum commented Apr 10, 2024

  • added a new input that enables testing the test by providing yaml files containing label information for different nodes
  • added "test-example.yaml" containing an example for such a test file
  • removed the internal config, since the normal config file is already provided and read in as a default
  • added a pytest that enables easy testing of the check_nodes function (thanks to @martinmo)

@cah-hbaum cah-hbaum added Container Issues or pull requests relevant for Team 2: Container Infra and Tooling standards Issues / ADR / pull requests relevant for standardization & certification SCS is continuously built and tested SCS is continuously built and tested in order to raise velocity and quality SCS is standardized SCS is standardized SCS-VP10 Related to tender lot SCS-VP10 labels Apr 10, 2024
@cah-hbaum cah-hbaum self-assigned this Apr 10, 2024
@cah-hbaum
Copy link
Contributor Author

Also, like its written in the issue itself, we could think about providing some more scenarios for this test, especially with the newly added testing possibilities via files.

@cah-hbaum cah-hbaum requested a review from martinmo April 10, 2024 08:49
Copy link
Member

@martinmo martinmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea with the test data. I have some suggestions about this to make it easier to unit test, see the inline comments.

Apart from that: I think it makes more sense to remove external logging configuration and the --config flag. It is not used anywhere and the short form -c also conflicts with the short form of --os-cloud in the IaaS scripts.

The logging config can be simplified like it is done here:

if __name__ == '__main__':
logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO)
cli(obj=Config())

Also, when I run without any arguments, the script fails.

$ ./kaas/k8s-node-distribution/k8s-node-distribution-check.py
The config file under ./config.yaml couldn't be found.
Traceback (most recent call last):
  File "/home/martinmorgenstern/workspace/scs-standards/Tests/kaas/k8s-node-distribution/k8s-node-distribution-check.py", line 127, in initialize_config
    with open(config.config_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: './config.yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/martinmorgenstern/workspace/scs-standards/Tests/kaas/k8s-node-distribution/k8s-node-distribution-check.py", line 131, in initialize_config
    exit(1)
  File "/usr/lib/python3.10/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/martinmorgenstern/workspace/scs-standards/Tests/kaas/k8s-node-distribution/k8s-node-distribution-check.py", line 244, in <module>
    return_code = asyncio.run(main(sys.argv[1:]))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/martinmorgenstern/workspace/scs-standards/Tests/kaas/k8s-node-distribution/k8s-node-distribution-check.py", line 197, in main
    config = initialize_config(parse_arguments(argv))
  File "/home/martinmorgenstern/workspace/scs-standards/Tests/kaas/k8s-node-distribution/k8s-node-distribution-check.py", line 135, in initialize_config
    setup_logging(config.logging)
  File "/home/martinmorgenstern/workspace/scs-standards/Tests/kaas/k8s-node-distribution/k8s-node-distribution-check.py", line 112, in setup_logging
    logging.config.dictConfig(config_log)
  File "/usr/lib/python3.10/logging/config.py", line 811, in dictConfig
    dictConfigClass(config).configure()
  File "/usr/lib/python3.10/logging/config.py", line 375, in __init__
    self.config = ConvertingDict(config)
TypeError: 'NoneType' object is not iterable

@cah-hbaum cah-hbaum linked an issue Apr 17, 2024 that may be closed by this pull request
3 tasks
Copy link
Member

@martinmo martinmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard states that

At least one control plane instance MUST be run in each "failure zone"

I played around with the test-example.yaml you provided in ipython and created a scenario where all control plane nodes are in the same zone and region. In this case, the script would still exit with 0 and just print

WARNING: There seems to be no distribution across multiple regions or labels aren't set correctly across nodes.
WARNING: There seems to be no distribution across multiple zones or labels aren't set correctly across nodes.
INFO: The master nodes are distributed across 3 host-ids.
INFO: The worker nodes are distributed across 2 regions.

Shouldn't it produce an ERROR? Or do we assume the physical host is the failure zone?

Furthermore, I would like to see more scenarios like the one in the yaml file, and have them tested automatically in the CI against the expected check outcome, with pytest.

The config yaml template file can be removed.

@cah-hbaum
Copy link
Contributor Author

The standard states that

At least one control plane instance MUST be run in each "failure zone"

I played around with the test-example.yaml you provided in ipython and created a scenario where all control plane nodes are in the same zone and region. In this case, the script would still exit with 0 and just print

WARNING: There seems to be no distribution across multiple regions or labels aren't set correctly across nodes.
WARNING: There seems to be no distribution across multiple zones or labels aren't set correctly across nodes.
INFO: The master nodes are distributed across 3 host-ids.
INFO: The worker nodes are distributed across 2 regions.

Shouldn't it produce an ERROR? Or do we assume the physical host is the failure zone?

The thing that was a bit problematic while writing the standard was that different CSPs probably have different ideas about their failure zone, especially having small CSPs in mind. So it could very well be possible that the physical hosts are dividing the cluster into failure zones.
Thats why the test doesn't throw an error here.

Personally I think thats perfectly reasonable, but if you have another opinion on that, we can discuss this further. (But that would probably also mean to change the standard again)

Furthermore, I would like to see more scenarios like the one in the yaml file, and have them tested automatically in the CI against the expected check outcome, with pytest.

I'm gonna add the other files I added with some additional examples.

The config yaml template file can be removed.
You're right.

@martinmo
Copy link
Member

The thing that was a bit problematic while writing the standard was that different CSPs probably have different ideas about their failure zone, especially having small CSPs in mind. So it could very well be possible that the physical hosts are dividing the cluster into failure zones.
Thats why the test doesn't throw an error here.

Personally I think thats perfectly reasonable, but if you have another opinion on that, we can discuss this further. (But that would probably also mean to change the standard again)

Thanks for the clarification, I agree 👍

@cah-hbaum
Copy link
Contributor Author

Added some more example files.

@cah-hbaum cah-hbaum requested a review from martinmo April 24, 2024 10:57
Copy link
Member

@martinmo martinmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second review round:

As we discussed today in person, I've committed a pytest script that uses the scenarios in your YAML files and I added a few more (in my opinion) interesting scenarios.

One test currently fails (intentionally) because the conformance test issues a warning twice if the host-id label is missing where I think it should only do that once (that's one of the new scenarios I added). What do you think?

Furthermore, I propose to replace all occurences of master with control-plane (or just control), because this fits better with the current terminology in Kubernetes (master is outdated).

Small typo, which is not part of the diff: in line 149, it should read "the label … doesn't …" instead of "… don't …".

@cah-hbaum
Copy link
Contributor Author

cah-hbaum commented Apr 25, 2024

One test currently fails (intentionally) because the conformance test issues a warning twice if the host-id label is missing
where I think it should only do that once (that's one of the new scenarios I added). What do you think?

Agree with that, that duplication shouldn't happen. I removed that in the coming commit.

I also did a general update to test test script in order to enforce the standard requirement for labels to be provided (MUST). And I made a small change to your test to reflect this.

@cah-hbaum cah-hbaum requested a review from mbuechse April 30, 2024 10:14
cah-hbaum and others added 8 commits April 30, 2024 12:14
- added a new input that enables testing the test by providing yaml files containing label information for different nodes
- added "test-example.yaml" containing an example for such a test file
- removed the internal config, since the normal config file is already provided and read in as a default

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
Some dates fixing some problems mentioned by @martinmo.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
Adds additional test files and removes previously deprecated config (template) files.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
Signed-off-by: Martin Morgenstern <martin.morgenstern@cloudandheat.com>
Signed-off-by: Martin Morgenstern <martin.morgenstern@cloudandheat.com>
Signed-off-by: Martin Morgenstern <martin.morgenstern@cloudandheat.com>
Some fixes and updates in order to be compliant with the testdata.
Thanks to @martinmo.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
Signed-off-by: Martin Morgenstern <martin.morgenstern@cloudandheat.com>
@cah-hbaum cah-hbaum force-pushed the 489-small-update-to-scs-0214-test branch from a2a432a to 976a078 Compare April 30, 2024 10:14
Copy link
Contributor

@mbuechse mbuechse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor PEP-8 thing.

# infrastructure parts to smaller ones. So we first look if nodes are distributed
# across regions, then zones and then hosts. If one of these requirements is fulfilled,
# we don't need to check anymore, since a distribution was already detected.
labels = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
labels = (
LABELS = (

this is the way PEP-8 recommends it, I think
(of course, the rest of the script would have to be adjusted accordingly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to LABELS

class LabelException(BaseException):
"""Exception raised if a label isn't set"""


class Config:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class now seems a bit superfluous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But it can stay in case the options grow again)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna keep that for now.

Copy link
Contributor

@mbuechse mbuechse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I would like to request two changes:

  • rename the script under test so that it can be imported directly
  • put the test data into a single yaml file, load that once, and hand out the sections using a fixture

Small pep-8 changes.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
@mbuechse had some change requests, that are tackled with this commit.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
@cah-hbaum
Copy link
Contributor Author

Ah sorry, I would like to request two changes:

  • rename the script under test so that it can be imported directly
  • put the test data into a single yaml file, load that once, and hand out the sections using a fixture

c60f895 should have the requested changes.

@cah-hbaum cah-hbaum requested a review from mbuechse May 22, 2024 07:12
@mbuechse
Copy link
Contributor

@cah-hbaum Thanks! Sorry for the long delay on my end. I think you need to update https://github.com/SovereignCloudStack/standards/blob/main/Tests/scs-compatible-kaas.yaml as well, no? Apart from that, I would prefer if the yaml file were loaded only once, which should be possible with pytest, if you use fixtures, or if you just do it on the module level. But it's not a deal breaker.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
@cah-hbaum
Copy link
Contributor Author

@cah-hbaum Thanks! Sorry for the long delay on my end. I think you need to update https://github.com/SovereignCloudStack/standards/blob/main/Tests/scs-compatible-kaas.yaml as well, no? Apart from that, I would prefer if the yaml file were loaded only once, which should be possible with pytest, if you use fixtures, or if you just do it on the module level. But it's not a deal breaker.

I updated the scs-compatible-kaas.yaml and introduced fixtures (I hope thats how you meant it).

Copy link
Contributor

@mbuechse mbuechse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Container Issues or pull requests relevant for Team 2: Container Infra and Tooling SCS is continuously built and tested SCS is continuously built and tested in order to raise velocity and quality SCS is standardized SCS is standardized SCS-VP10 Related to tender lot SCS-VP10 standards Issues / ADR / pull requests relevant for standardization & certification
Projects
Status: Doing
Development

Successfully merging this pull request may close these issues.

Small updates to conformance test for scs-0214-v1
3 participants