Integrating test automation framework #774

Open · wants to merge 6 commits into master

Conversation

nicvanorton

Short description

Integrates a pytest-based test automation framework into pmacct (finally... :-))
Added folders:

  • test-framework: python code library and scripts for running and debugging the tests
  • tests: test case files (test logic, pmacct configuration, pcap, expected json output, expected logs, traffic reproducer configuration, etc.) organized in test-specific folders

For more information about the framework, please refer to the README file in the test-framework folder.

Checklist

I have:

  • added the LICENSE template to new files
  • compiled & tested this code
  • included documentation (including possible behaviour changes)

@paololucente please review and let us know if you agree with the integration approach

Contributor

@rodonile rodonile left a comment


We added Dockerfiles to create containers for the daemons where the entrypoint process is not run by root. This is only used for local runs of the test framework and not by the CI, so there are no changes to the containers built there. The purpose is only to be able to manually run the tests as a non-root user on a Linux host.

@rodonile
Contributor

One thing you will need to do is enable GitHub Pages deployment from GitHub Actions in the repository settings.

[screenshot: repository Settings → Pages, with "GitHub Actions" selected as the build and deployment source]

Otherwise the last CI job (pytest-html-report-deploy) will not be able to run and deploy the html report.

Member

@paololucente paololucente left a comment


Thank you @nicvanorton and Leonardo @rodonile for this very useful and much wanted contribution. I started to have a look myself but wanted to ping Marc (@msune) as well, as he is the original developer of the CI in the pmacct repo. Paolo

@msune
Contributor

msune commented Apr 22, 2024

Hey @nicvanorton @rodonile,

Sorry for the late response... busy days!

First and foremost, this looks like a great step forward for pmacct's E2E testing. Thank you and congrats 🥳! It's quite in line with the E2E tests I had in mind, and complementary to some of the PRs I will issue in the upcoming weeks in that direction.

This PR is quite big, so it will take me some time to review it.

It would have been a good idea to start with a smaller PR, in which the base test framework, the strategy and a single smoke test were contributed, so that we could have a more focused discussion. (If at all possible, I would highly encourage rescoping this PR in that direction; it will hugely simplify my review work and your rebase work if changes are required.)

There are a couple of things we can start to discuss right off the bat:

1. The need for root containers

I would like to avoid having a copy of the Dockerfiles that just changes the default username. I would either stick to the current version or switch to non-root users (I still need to figure out the pros and cons, but the latter is likely better - if possible - even though the gain might be marginal).

You mentioned the reason in #774 (review). Is there a chance you could add documentation in the doc/ folder on how to run the tests locally? I would like to try them myself.

2. Tracking binary files

Tracking binary files in git is generally not a good idea. It will make the repo bigger. It's quite clear PCAPs are an essential part of the E2E tests. I like the fact that they are tracked alongside the tests, to avoid discrepancies between input files and tests at a given point in time.

I would suggest looking into two possible alternatives:

  1. Use git LFS: this, however, breaks the consistency between tests and their input files.
  2. Use a git submodule for all input pcap files, so that they stay in a separate repository.

3. output*.json files

A substantial part of the contribution is the output files. These files contain timestamps and other non-relevant information, which adds up (a large portion of the PR is these files). There are transformation methods/functions that reinterpret (part of) these JSON files.

I am not sure I understand the reasoning behind it. Did you evaluate using Python and pytest asserts to parse and match the output JSON? There are several reasons why - at first glance - I would think this is a better approach:

  1. You don't have to track unnecessary information
  2. If tests change and the expected files need updating, you will not create a diff across many lines that are essentially meaningless to the test itself (e.g. changes in timestamps and other ignored fields)
  3. It is more obvious to the developer what is being checked and which value is expected by just looking at the assert description. It is also easier to add/modify/delete asserts (checks).
  4. Instead of giving a generic error "output doesn't match input", pytest will report exactly which assert(s) failed in the test.

So, for instance, instead of this:

https://github.com/pmacct/pmacct/pull/774/files?short_path=2c498a9#diff-ca4648ac059764e28a0658df2adaebfa339a898ab3a0ab140fdbb6ca6d97e8f4R24-R25

import json

records = json.loads(stdout_output)  # stdout_output: captured daemon output, or load from a file
for key, flow_record in records.items():  # assumes the output is a dict
    assert flow_record["peer_ip_src"] == "192.168.100.1"
    # ...
# ...

Let me know your thoughts.

Regards
Marc

@nicvanorton
Author

Hi @msune,

Thanks for your message and positive comments!

The PR is big indeed, as it was developed over more than a year. Much of its size, though, is due to the many test cases that have been built around the framework and included here; these may not need a thorough review (they check pmacct functionality itself and have been tested for months). I'd say the main need for review lies in the framework approach itself; this is why I think the effort of going through a new PR would not pay off (since the framework itself will still need to be included).

1 (Non-)root containers
We used non-root Dockerfiles basically to avoid having to run the whole framework (and, therefore, the tests) with sudo every time. If I remember correctly, the need to run with sudo stemmed from some inconsistency across different OSes in terms of user/group ID numbering and Docker's interpretation of it.
Generalising the non-root Dockerfiles to the main pmacct images would otherwise sound reasonable (if technically feasible), but is probably beyond the scope of this PR (and of the test framework).

2 Binary files
Keeping binary files may not be elegant, but (as @rodonile pointed out) it's kind of common practice - for example, tcpdump's GitHub repo also includes pcaps (link). After all, if we are to keep pcap files in a separate repo, we could as well keep the full test automation framework there and not integrate it with pmacct in the first place.

3 Output files
I think I see your point, but I am not sure I understand the proposal. We typically look at hundreds (if not thousands) of JSON messages per test case, so hard-coded asserts would not be usable. Furthermore, the order of messages is not fixed, so we need to be tolerant in this respect. On top of that, we sometimes need to account for messages arriving multiple times (e.g., in HA scenarios). To accommodate these requirements, the comparison logic operates on two sets of JSON dictionaries: the received messages and the output file (see json_tools.py in library/py).
To save on data volume, yes, we can possibly delete from the output files all the fields that will be ignored anyway. This does not apply to all timestamps, though, as some of them are expected to have specific values.
As for the non-match case, the framework does not just print a generic "output doesn't match input" message; instead it shows the "closest match", as well as the delta to the closest matched message.
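
The comparison is conceptually along these lines (illustrative only, not the actual json_tools.py implementation; the ignore-list and file names are placeholders):

import json

# Fields assumed to be on the ignore-list; the real list lives in the
# framework, these names are only illustrative.
IGNORED_FIELDS = {"timestamp_arrival", "writer_id"}

def normalize(record, ignored=IGNORED_FIELDS):
    """Drop fields that are not relevant to the comparison."""
    return {k: v for k, v in record.items() if k not in ignored}

def compare_json_sets(received, expected, ignored=IGNORED_FIELDS):
    """Order-tolerant comparison of two lists of JSON dicts.

    Returns the expected records that were never matched, so the caller can
    report the closest candidates instead of a generic failure message.
    """
    remaining = [normalize(r, ignored) for r in received]
    unmatched = []
    for exp in expected:
        exp_n = normalize(exp, ignored)
        if exp_n in remaining:
            remaining.remove(exp_n)  # each expected record consumes one received message
        else:
            unmatched.append(exp_n)
    return unmatched

# Usage (file names are placeholders): every record in the expected output
# file must appear among the received messages, regardless of order.
received = [json.loads(l) for l in open("received.json") if l.strip()]
expected = [json.loads(l) for l in open("output-flow-00.json") if l.strip()]
assert not compare_json_sets(received, expected)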

As for running the framework locally, there's documentation about that in the README file. Let me know if something in there is not clear, or something does not run as expected/documented.

Best,
Nicolas

@msune
Contributor

msune commented Apr 24, 2024

Generalising the non-root Dockerfiles to the main pmacct images would otherwise sound reasonable (if technically feasible), but is probably beyond the scope of this PR (and of the test framework).

I would say, let's remove this part from this PR, and open another for it.

Keeping binary files may not be elegant, but (as @rodonile pointed out) it's kind of common practice - for example, tcpdump's GitHub repo also includes pcaps (link).

Yes, some repositories do that, at a cost (I'm not sold on it being common practice; take a look at GitHub's hard limits for large files).

For tcpdump for instance:

marc@Mriya:~/personal/tcpdump$ du -hs
50M	.
marc@Mriya:~/personal/tcpdump$ du -hs tests/
15M	tests/
marc@Mriya:~/personal/tcpdump$ du -hs .git
31M	.git

Tests essentially occupy 30% of the unpacked repository, and likely ~40-50% of the git db (binaries can't be that easily compressed).

After all, if we are to keep pcap files in a separate repo, we could as well keep the full test automation framework there and not integrate it with pmacct in the first place.

That is actually my plan for another E2E suite. I don't want CI/CD to be downloading a lot of stuff it won't use. A submodule is cheap and guarantees consistency at the commit level. The repo could be under pmacct's org.

3 Output files
I think I see your point, but I am not sure I understand the proposal. We typically look at hundreds (if not thousands) of JSON messages per test case, so hard-coded asserts would not be usable. Furthermore, the order of messages is not fixed, so we need to be tolerant in this respect. On top of that, we sometimes need to account for messages arriving multiple times (e.g., in HA scenarios). To accommodate these requirements, the comparison logic operates on two sets of JSON dictionaries: the received messages and the output file (see json_tools.py in library/py).
To save on data volume, yes, we can possibly delete from the output files all the fields that will be ignored anyway. This does not apply to all timestamps, though, as some of them are expected to have specific values.
As for the non-match case, the framework does not just print a generic "output doesn't match input" message; instead it shows the "closest match", as well as the delta to the closest matched message.

I am not sure I follow. I've worked on systems very similar to this to validate dataplanes (using pytest too, SW and HW packet generators, etc.), in which ordering of outputs was indeed not guaranteed all the time. You typically add checks in order when possible, and then accumulate the (execution) outputs and sort them for the sections where interleaving can happen, while still comparing against manually created synthetic outputs.
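
A rough sketch of that pattern (the collected_output fixture and the field names are assumptions, not taken from this PR):

import json

# Synthetic expected records, written by hand by the test designer; they are
# deterministic and never derived from a pmacct run.
EXPECTED_PEERS = [
    {"peer_ip_src": "192.168.100.1", "peer_src_as": 64496},
    {"peer_ip_src": "192.168.100.2", "peer_src_as": 64497},
]

def test_peer_records(collected_output):
    # collected_output is assumed to be a fixture yielding the JSON lines
    # accumulated during the section where interleaving can happen.
    records = [json.loads(line) for line in collected_output]
    # Project only the fields under test and sort both sides on a stable key,
    # so interleaved arrival order does not matter.
    got = sorted(
        ({"peer_ip_src": r["peer_ip_src"], "peer_src_as": r["peer_src_as"]}
         for r in records),
        key=lambda r: r["peer_ip_src"],
    )
    want = sorted(EXPECTED_PEERS, key=lambda r: r["peer_ip_src"])
    assert got == want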

Doing "fuzzy pattern matching" on outputs as per described seems a slippery slope to me. First, pattern matching in itself can have bugs. Second, if you take outputs from the same device under test (pmacct) that you are attempting to validate (let's say, from what you think is a valid run) - which I think you do -, you run a high risk of compensating bugs with your own "supposedly valid" observations, as both the expected output and the actual output come from executions of the DUT (pmacct). This is particularly easy go under the radar when test cases are modified or new test cases added (and hence outputs), because it requires a very careful human validation on the entire output to match.

@nicvanorton
Author

Concerning the repo and submodule I'm easy. No strong preference, so I'm leaving this to you, Paolo and Leonardo (Leonardo will be back from holiday in a couple of weeks).

As for the output files, our approach is not very different from the one you described. We practically split the test into sections, each of which has its own corresponding output file. Assertions are done for each section in the designated order. In any case, it is up to the test author to use more detailed assertions if desired (...or even not use output files at all).

It's an interesting point that using the DUT's output creates a cyclic dependency. IMHO this risk is minimal because the output files are supposed to be manually evaluated by the test author. And it is superseded by the fact that hard coded assertions cannot have an exhaustive coverage.

Finally, it also boils down to the question whether one is primarily interested in detecting changes in DUT's behaviour (regression), or checking against a list of well-defined standards or requirements (functionality).

@msune
Contributor

msune commented Apr 24, 2024

And it is superseded by the fact that hard coded assertions cannot have an exhaustive coverage.

Why do you think so? The fact that the output is synthetically generated (and not the result of an execution of the DUT) doesn't preclude having thousands of assert conditions. I would counter-argue that you can have more exhaustive and, especially, more controlled coverage.

When lots of things need to be tested, that's where pytest shines: using fixtures and simply looping over dynamically generated asserts (which are synthetically generated, but deterministic and immutable if the test code doesn't change) is the way to go.
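
For instance, a hypothetical parametrized check along those lines (the collected_records fixture and the field names are assumed, not taken from the framework):

import pytest

# Synthetic, deterministic expectations: immutable unless the test itself changes.
EXPECTED_FLOWS = [
    ("192.168.100.1", 64496, 1000),
    ("192.168.100.2", 64497, 2000),
]

@pytest.mark.parametrize("peer_ip_src,peer_src_as,packets", EXPECTED_FLOWS)
def test_flow_record(collected_records, peer_ip_src, peer_src_as, packets):
    # collected_records is assumed to be a fixture returning the parsed JSON
    # dicts produced by the daemon under test.
    matching = [r for r in collected_records if r.get("peer_ip_src") == peer_ip_src]
    assert matching, f"no record for peer {peer_ip_src}"
    assert matching[0]["peer_src_as"] == peer_src_as
    assert matching[0]["packets"] == packets

On failure, pytest reports the exact parameter set and assert that did not hold, rather than a generic mismatch.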

Can you point me to the worst case test that you think can't be checked against pytest asserts? I am fairly sure we can do that.

Finally, it also boils down to the question whether one is primarily interested in detecting changes in DUT's behaviour (regression), or checking against a list of well-defined standards or requirements (functionality).

I am not sure I follow. They seem to me complementary (and overlapping) dimensions of any testing.

@rodonile
Contributor

rodonile commented May 2, 2024

Hello Marc, thanks for taking the time to review!

Here is my take as well on some of the points addressed:

  1. In principle I also liked the idea of unifying the containers to be non-root, and I thought about it as well. In the end, however, we opted against it, since it would break certain deployment scenarios. For example, binding to specific interfaces with the container in network_mode=host would not work anymore, as the socket bind would fail.

Given this, to address your concern we could move the non-root Dockerfiles back into the test-framework folder. I know this is not really a solution, but in the end we would only use them for local runs of the tests. Or we could even extend the container builder script to enhance the root Dockerfiles with the two additional lines, which would remove the need to maintain the Dockerfiles in two different locations. I figured it could be interesting for somebody who prefers to run rootless containers to have the Dockerfile available in the repo as well, just not as the default one used by the CI.

For local runs of the tests we really need this, as otherwise it becomes a pain: manual clearing of the results folder would be required as the pmacct generated files are root owned.

  2. I fully agree with Nicolas that leaving only the pcap files out does not make much sense.

About this point, we had a meeting with Paolo where we aligned on the integration procedure and decided to include all the code in the master branch. Leaving it in a separate repo was an option. Unfortunately you were not involved in the discussions until now, and we already spent time working towards integrating everything as you see here.

The main reason we decided in the end to include all the tests upstream is that it allows submitting a single PR for a new feature or bug fix with the code changes + test changes/new tests. Imagine we add a new field to the default BMP peer_up schema: within the same PR we would also modify the output files by adding this new field where needed. With the tests in another repo, this becomes more difficult to maintain, as you first need a PR with the new tests in the test repo and then a PR in pmacct with the code + an updated submodule ref commit.

My personal opinion is that overall the output files and pcap files are not that big; we don't go anywhere near the limits stated by GitHub for large files, even for tests with lots of messages. And in general it doesn't make sense to use oversized pcap files for test cases; our biggest, for example, is about 260KB. As an idea, we could state a size limit for the pcap files that we accept to be merged.

Anyway, I'm not strongly against the submodule approach. The main problem from my side is that this reorg would take more time than expected, which I don't really have in the following weeks (I don't know about Nicolas). I'm open to setting up a call to discuss this further. And also, as I mentioned above, there is the added burden that this is harder to maintain.

  3. Thanks Nicolas for elaborating on this point. I'm not entirely sure we are on the same page though, and here again a call might be more useful than ping-pong here :)

Anyway, when it comes to output generation we are basically free to do it as we like. It is simpler to just take the output from pmacct, inspect it, declare it "ok" and go with it. Synthetically generating such an output would be nicer, I agree, but it means reproducing pmacct's logic to generate it (quite the effort, I would say). Consider that in some cases we are aggregating flow records and correlating them with BGP RIB information. Have a look at e.g. test 400.

If your concerns were mostly related to the fuzzy pattern matching and substitution (e.g. of IP addresses) that is sometimes performed, note that this is not mandatory and could be avoided entirely by providing output files that already contain the IP addresses used by the framework. We could also remove from the expected output files all the fields that we ignore (as they are anyway removed from the JSON messages we receive from pmacct before they are compared). That way we avoid tracking e.g. timestamps that change on every run, which might be a good idea.
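
As a rough sketch of that cleanup step (the ignore-list and the newline-delimited JSON layout are assumptions; the fields actually ignored would have to be taken from the framework itself):

import json
import sys

# Illustrative ignore-list only.
IGNORED = {"timestamp_arrival", "timestamp_min", "timestamp_max", "writer_id"}

def strip_ignored(path):
    """Rewrite an expected-output file without the fields the comparison ignores."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    with open(path, "w") as f:
        for rec in records:
            kept = {k: v for k, v in rec.items() if k not in IGNORED}
            f.write(json.dumps(kept, sort_keys=True) + "\n")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        strip_ignored(path)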

I'm not sure if I'm correctly addressing your concerns here though.

Let me know what you think :)

Have a good night!

Cheers,
Leonardo

@msune
Contributor

msune commented May 2, 2024

Thanks. The only piece of information that you @rodonile added to the conversation is that Paolo was aware of that. That's ok. Just to be clear, my concerns, in order of priority, are 3, 1 and 2.

About 3)

Can you point me to the worst case test that you think can't be checked against pytest asserts? I am fairly sure we can do that.

The offer is still on the table.

I've worked with systems that had to validate dataplane traffic which had a lot of spurious and interlaced packets, and needed to validate a lot of parallel and interlaced Control Plane flows using pytest and scapy. I am talking hours of tests, thousands of assertions (coded using pytest), and that was never a problem. Quite the opposite, knowing what you expect as a test designer and checking for exactly that helped to find a lot of bugs - including bugs in the test. Using as input a (very complex) output of your own DUT is a bad idea.

About 1)

For local runs of the tests we really need this, as otherwise it becomes a pain: manual clearing of the results folder would be required as the pmacct generated files are root owned.

I am confused... sudo rm -rf as a cleanup seems simple and clean. If your user in the docker group is able to spawn privileged containers (like pmacctd needs), you will surely have sudoer permissions (else you have a security hole in your system).


I don't want to block this PR. I also don't have time for this. So I would kindly ask you to remove the non-root containers from the PR, and I leave the decision to merge the PR entirely to Paolo, who in the end is the owner and maintainer of pmacct. I think having them is better than nothing, yet it seems the best moment to have this discussion and set some good foundations.

@msune msune removed their assignment May 2, 2024
@rodonile
Contributor

rodonile commented May 3, 2024

@msune

Thanks. The only piece of information that you @rodonile added to the conversation is that Paolo was aware of that. That's ok. Just to be clear, my concerns, in order of priority, are 3, 1 and 2.

I tried to provide more insight so that you could have the full picture as to why we made those choices.

The offer is still on the table.

As I wrote above, you could consider test 400 as an example.

I've worked with systems that had to validate dataplane traffic which had a lot of spurious and interlaced packets, and needed to validate a lot of parallel and interlaced Control Plane flows using pytest and scapy. I am talking hours of tests, thousands of assertions (coded using pytest), and that was never a problem. Quite the opposite, knowing what you expect as a test designer and checking for exactly that helped to find a lot of bugs - including bugs in the test. Using as input a (very complex) output of your own DUT is a bad idea.

I'm sure this can be done the way you are suggesting. To me it just seems more difficult and time-consuming, but tbh, not having much testing/pytest experience, I might absolutely be wrong. Also, this would mean changing most of our testing logic, so it would only be feasible in the long term, not the short term.

I can tell you that, from our point of view at Swisscom, we have been using this test framework for months now for regression testing and it's been quite stable, helping to identify several introduced bugs and, most importantly, giving us more peace of mind for the dev/prod deployments (before, it was really "deploy and hope for the best").

And of course, we can keep improving/extending it over time and gradually move to an approach with synthetic output generation. What do you think about this? From your comments I understood that you would prefer the test logic to be fully redesigned before merging, and I was skeptical about that mainly for time reasons. Would it be too problematic to move in that direction afterwards?

I am confused... sudo rm -rf as a cleanup seems simple and clean. If your user in the docker group is able to spawn privileged containers (like pmacctd needs), you will surely have sudoer permissions (else you have a security hole in your system).

Ok, fine by me, I will remove those. We only have tests for nfacctd, pmbmpd and pmbgpd for now, and those can run unprivileged. It's just that in our initial requirements, when we were discussing the framework architecture, we had rootless execution as a requirement. But it's fine: all those interested in running the tests locally most likely have sudo permissions on their system.

yet it seems the best moment to have this discussion and set some good foundations.

I agree, and I'm open to discussing it. Should we organize a meeting?

nicvanorton and others added 6 commits May 6, 2024 08:33
Changelog:

- Copied in the two split folders, edited required framework files and .gitignore
- Modified build_docker_images.sh to find pmacct in root project folder, ran 200:00, 300:00, 400:00 (passed)
- add -c argument to pass pytest.ini file (otherwise it's not found and markers give warning)
- minor packet-timing related updates to test 302&304
- minor packet-timing related updates to test 202&207