This repository has been archived by the owner on Dec 31, 2023. It is now read-only.

VPC network leakage in sample tests #138

Closed
m-strzelczyk opened this issue Oct 22, 2021 · 5 comments · Fixed by #139
Labels
api: compute Issues related to the googleapis/python-compute API. samples Issues that are directly related to samples. type: process A process-related concern. May include testing, release, or the like.

Comments

@m-strzelczyk
Contributor

While working on #136 I ran into problems with the tests: those that required creating new VPC networks were consistently failing. After I reached out to python-sample-owners to inspect the test project, @leahecole granted me Viewer permissions. It turns out that a fixture and tests added in #130 (file samples/snippets/test_sample_create_vm.py) are somehow leaking VPC networks. After a couple of runs, the test project started to hit the VPC subnetwork quota and, as a result, the tests started to fail.

@leahecole has cleaned up the unneeded networks, but we still need to make sure that this does not happen again. I will improve the tests so that any resources they create are reliably removed.

@product-auto-label product-auto-label bot added api: compute Issues related to the googleapis/python-compute API. samples Issues that are directly related to samples. labels Oct 22, 2021
@m-strzelczyk m-strzelczyk self-assigned this Oct 22, 2021
@m-strzelczyk
Contributor Author

I see that #134 was also affected.

@m-strzelczyk
Contributor Author

OK, I have run a test again against the Python 3.8 project and I see what's going on.

Since the python-docs-samples-tests-py38 project is part of the google.com organization in GCP, every VPC network that gets created automatically gets a bunch of firewall rules created for it as well. When my tests try to delete the network after it is no longer needed, the call fails with an error stating that the network is in use by another resource. In this case, that other resource is one of the many firewall rules that were auto-created for the network.

I'm going to replicate this issue in my own project in the google.com org and see if I can find a nice solution to this.

@m-strzelczyk
Contributor Author

I have replicated the issue. Unfortunately there's a race condition that we need to deal with here.

GCE Enforcer (the automation that guards all projects in the google.com org) attempts to create the firewall rules for a new VPC network right after the network is created, then roughly every 2 minutes checks that the rules are still there. If any rule is missing, it adds it again. Unfortunately, when using the GAPIC API, network deletion fails if any non-default firewall rules are attached to the network. This is different from deleting networks using Cloud Console or gcloud, where those rules are silently deleted for you.

As a result, the proper way to ensure network deletion in a project that is under GCE Enforcer is, in pseudocode:

while network exists:
  while firewall rules in the network exist:
    attempt to delete all firewall rules
  attempt to delete the network
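
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration, not the code from the eventual fix: the function name and the four callables are hypothetical placeholders. With google-cloud-compute they would presumably wrap calls on `compute_v1.NetworksClient` and `compute_v1.FirewallsClient`, but that wiring is an assumption and is left out so the sketch stays self-contained. Both loops are bounded by `max_rounds` (also an assumption), so a test cannot hang if the rules keep being recreated.

```python
import time
from typing import Callable, Iterable


def delete_network_with_retries(
    network_exists: Callable[[], bool],
    list_firewall_rules: Callable[[], Iterable[str]],
    delete_firewall_rule: Callable[[str], None],
    delete_network: Callable[[], None],
    max_rounds: int = 10,
    pause_seconds: float = 1.0,
) -> bool:
    """Try to delete a network, clearing its firewall rules first.

    All four callables are hypothetical hooks supplied by the caller.
    Returns True once the network is gone, False if max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        if not network_exists():
            return True
        # Inner loop: keep deleting rules until none are listed (bounded).
        for _ in range(max_rounds):
            rules = list(list_firewall_rules())
            if not rules:
                break
            for rule_name in rules:
                try:
                    delete_firewall_rule(rule_name)
                except Exception:
                    # The rule may already be gone; it is re-listed next pass.
                    pass
        try:
            delete_network()
        except Exception:
            # Most likely a rule was recreated in the meantime; wait and retry.
            time.sleep(pause_seconds)
    return not network_exists()
```

A test fixture could call this in its teardown and fail loudly when it returns `False`, instead of silently leaving the network behind.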

Unfortunately, I don't see any better way to ensure proper deletion of a newly created VPC network through the API while fighting GCE Enforcer :( One way to make it simpler would be to ask for the test project to be excluded from GCE Enforcer, or to move those projects outside of the google.com organization. I don't think either option is really viable.

I will implement a solution following this pseudocode so we can discuss it further.

@m-strzelczyk
Contributor Author

m-strzelczyk commented Oct 24, 2021

I have an update: GCE Enforcer is faster than I thought. Right now my script is literally racing with it to delete all the firewalls and can't do it fast enough. I need to see how gcloud and Cloud Console are able to delete the networks...

Actually, gcloud is not deleting the firewalls for me. gcloud also has problems deleting the VPC network that's under GCE Enforcer control...

@m-strzelczyk
Contributor Author

After many tests, I have come to the conclusion that I can't beat GCE Enforcer. I will modify the test to just use the default network and one of its subnetworks. That's fine for this scenario, but I expect the issue to return once we start implementing networking samples that need to create new networks.

@busunkim96 busunkim96 added the type: process A process-related concern. May include testing, release, or the like. label Oct 25, 2021