Make recursive delete operations more intuitive #3589

Open
svkrieger opened this issue Jan 9, 2024 · 0 comments

Issue

Currently, recursive delete operations such as org, space, and app deletion, which implicitly delete service bindings and/or service instances, fail if one of the service-related deletions is handled asynchronously by the service broker. This is not optimal, as users have to trigger the deletion of the parent resource again. In addition, users currently get the following error message, which does not reveal what is actually going on:
An operation for the service binding between app myapp and service instance myinstance is in progress.

Steps to Reproduce

For example, delete an app that is bound to a service instance, using a broker that answers unbind requests with a 202.
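To reproduce the 202 case, a minimal stub like the one below could stand in for the broker's unbind and binding last-operation endpoints. It is a sketch, not a complete broker: it assumes the OSB v2 paths `DELETE /v2/service_instances/:instance_id/service_bindings/:binding_id` and `GET .../service_bindings/:binding_id/last_operation`, and the catalog, provision, and bind endpoints needed to register the broker with the CC are not included. `UNBIND_STATUS`, `StubBrokerHandler`, and `run` are hypothetical names.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical knob: status code the stub returns for unbind requests (202 = asynchronous).
UNBIND_STATUS = 202

# OSB v2 paths; query parameters such as accepts_incomplete are ignored.
UNBIND_PATH = re.compile(r"^/v2/service_instances/[^/]+/service_bindings/[^/?]+(\?|$)")
LAST_OPERATION_PATH = re.compile(
    r"^/v2/service_instances/[^/]+/service_bindings/[^/]+/last_operation(\?|$)"
)


class StubBrokerHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_DELETE(self) -> None:
        if UNBIND_PATH.match(self.path):
            # Accept the unbind without ever finishing it.
            self._send_json(UNBIND_STATUS, {})
        else:
            self._send_json(404, {})

    def do_GET(self) -> None:
        if LAST_OPERATION_PATH.match(self.path):
            # The binding stays "in deletion" forever, which keeps the parent delete failing.
            self._send_json(200, {"state": "in progress"})
        else:
            self._send_json(404, {})

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet


def run(port: int = 8080) -> None:
    HTTPServer(("0.0.0.0", port), StubBrokerHandler).serve_forever()
```

Calling `run(8080)` serves the stub; changing `UNBIND_STATUS` should exercise the other rows of the tables below.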

Current result

The following tables describe the resulting behaviour for different resources and broker responses. In general, all recursive deletions fail if a sub-resource is deleted asynchronously. The behaviour is the same for service instances and service bindings.

Delete service instance when service binding present

| API call | Resulting request to broker | Broker response | Current behaviour |
|---|---|---|---|
| `DELETE /v3/service_instances/<service_instance_guid>` | `DELETE /v2/service_instances/<si_guid>/service_bindings/<sb_guid>` | 202 | Starts polling the service binding's last operation and sets the service instance delete job and the service instance's last operation to failed with the message "delete could not be completed: An operation for the service binding between app myapp and service instance myinstance is in progress." |
| | | 500 | Both the service instance and the service binding delete operations fail. |
| | | 200 | The binding is deleted immediately; then a deprovision request is sent to the broker, and the service instance is either deleted immediately or polling starts. |

Delete app when bound to service

| API call | Resulting request to broker | Broker response | Current behaviour |
|---|---|---|---|
| `DELETE /v3/apps/<app_guid>` | `DELETE /v2/service_instances/<si_guid>/service_bindings/<sb_guid>` | 202 | Starts polling the service binding's last operation and sets the app delete job to failed with the message "Job (3d1d051f-6c94-47ab-85e2-dec27e0db75a) failed: An operation for the service binding between app myapp and service instance myinstance is in progress." |
| | | 500 | The app delete job fails and the service binding's last operation state is set to failed. Error message: "Job (17662f61-4111-4712-91da-d3ab11629ba7) failed: Service broker failed to delete service binding for instance myinstance: The service broker returned an invalid response. Status Code: 500 Internal Server Error, Body: {"state":"in progress"}" |
| | | 200 | The service binding is deleted, the app is deleted, and the app delete job is set to COMPLETE. |

Delete space which contains a service binding

| API call | Resulting request to broker | Broker response | Current behaviour |
|---|---|---|---|
| `DELETE /v3/spaces/<space_guid>` | `DELETE /v2/service_instances/<si_guid>/service_bindings/<sb_guid>` | 202 | Starts polling the service binding's last operation and sets both the space delete job and the service instance's last operation to failed with the message "Job (5b91523b-fe11-4cdb-bbc9-063b65fa8dee) failed: Deletion of space myspace failed because one or more resources within could not be deleted. An operation for the service binding between app myapp and service instance myinstance is in progress." |
| | | 500 | The space delete job fails and the last operation state of both the service binding and the service instance is set to failed. Error message: "Job (ebce0407-9560-44fb-9b39-7b06f35edb4f) failed: Deletion of space d071102 failed because one or more resources within could not be deleted. Service broker failed to delete service binding for instance myinstance: The service broker returned an invalid response. Status Code: 500 Internal Server Error, Body: {"state":"in progress"}" |
| | | 200 | The service binding, service instance, app, and space are deleted, and the space delete job is set to COMPLETE. |

Delete org that contains a space that contains a service binding

| API call | Resulting request to broker | Broker response | Current behaviour |
|---|---|---|---|
| `DELETE /v3/organizations/<organization_guid>` | `DELETE /v2/service_instances/<si_guid>/service_bindings/<sb_guid>` | 202 | Starts polling the service binding's last operation and sets both the org delete job and the service instance's last operation to failed with the message "Job (0daca787-a8b5-4433-967c-3b0c8d2e1798) failed: Deletion of organization d071102 failed because one or more resources within could not be deleted. Deletion of space d071102 failed because one or more resources within could not be deleted. An operation for the service binding between app myapp and service instance myinstance is in progress." |
| | | 500 | The org delete job fails and the last operation state of both the service binding and the service instance is set to failed. Error message: "Job (5c7ffdc1-7bb3-4a95-96da-33d19c5d4e79) failed: Deletion of organization d071102 failed because one or more resources within could not be deleted. Deletion of space d071102 failed because one or more resources within could not be deleted. Service broker failed to delete service binding for instance myinstance: The service broker returned an invalid response. Status Code: 500 Internal Server Error, Body: {"state":"in progress"}" |
| | | 200 | Everything is deleted and the organization delete job is set to COMPLETE. |

Further findings

  • All recursive deletions trigger the deletion of all sub-resources (unless they depend on each other). For example, an app delete triggers the deletion of all service bindings of that app. If one binding fails to delete or is being deleted asynchronously, the job still triggers the deletion of all other bindings. Service instances that have bindings still in deletion are not deleted.
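The current behaviour described in the tables and in this finding can be summarised as a small model. This is a simplified illustration, not the Cloud Controller code: `Binding`, `delete_space`, and `SpaceDeleteResult` are hypothetical names, broker responses are reduced to the status codes 200, 202, and 500, and deprovisioning an instance without remaining bindings is assumed to succeed synchronously.

```python
from dataclasses import dataclass

# Broker responses to an unbind request, as used in the tables above.
SYNC_DELETED = 200
ASYNC_ACCEPTED = 202


@dataclass
class Binding:
    guid: str
    unbind_status: int  # hypothetical: the status code the broker answers with


@dataclass
class SpaceDeleteResult:
    space_deleted: bool
    deleted_instances: list[str]
    errors: list[str]


def delete_space(instances: dict[str, list[Binding]]) -> SpaceDeleteResult:
    """Model of the current behaviour for a space delete."""
    deleted_instances: list[str] = []
    errors: list[str] = []
    for instance_guid, bindings in instances.items():
        blocked = False
        # Every binding gets an unbind request, even after an earlier one did not finish.
        for binding in bindings:
            if binding.unbind_status == SYNC_DELETED:
                continue
            blocked = True
            if binding.unbind_status == ASYNC_ACCEPTED:
                errors.append(f"An operation for the service binding {binding.guid} is in progress.")
            else:
                errors.append(f"Service broker failed to delete service binding {binding.guid}")
        # An instance that still has bindings in deletion is left in place.
        if not blocked:
            deleted_instances.append(instance_guid)
    # The parent only goes away if every sub-resource was deleted synchronously.
    return SpaceDeleteResult(not errors, deleted_instances, errors)
```

With one instance whose binding is unbound asynchronously and another whose binding is unbound synchronously, only the second instance is deleted and the space delete fails.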

Expected result

Ideally, recursive delete operations would handle asynchronously deleted sub-resources. See the next section for some ideas on how to achieve this.

Possible Fix

Re-enqueue recursive jobs instead of setting them to failed

The deletion jobs could be re-enqueued, similar to what we do for the polling mechanism of service-related operations. On each run, the job could check whether the sub-resources have been deleted successfully and, if so, delete the parent resource.
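The re-enqueue idea could look roughly like the sketch below. It is only an illustration under assumptions: `RecursiveDeleteJob`, `is_gone`, `delete_parent`, and `run_queue` are hypothetical, the in-memory deque stands in for the real job queue (which would re-enqueue with a delay), and there is no upper bound on the number of runs yet, which is exactly what the concerns listed below address.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecursiveDeleteJob:
    parent_guid: str
    pending: set[str]                    # sub-resources whose deletion is still running
    is_gone: Callable[[str], bool]       # hypothetical check, e.g. via last-operation polling
    delete_parent: Callable[[str], None]  # hypothetical parent delete

    def perform(self) -> bool:
        """Return True when the job is finished, False when it should be re-enqueued."""
        self.pending = {guid for guid in self.pending if not self.is_gone(guid)}
        if self.pending:
            return False
        # All sub-resources are gone, so the parent can be deleted now.
        self.delete_parent(self.parent_guid)
        return True


def run_queue(queue: deque) -> list[str]:
    """Process jobs; unfinished jobs go to the back of the queue."""
    finished: list[str] = []
    while queue:
        job = queue.popleft()
        if job.perform():
            finished.append(job.parent_guid)
        else:
            queue.append(job)
    return finished
```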

Some thoughts on this:

  • We would probably need a "locking mechanism" to prevent new resources from being created in an org, space, etc. that is being deleted.
  • If asynchronous deletions fail, the job should remember that it already tried to delete this resource; otherwise it might end up in an endless loop.
  • A parameter that configures a maximum timeout for such jobs would be useful.
  • When the job fails because sub-resources could not be deleted, it should show the original error message explaining why the deletion failed.

Delete parent resource immediately and continue asynchronous deletion of sub-resources in the background

If a service broker responds with a 202 to an unbind or deprovision request, we could assume that the broker will take care of the deletion and treat the resource as deleted, at least from the user's perspective. The CC could then continue polling the last operation state from the broker in the background. If the deletion fails, orphan mitigation could take over.
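A rough sketch of this alternative, under stated assumptions: `BackgroundUnbinds`, `poll`, `orphan_mitigate`, and `delete_app` are hypothetical names, `poll` is assumed to return the OSB last-operation state of a binding, and non-2xx unbind responses still fail the delete as they do today. A 202 only registers the binding for background polling, so the app disappears immediately from the user's point of view.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BackgroundUnbinds:
    """Hypothetical registry of unbinds the CC keeps polling after the parent is gone."""
    poll: Callable[[str], str]              # returns "in progress", "succeeded" or "failed"
    orphan_mitigate: Callable[[str], None]  # hypothetical hook, e.g. re-send the unbind
    pending: set[str] = field(default_factory=set)

    def tick(self) -> None:
        for guid in sorted(self.pending):
            state = self.poll(guid)
            if state == "succeeded":
                self.pending.discard(guid)
            elif state == "failed":
                self.pending.discard(guid)
                self.orphan_mitigate(guid)


def delete_app(app_guid: str, unbind_statuses: dict[str, int], apps: set[str],
               background: BackgroundUnbinds) -> None:
    for binding_guid, status in unbind_statuses.items():
        if status == 202:
            # The broker took responsibility; keep polling in the background.
            background.pending.add(binding_guid)
        elif status != 200:
            raise RuntimeError(f"Unbind of {binding_guid} failed with status {status}")
    # The parent disappears from the user's perspective right away.
    apps.discard(app_guid)
```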

Some thoughts on this:

  • In the worst case, the deletion of the service binding times out after max_poll_intervall. How should we proceed with the service instance then?
  • What if a user wants to recreate resources with the same names after the CC has reported them as deleted, while the deletion is in fact still going on in the background?
