Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory spike on repository delete #5049

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

dralley
Copy link
Contributor

@dralley dralley commented Feb 13, 2024

Django handles cascade deletes itself (as opposed to letting the database handle it) and in doing so it pulls a lot of data out, resulting in a memory spike on repository deletion.

The only solution is to write a custom cascade delete that deletes the objects at the bottom of the cascade tree more incrementally.

closes #2280

@@ -236,3 +236,24 @@ def add_and_remove(repository_pk, add_content_units, remove_content_units, base_
with repository.new_version(base_version=base_version) as new_version:
new_version.remove_content(models.Content.objects.filter(pk__in=remove_content_units))
new_version.add_content(models.Content.objects.filter(pk__in=add_content_units))


def custom_cascade_delete_task(instance_id, app_label, serializer_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review, just one thing i stumbled over:
This is a user visible task name. Should we call it generic_cascade_delete instead? And also maybe move it to pulpcore.app.tasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in pulpcore/app/tasks/, do you mean a different file?

I see your point about it being user visible but there might be a bit of conflict between making it generic (for the user's benefit) and making it specific (for the developer's benefit). Do we actually want to use this for everything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is more "generic" than "custom". And I think i like this method better.

Copy link
Contributor Author

@dralley dralley Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long as we have to have line 259 and 270 I think "generic" is the wrong word to use. If we can find some way to move that out, then sure.

Until then I should probably take the name custom_repository_cascade_delete_task from the first approach

I'm going to do some testing to see if the same effect can occur at a significant scale w/ deletion of a single publication or repository version, though, and if it does we probably should try to find some way to make it fully generic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could still be just 'pulpcore.app.tasks.repository_delete'.

@dralley dralley force-pushed the mem-spike branch 4 times, most recently from ee80486 to 40b3f1d Compare February 28, 2024 03:25
LOG.info(
f"Level {level} Delete Cascade for {base_model.__name__}: "
"Checking relations for {from_model.__name__}"
)
Copy link
Contributor Author

@dralley dralley Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably convert these to DEBUG prior to merging, but for review purposes it's nice to see exactly what's going on. I can switch them prior to merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to debug

@dralley
Copy link
Contributor Author

dralley commented Feb 28, 2024

OK, it seems I finally got this working. It required some tweaks beyond the original code, which did not it seems handle some edge cases that we trigger.

On the bright side, not only does this reduce the memory spike to a tiny blip, but it also seems to make the delete complete much faster.

@dralley dralley force-pushed the mem-spike branch 2 times, most recently from 229becd to 2dececf Compare February 28, 2024 04:01
@dralley dralley marked this pull request as ready for review February 28, 2024 04:36
@dralley
Copy link
Contributor Author

dralley commented Feb 28, 2024

There's one catch, which is why the CI is failing

If you look at the actual implementation of the invalidate_cache hook which runs before deletion of the repository, it tries to hunt down any publications that are currently distributed.

https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/repository.py#L373

The problem is that the way this implementation works specifically deletes all of those child objects before we get around to triggering the delete on the parent object and therefore the invalidation hook. So it finds nothing and doesn't invalidate the cache.

That can be fixed, but it will probably require a carveout for this specific edge case. If we're going down that road anyway, is it worth even going down this road in the first place, as opposed to just writing a manual cascade specific to Repository? Originally we wanted to make this implementation as generic as possible, but we might not be able to completely acheive that.

I have mixed feelings. Of course it would be nice to be able to reuse this anywhere but I'm not personally sold on the idea that this code itself doesn't have a mistake or won't eventually break against some new Django version. A manually written cascade would be much simpler and more straightforwards.

@ggainey
Copy link
Contributor

ggainey commented Feb 28, 2024

The reason I don't like the "write manual code" is that the list of things that need to be deleted, is under plugin control - so code that works "today", might fail to delete things tomorrow.

Regarding cache-invalidation - shouldn't "invalidate the cache for Distribution X" be handled by the Distribution? Why does the repo-delete do it, instead of Distribution-delete?

@gerrod3
Copy link
Contributor

gerrod3 commented Feb 28, 2024

Regarding cache-invalidation - shouldn't "invalidate the cache for Distribution X" be handled by the Distribution? Why does the repo-delete do it, instead of Distribution-delete?

Because how would a distribution know if the repository it is serving was just updated/deleted? Deleting a repository doesn't delete the distribution serving it. Deleting a repository does delete the publication serving it.

@dralley
Copy link
Contributor Author

dralley commented Feb 28, 2024

The reason I don't like the "write manual code" is that the list of things that need to be deleted, is under plugin control - so code that works "today", might fail to delete things tomorrow.

Well in this case the way I'd write it (as in this PR actually) is that we'd still be using the Django delete on the top level object, we'd just delete various bits and pieces first so that the cascade won't need to pick them up. So you can't fail to delete anything, only fail to achieve quite as much efficiency in deletion as desired if you miss anything significant.

Regarding cache-invalidation - shouldn't "invalidate the cache for Distribution X" be handled by the Distribution? Why does the repo-delete do it, instead of Distribution-delete?

Well, that happens too https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/publication.py#L703-L720

But to answer your question, the repository field on BaseDistribution is set to on_delete=SET_NULL rather than on_delete=PROTECT. Maybe that's a bad decision but that's how it currently is. So deleting a repository leaves the distribution intact but invalid in its present state (for any plugin using repositories directly)

@ggainey
Copy link
Contributor

ggainey commented Feb 28, 2024

OK, I see now - repo-delete, does delete the Publications, but does not delete the Distributions because they're set-null on the Repository-FK. Looking at the distribution-code, I'd still expect the cache-invalidation to have happened as a result of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/publication.py#L711 though.

Note: Distributions are allowed to exist without-a-repo as a result of an actual user request/requirement - "I want to create all my Distributions, and have them returning 404s, before the repos they are meant to serve are sync'd/published". Odd, but it's this way for a reason.

@dralley
Copy link
Contributor Author

dralley commented Feb 28, 2024

Looking at the distribution-code, I'd still expect the cache-invalidation to have happened as a result of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/publication.py#L711 though.

Apparently not given that the CI fails w/o a cache invalidation that's supposed to happen. Nice test @gerrod3 btw

@ggainey
Copy link
Contributor

ggainey commented Feb 28, 2024

Looking at the distribution-code, I'd still expect the cache-invalidation to have happened as a result of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/publication.py#L711 though.

Apparently not given that the CI fails w/o a cache invalidation that's supposed to happen. Nice test @gerrod3 btw

Since we're updating distribution.repo to None, directly in SQL, I "assume" that's why the distribution's update-hook isn't being called and doing the invalidaiton. Bah.

Django handles cascade deletes itself (as opposed to letting the
database handle it) and in doing so it pulls a lot of data out,
resulting in a memory spike on repository deletion.

The only solution is to write a custom cascade delete that deletes the
objects at the bottom of the cascade tree more incrementally.

closes pulp#5048
@dralley
Copy link
Contributor Author

dralley commented Feb 28, 2024

@gerrod3 @mdellweg @ggainey Well, I've tested both approaches (and pushed both commits). Let me know how you feel about them and which we might prefer.

The fully manual approach is actually a bit trickier than I thought as well, for what it's worth. But both are ultimately viable.


if not model_relation.on_delete:
pass
elif model_relation.on_delete.__name__ == "SET_NULL":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help to look for PROTECTED relations first and bail out fast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should look for PROTECTED relations, yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also RESTRICT. As it stands, we do clean those up to, but I think only because it happens to be at the very top level, and because of line 152

pulp [None]: pulpcore.tasking._util:INFO: Writing task memory data to /var/tmp/pulp/018e0b90-d4ea-7d36-9a21-d4ca99fa1f68/memory.datum
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.tasking.tasks:INFO: Starting task 018e0b90-d4ea-7d36-9a21-d4ca99fa1f68
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 0 Delete Cascade for RpmRepository: Checking relations for RpmRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.acs.AlternateContentSourcePath'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of AlternateContentSourcePath
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1 Delete Cascade for RpmRepository: Checking relations for AlternateContentSourcePath
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1: delete records from AlternateContentSourcePath
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from AlternateContentSourcePath
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryContent'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RepositoryContent
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1 Delete Cascade for RpmRepository: Checking relations for RepositoryContent
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1: delete records from RepositoryContent
[pulp]  | ('pulp [be07879d2708488ea6cde1ea189f18aa]: ::ffff:127.0.0.1 - admin [04/Mar/2024:22:23:45 +0000] "GET /pulp/api/v3/tasks/018e0b90-d4ea-7d36-9a21-d4ca99fa1f68/ HTTP/1.0" 200 799 "-" "Pulp-CLI/0.23.2"',)
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 167812 records from RepositoryContent
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryVersion'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1 Delete Cascade for RpmRepository: Checking relations for RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryContent'>: on_delete=<function RESTRICT at 0x7f648a33f160>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryContent'>: on_delete=<function RESTRICT at 0x7f648a33f160>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryVersion'>: on_delete=<function SET_NULL at 0x7f648a33f280>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Executing SET NULL constraint action on RepositoryVersion relation of RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Updated 0 records in RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryVersionContentDetails'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RepositoryVersionContentDetails
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 2 Delete Cascade for RpmRepository: Checking relations for RepositoryVersionContentDetails
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 2: delete records from RepositoryVersionContentDetails
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 67 records from RepositoryVersionContentDetails
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.Publication'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of Publication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 2 Delete Cascade for RpmRepository: Checking relations for Publication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3 Delete Cascade for RpmRepository: Checking relations for PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3: delete records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 452480 records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedMetadata'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3 Delete Cascade for RpmRepository: Checking relations for PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.content.ContentArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4 Delete Cascade for RpmRepository: Checking relations for ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.content.RemoteArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5 Delete Cascade for RpmRepository: Checking relations for RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5: delete records from RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5 Delete Cascade for RpmRepository: Checking relations for PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5: delete records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4: delete records from ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 27 records from ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.Repository'>: on_delete=None
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryContent'>: on_delete=<function PROTECT at 0x7f648a33f0d0>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3: delete records from PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 27 records from PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.Distribution'>: on_delete=<function SET_NULL at 0x7f648a33f280>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Executing SET NULL constraint action on Distribution relation of Publication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Updated 0 records in Distribution
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulp_rpm.app.models.repository.RpmPublication'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RpmPublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3 Delete Cascade for RpmRepository: Checking relations for RpmPublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4 Delete Cascade for RpmRepository: Checking relations for PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4: delete records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedMetadata'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4 Delete Cascade for RpmRepository: Checking relations for PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.content.ContentArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5 Delete Cascade for RpmRepository: Checking relations for ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.content.RemoteArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6 Delete Cascade for RpmRepository: Checking relations for RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6: delete records from RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6 Delete Cascade for RpmRepository: Checking relations for PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6: delete records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5: delete records from ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.Repository'>: on_delete=None
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryContent'>: on_delete=<function PROTECT at 0x7f648a33f0d0>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4: delete records from PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.Distribution'>: on_delete=<function SET_NULL at 0x7f648a33f280>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Executing SET NULL constraint action on Distribution relation of RpmPublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Updated 0 records in Distribution
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3: delete records from RpmPublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 4 records from RpmPublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulp_file.app.models.FilePublication'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of FilePublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3 Delete Cascade for RpmRepository: Checking relations for FilePublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4 Delete Cascade for RpmRepository: Checking relations for PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4: delete records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedMetadata'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4 Delete Cascade for RpmRepository: Checking relations for PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.content.ContentArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5 Delete Cascade for RpmRepository: Checking relations for ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.content.RemoteArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6 Delete Cascade for RpmRepository: Checking relations for RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6: delete records from RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from RemoteArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.PublishedArtifact'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6 Delete Cascade for RpmRepository: Checking relations for PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 6: delete records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 5: delete records from ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from ContentArtifact
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.Repository'>: on_delete=None
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.repository.RepositoryContent'>: on_delete=<function PROTECT at 0x7f648a33f0d0>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 4: delete records from PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PublishedMetadata
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.Distribution'>: on_delete=<function SET_NULL at 0x7f648a33f280>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Executing SET NULL constraint action on Distribution relation of FilePublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Updated 0 records in Distribution
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 3: delete records from FilePublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from FilePublication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 2: delete records from Publication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 4 records from Publication
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.Distribution'>: on_delete=<function SET_NULL at 0x7f648a33f280>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Executing SET NULL constraint action on Distribution relation of RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Updated 0 records in Distribution
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1: delete records from RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 5 records from RepositoryVersion
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.exporter.PulpExporter'>: on_delete=None
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.importer.PulpImporterRepository'>: on_delete=<function CASCADE at 0x7f648a3c3c10>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Cascading delete to relations of PulpImporterRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1 Delete Cascade for RpmRepository: Checking relations for PulpImporterRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 1: delete records from PulpImporterRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted 0 records from PulpImporterRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulpcore.app.models.publication.Distribution'>: on_delete=<function SET_NULL at 0x7f648a33f280>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Executing SET NULL constraint action on Distribution relation of RpmRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO:     Updated 1 records in Distribution
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulp_rpm.app.models.distribution.Addon'>: on_delete=<function RESTRICT at 0x7f648a33f160>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Relation from <class 'pulp_rpm.app.models.repository.RpmRepository'> to <class 'pulp_rpm.app.models.distribution.Variant'>: on_delete=<function RESTRICT at 0x7f648a33f160>
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Level 0: delete records from RpmRepository
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.app.django_util:INFO: Deleted (3, {'rpm.RpmRepository': 1, 'core.Repository': 1, 'core.UserRole': 1})
pulp [be07879d2708488ea6cde1ea189f18aa]: pulpcore.tasking.tasks:INFO: Task completed 018e0b90-d4ea-7d36-9a21-d4ca99fa1f68

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that we could easily and fast determine that we would not delete a single thing before even starting the rest of the process. But maybe it's not worth the trouble.

@@ -236,3 +236,24 @@ def add_and_remove(repository_pk, add_content_units, remove_content_units, base_
with repository.new_version(base_version=base_version) as new_version:
new_version.remove_content(models.Content.objects.filter(pk__in=remove_content_units))
new_version.add_content(models.Content.objects.filter(pk__in=add_content_units))


def custom_cascade_delete_task(instance_id, app_label, serializer_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is more "generic" than "custom". And I think i like this method better.

LOG.info(
f"Level {level} Delete Cascade for {base_model.__name__}: "
"Checking relations for {from_model.__name__}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to debug

pulpcore/app/django_util.py Show resolved Hide resolved
Django handles cascade deletes itself (as opposed to letting the
database handle it) and in doing so it pulls a lot of data out,
resulting in a memory spike on repository deletion.

The only solution is to write a custom cascade delete that deletes the
objects at the bottom of the cascade tree more incrementally.

[noissue]
# rather than letting it be triggered as we delete the repository
try:
instance.invalidate_cache()
except AttributeError:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we OK with this?

  • Should we rename invalidate_cache() so as to eliminate any possibility of name collision
  • Should we list specific models (e.g. Repository) that we know have this method and call it only on those

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -236,3 +236,40 @@ def add_and_remove(repository_pk, add_content_units, remove_content_units, base_
with repository.new_version(base_version=base_version) as new_version:
new_version.remove_content(models.Content.objects.filter(pk__in=remove_content_units))
new_version.add_content(models.Content.objects.filter(pk__in=add_content_units))


def generic_cascade_delete_task(instance_id, app_label, serializer_name):
Copy link
Contributor Author

@dralley dralley Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdellweg This is fine yeah?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the most generic now. 👍

related_pk_values,
base_model=base_model,
level=level + 1,
skip_relations=skip_relations,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using "skip_relations" anywhere? Does this translate nicely to the cascaded model?
Should we maybe pass it down like version__content on Repository --> content on RepositoryVersion?

app_label = instance._meta.app_label

task = dispatch(
tasks.repository.generic_cascade_delete_task,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move it to task.base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider toggling the definition behind a setting so that, should any problems be uncovered, there's an easy way to revert to the previous behavior without reverting or waiting for a new release?

@dralley dralley marked this pull request as draft March 12, 2024 00:51
@dralley
Copy link
Contributor Author

dralley commented May 14, 2024

I need to pick this back up and get it over the finish line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants