Fix large exports failing #5388
Conversation
Force-pushed from 5570469 to 9d8db20
This is still a work in progress. It does not account for cases with other storage backends, and the solution for the string manipulation of the artifact paths is still hackish. However, it did successfully export a large RPM content view with more than 65k packages, which previously failed.
Force-pushed from c64dace to 57cd9d4
Force-pushed from 4d03dc0 to ce0c796
This looks reasonable!
We need some more extensive import/export testing; it's just difficult to set up.
Let's squash the commits and get one additional review.
Head branch was pushed to by a user without write access
pulpcore/app/importexport.py
Outdated
batch = artifact_pks[offset : offset + EXPORT_BATCH_SIZE]
batch_qs = Artifact.objects.filter(pk__in=batch).only("file")

for artifact in pb.iter(batch_qs):
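The slicing pattern in the diff above can be sketched in plain Python. This is a hedged, standalone analogue, not the pulpcore implementation: `iter_batches` is a hypothetical helper name, and `EXPORT_BATCH_SIZE = 2000` is assumed from the later discussion in this thread.

```python
# Hypothetical sketch of the PR's batching loop over artifact primary keys.
# EXPORT_BATCH_SIZE and iter_batches are assumptions for illustration, not
# names taken from pulpcore itself.
EXPORT_BATCH_SIZE = 2000

def iter_batches(pks, batch_size=EXPORT_BATCH_SIZE):
    """Yield successive slices of pks, mirroring pks[offset : offset + batch_size]."""
    for offset in range(0, len(pks), batch_size):
        yield pks[offset : offset + batch_size]

# 65k+ packages previously failed as one query; sliced into batches, each
# batch stays small enough to filter on (pk__in=batch) safely.
batches = list(iter_batches(list(range(5000))))
```

In the real code each batch then feeds `Artifact.objects.filter(pk__in=batch).only("file")`, so no single query carries the full 65k-row IN clause.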
You could use iterator() here as well.
Like this? I'm not sure what difference it makes.
I did re-test the large export with the extra .iterator().
Reading that, .iterator() sounds like the right choice for batch query sets. Thanks for the link.
I doubt it actually mattered in this instance, as the default iterator chunk size is 2000 and we're already doing external batching... with a batch size of 2000.
But it doesn't hurt anything, so no need to revert.
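The point being made, that .iterator() streams rows in chunks instead of caching the whole result set, can be shown with a plain-Python analogue (hedged sketch, no Django required; fetch_cached and fetch_streaming are hypothetical names, and chunk_size=2000 mirrors Django's documented default for iterator()):

```python
import itertools

def fetch_cached(rows):
    # Analogue of iterating a plain queryset: the full result set is
    # materialized in memory at once.
    return list(rows)

def fetch_streaming(rows, chunk_size=2000):
    # Analogue of queryset.iterator(chunk_size=...): pull rows from the
    # source in fixed-size chunks, never holding more than one chunk.
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield from chunk

# Both produce the same rows; only peak memory use differs.
streamed = list(fetch_streaming(range(10_000)))
cached = fetch_cached(range(10_000))
```

With the outer loop already slicing artifact_pks into 2000-element batches, each inner queryset is at most one chunk anyway, which is why the extra .iterator() is harmless but likely makes little difference here.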
closes pulp#5375
Co-authored-by: Tobias Grigo <56518487+hstct@users.noreply.github.com>
Backport to 3.39: 💚 backport PR created ✅ Backported as #5416 🤖 @patchback
Backport to 3.49: 💚 backport PR created ✅ Backported as #5418 🤖 @patchback
closes #5375