
Add migrate endpoint to move artifacts to another storage backend #4298

Open · wants to merge 1 commit into base: main
Conversation

gerrod3
Contributor

@gerrod3 gerrod3 commented Aug 17, 2023

fixes: #3358

@gerrod3 gerrod3 force-pushed the migrate-backend branch 4 times, most recently from 6c09731 to 49669c1 Compare August 18, 2023 03:09
- kwargs["pulp_domain"] = request.pulp_domain.name
+ kwargs["pulp_domain"] = task.pulp_domain.name
Member

This is not a semantic change, right? Should we maybe always try to use get_domain instead?

Contributor Author

request.domain == get_domain(), so there is no semantic change. This change is for when the task is set in a separate domain from the calling domain. In this case, migrate is called (or can be called) from the default domain, since domains are themselves a part of the default domain.
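The distinction could be sketched roughly like this (helper and attribute names are illustrative, not Pulp's actual code): when building a domain-scoped URL for a dispatched task, the domain must come from the task, not from the request that happened to dispatch it.

```python
from types import SimpleNamespace

# Illustrative sketch, not Pulp's actual code: build a domain-scoped URL,
# preferring the task's domain over the request's when both are available.
def domain_url(path, *, task=None, request=None):
    if task is not None:
        domain = task.pulp_domain.name  # the domain the task was dispatched into
    else:
        domain = request.pulp_domain.name  # the domain the call came in on
    return f"/pulp/{domain}/api/v3/{path}/"

request = SimpleNamespace(pulp_domain=SimpleNamespace(name="default"))
task = SimpleNamespace(pulp_domain=SimpleNamespace(name="foo-domain"))
print(domain_url("artifacts", request=request))             # /pulp/default/api/v3/artifacts/
print(domain_url("artifacts", task=task, request=request))  # /pulp/foo-domain/api/v3/artifacts/
```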

Member

But should you not call migrate from within the affected domain?
The fact that domains live in the default domain is a bit confusing, I guess.

Contributor Author

Yeah, this is a question I haven't figured out yet: where to put the new endpoint. Currently it is on the domains endpoint, specifically on an instance of a domain (/pulp/default/api/v3/domains/domain_href/migrate/), but it could equally be a new standalone endpoint that figures out what to do based on the domain it ran in (/pulp/foo-domain/api/v3/migrate/). Not sure which approach is best; any suggestions?

Member

@mdellweg mdellweg Aug 22, 2023

Self-referential stuff is always weird...
Would POST /pulp/foo-domain/api/v3/domain/migrate/ work? Where /pulp/foo-domain/api/v3/domain/ is a self-inspecting domain detail viewset?

Contributor Author

Yes, that would work as well. Would you like for me to try it out?

Member

I'd prefer that.

@action(detail=True, methods=["patch"])
def migrate(self, request, **kwargs):
"""
Migrate the domain's storage backend to a new one.
Member

Do we need to make the migrating of data and the updating of storage credentials more atomic?

Here's the scenario I worry about:

  1. user wants to migrate and starts a /migration/ request
  2. user still has content-change operations submitted, e.g. a cron-dispatched sync which queues and waits while content is migrating
  3. the content migration finishes and the subsequent content-change task begins to run
  4. because the credentials aren't updated, that new content is saved in the old backend once again

Before thinking over ideas on how to fix this, am I understanding this PR right that this could happen? If so, what do you all think about this?

Contributor Author

Only on the default domain can your scenario happen, which is why I explicitly instruct users to update their config file to the new backend and restart Pulp after the migration (hopefully users of domains won't be migrating the default domain very often). The migration task locks the domain so no other task can run, and for all other non-default domains the task will also update that domain's storage settings to the new backend. Therefore any queued up task will run with the new backend after the migration.

Note: the default domain is special since it loads its storage values from the config file, which with dynaconf means they could be stored anywhere, so it was impractical to update these values automatically.

Member

I didn't realize they update the configs after migration. My concern is not an issue. Regarding the default domain, I don't think we can do any better either. Thanks for explaining this.

Member

Should we let the default domain configure its storage backend in the database too, then?
Or what about this: we disallow using this endpoint on the default domain, and make this migration for the default domain a management command that checks that services are shut down? And on startup, all services check that the storage settings match the db?

Member

@ipanova ipanova Aug 23, 2023

Since the default domain is special and essentially requires a manual switch-over, my recommendation would be to document the necessary steps (two?) and add a warning note making it clear that until those steps are performed, all content, including new content, will be saved to/served from the old backend.
And that another migrate task might be needed, in case these steps were omitted, to bring over the newly created diff of content.

Member

@mdellweg you have a valid point

Contributor Author

@gerrod3 gerrod3 Aug 28, 2023

Ok, I've tried to reach a compromise between having the default domain update immediately upon migration, but still allow for its settings to be determined by the config file. Here is the new logic:

  • All domains have their storage instance built using their field storage_settings. For the default domain this field is empty and thus will use the default values from settings.
  • Upon migration the domain's storage_settings is updated to the new values. This way all domains, including the default domain, will now properly use the new storage backend after the migration.
  • If migrating the default domain, a message is put into the migration task to remind the user to update their config with the new storage settings before the next DB migration.
  • Upon pulpcore-manager migrate, Pulp will, in a post-migrate hook, update the default domain's saved fields (storage_class, redirect_to_object_storage, hide_distributions) to match the values in the config file. storage_settings will be reset back to empty to ensure that the default storage is again built from settings.
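The post-migrate reset in the last bullet could look roughly like this (a sketch only: the field names follow the comment above, but the config keys and the wiring into Django's post-migrate machinery are assumed, not Pulp's actual code):

```python
from types import SimpleNamespace

def reset_default_domain(domain, config):
    """Sync the default domain's saved fields back to the config file values
    and clear storage_settings so the default storage is rebuilt from settings."""
    domain.storage_class = config["storage_class"]
    domain.redirect_to_object_storage = config["redirect_to_object_storage"]
    domain.hide_distributions = config["hide_distributions"]
    domain.storage_settings = {}  # reset: default storage comes from settings again
    return domain

# Demo with a stand-in object instead of the real Domain model.
domain = SimpleNamespace(
    storage_class="old.Backend",
    redirect_to_object_storage=False,
    hide_distributions=False,
    storage_settings={"bucket": "old"},
)
config = {
    "storage_class": "new.Backend",
    "redirect_to_object_storage": True,
    "hide_distributions": False,
}
reset_default_domain(domain, config)
print(domain.storage_class, domain.storage_settings)  # new.Backend {}
```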

Member

Sorry for keeping you busy with this one, but I think we should try to find a bulletproof way.
If I understand your compromise correctly, the admin will run the migration, whereby the default domain will change its config. Now they miss reading the message, and upon running pulpcore-manager migrate the system will be borked again. Is this correct?

What if we follow an approach more similar to the allowed checksums setting?

  1. You adjust the config.
  2. You restart the services (they may warn you about the mismatch).
  3. You kick off the migrate task for the default domain without any parameters (they come from the settings).
    3.1 If the settings fail to validate, it will error out in the view.

Contributor Author

@gerrod3 gerrod3 Aug 30, 2023

Can you explain more of what you mean by 3 and 3.1? I am not following your logic.

EDIT

Now they miss reading the message and upon running pulpcore-manager migrate the system will be borked again. Is this correct?

Yes, the system will be bricked until they update the config and run pulpcore-manager migrate. I've added a check into settings.py for mismatched settings that logs a warning to the user.

Member

Yes, the system will be bricked until they update the config and run pulpcore-manager migrate. I've added a check into settings.py for mismatched settings that logs a warning to the user.

But by the time this message is shown (and it may even be missed by the admin), Pulp is reconfigured to use the old (probably already outdated) storage, and with the first new artifact there, you end up with two inconsistent storages. That is the situation I try to avoid.

Line of thinking: how much less special can we make the default domain? What if the default domain, like any other, just always carries its storage definition? So when you want to change that, you modify the settings, restart your services, and tell Pulp to rectify the situation. That will be your migrate task, but it should take its parameters from the config now, instead of the request body (or we make it fail when they differ...). They get validated before the task is dispatched (that was 3.1). Now at all times the database will be the truth, and the settings are the desired state the database is supposed to match eventually.
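The flow described here could be sketched as follows (all names are illustrative, not Pulp's actual code): the view derives the migration parameters from settings rather than the request body, validates them, and refuses to dispatch when nothing would change.

```python
def prepare_default_domain_migration(db_settings, config_settings, validate):
    """Validate config-derived storage settings before dispatching the task."""
    if not validate(config_settings):
        raise ValueError("storage settings failed validation")  # step 3.1: error in the view
    if config_settings == db_settings:
        raise ValueError("settings already match the database; nothing to migrate")
    return config_settings  # these become the dispatched task's parameters

# Demo: the DB still records the old backend, the config names the new one.
db = {"storage_class": "old.Backend"}
cfg = {"storage_class": "new.Backend"}
params = prepare_default_domain_migration(db, cfg, validate=lambda s: "storage_class" in s)
print(params)  # {'storage_class': 'new.Backend'}
```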

pulpcore/app/tasks/migrate.py (outdated conversation, resolved)

domain.storage_class = data["storage_class"]
domain.storage_settings = data["storage_settings"]
domain.save(update_fields=["storage_class", "storage_settings"])
pb.increment()
Contributor

How errors are handled is one of my biggest concerns with this feature. If I'm reading this right:

  • If an error occurs right away (i.e. new storage settings are wrong), the error gets thrown and nothing is updated
  • If an error occurs halfway through the migration (or let's say it gets cancelled), some of the artifacts would be migrated to the new storage but some would not, and the domain would not be updated to the new storage, correct? And then a user would need to migrate again to finish migrating the rest of the items, or else specify a new third configuration, whereby everything would be migrated to the third storage.

Does that sound accurate?

Contributor Author

Yep, that's correct for both scenarios. I'll add a check for whether the artifact is missing on the current backend, and tell the user to run the repair command or delete the artifact if so.
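The pre-flight check described here could be sketched like this (a stand-in object replaces the real Django Storage backend; only an `exists()` method is assumed):

```python
class FakeStorage:
    """Minimal stand-in for a Django Storage backend; only exists() is needed."""
    def __init__(self, files):
        self.files = set(files)
    def exists(self, name):
        return name in self.files

def find_missing_files(names, current_storage):
    """Return file names absent from the current backend, which the user
    should repair or delete before (or instead of) migrating them."""
    return [n for n in names if not current_storage.exists(n)]

current = FakeStorage({"artifact/ab/cd"})
print(find_missing_files(["artifact/ab/cd", "artifact/ef/01"], current))
# ['artifact/ef/01']
```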

date = now()
continue
break

Contributor

what happens to the files stored in the old storage? Are they just left behind?

Contributor Author

Correct, I don't attempt to delete them. The user can decide what to do with their old backend.

Member

Do we document that?
Also I think we eventually need to provide tools for this. I'd be ok to leave that for later...

Contributor Author

It's documented in the endpoint's method comment, which appears in the auto-generated docs. I don't think there is a need for a tool to delete files from the storage. We already entrust users to bring their own storage, so I think they can manage their files on it, but I agree that if we decide we need a tool, it would be best to add it later.

artifacts = Artifact.objects.filter(pulp_domain=domain)
date = now()

with ProgressReport(
Member

Now here's a new question coming to my mind. Are there settings that can change without it being a new bucket? Thinking about key-rotation here...
Is there any way we can identify "this is the same bucket, do not move the artifacts"? Or even "with the old settings you cannot reach the bucket anymore"?

Contributor Author

Still need to think about this some more, but I do believe there is a way. I check to see if the artifact is present in the new location, but depending on the number of artifacts this could still be slow and doing unnecessary work.
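The "skip if already present in the new location" check could be sketched like this (stand-in storage objects replace Django storages; only `exists`/`open`/`save` are assumed, and the real code would stream rather than read whole files):

```python
import io

class FakeStorage:
    """Minimal stand-in for a Django Storage backend."""
    def __init__(self, files=None):
        self.files = dict(files or {})
    def exists(self, name):
        return name in self.files
    def open(self, name):
        return io.BytesIO(self.files[name])
    def save(self, name, fileobj):
        self.files[name] = fileobj.read()

def migrate_file(name, old_storage, new_storage):
    """Copy one file to the new backend, skipping it when already present
    (e.g. same bucket, or a resumed migration)."""
    if new_storage.exists(name):
        return "skipped"
    with old_storage.open(name) as f:
        new_storage.save(name, f)
    return "copied"

old = FakeStorage({"artifact/ab/cd": b"data", "artifact/ef/01": b"more"})
new = FakeStorage({"artifact/ab/cd": b"data"})  # already migrated earlier
print([migrate_file(n, old, new) for n in sorted(old.files)])  # ['skipped', 'copied']
```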

@gerrod3 gerrod3 marked this pull request as draft September 18, 2023 19:29

stale bot commented Dec 17, 2023

This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!

@stale stale bot added the stale label Dec 17, 2023
@gerrod3 gerrod3 removed the stale label Jan 4, 2024

stale bot commented Apr 9, 2024

This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!

@stale stale bot added the stale label Apr 9, 2024
@bmbouter
Member

I met with @jlsherrill yesterday and he brought this up as a feature their usage will need this year. I'm hoping we can release it upstream and test it some there before putting it into use by @jlsherrill 's team

@gerrod3 and @goosemania could you give some thought to when this could be continued? I'm not sure what else you have going on so I'm just starting the convo here.


stale bot commented Apr 24, 2024

This pull request is no longer marked for closure.

@stale stale bot removed the stale label Apr 24, 2024
@gerrod3 gerrod3 force-pushed the migrate-backend branch 2 times, most recently from d2347ad to 651737e Compare April 24, 2024 14:17
@gerrod3
Contributor Author

gerrod3 commented Apr 24, 2024

Ok, I've updated the PR to make the default domain less special. The storage settings for the default domain will now also be saved in the database. This will be accomplished through a data migration (should be ZDU safe). When migrating the default domain the endpoint will use the values from the config file so that settings will still be the final source of truth for the default domain. This will allow users to change the values in their settings file and then simply run the migrate endpoint to move their artifacts to the new backend, all without restarting Pulp.

I'm still unsure of the best way to inform users if there is a disparity between settings and the default domain. Right now I have a check in settings.py file, but it doesn't feel like the right place.

@gerrod3 gerrod3 force-pushed the migrate-backend branch 3 times, most recently from d96ec2e to aa42810 Compare May 14, 2024 14:12
@gerrod3 gerrod3 marked this pull request as ready for review May 14, 2024 15:09
Member

@mdellweg mdellweg left a comment

Not a full review...
Did not look at the tests yet.

def default_domain_settings(apps, schema_editor):
Domain = apps.get_model("core", "Domain")
default_domain = Domain.objects.get(name="default")
config_settings = StorageSettingsSerializer.get_default_domain_settings(settings)
Member

This is a bit dangerous. Maybe it's better at this point to keep a frozen copy of the serializer.

Comment on lines -193 to +195
- object_parameters = serializers.DictField(default={})
+ object_parameters = serializers.DictField(default=dict())
Member

This isn't really a change. Did you intend to change to default=dict to make it use a callable?
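The reviewer's point, in plain Python (a sketch of the semantics only, not DRF's internals): `default={}` and `default=dict()` both build one dict at class-definition time that every consumer shares, while `default=dict` is a callable that yields a fresh dict each time it is asked for.

```python
class FieldWithSharedDefault:
    default = {}  # evaluated once; the same dict is handed out every time

    def get_default(self):
        return self.default


class FieldWithCallableDefault:
    default = dict  # a callable; invoked for a fresh dict on every use

    def get_default(self):
        return self.default() if callable(self.default) else self.default


shared = FieldWithSharedDefault()
a = shared.get_default()
a["key"] = "leaks"
print(shared.get_default())  # {'key': 'leaks'} — the mutation is visible to everyone

fresh = FieldWithCallableDefault()
b = fresh.get_default()
b["key"] = "does not leak"
print(fresh.get_default())  # {}
```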

)
row = cursor.fetchone()
mismatched = []
for i, setting in enumerate(
Member

would

for k, v in zip((DEFAULT_FILE_STORAGE, ...), row):

work?
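The zip-based rewrite being suggested pairs each settings key with the corresponding column of the fetched row and collects the mismatches. A sketch with illustrative setting names and row values (not Pulp's actual ones):

```python
# Keys in the same order as the columns of the fetched DB row.
SETTING_KEYS = ("DEFAULT_FILE_STORAGE", "MEDIA_ROOT", "MEDIA_URL")

# Live settings values (illustrative).
current_settings = {
    "DEFAULT_FILE_STORAGE": "storages.backends.s3boto3.S3Boto3Storage",
    "MEDIA_ROOT": "/var/lib/pulp/media",
    "MEDIA_URL": "/media/",
}

# What cursor.fetchone() returned (illustrative).
row = ("pulpcore.app.models.storage.FileSystem", "/var/lib/pulp/media", "/media/")

# Pair each key with its DB column and keep the ones that disagree.
mismatched = [k for k, v in zip(SETTING_KEYS, row) if current_settings[k] != v]
print(mismatched)  # ['DEFAULT_FILE_STORAGE']
```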

Successfully merging this pull request may close these issues.

As a domain user, I can change my storage system credentials and Pulp will migrate my data