Add migrate endpoint to move artifacts to another storage backend #4298
Conversation
Force-pushed from 6c09731 to 49669c1.
pulpcore/app/response.py (outdated)
```diff
- kwargs["pulp_domain"] = request.pulp_domain.name
+ kwargs["pulp_domain"] = task.pulp_domain.name
```
This is not a semantic change, right? Should we maybe always try to use `get_domain` instead?
`request.domain == get_domain()`. This change is for when the task is set in a separate domain from the calling domain. In this case migrate is called (can be called) from the default domain, since domains themselves are a part of the default domain.
But should you not call migrate from within the affected domain? The fact that domains are in the default domain is a bit confusing, I guess.
Yeah, this is a question I haven't figured out yet: where to put the new endpoint. Currently it is on the domains endpoint, specifically on an instance of a domain (`/pulp/default/api/v3/domains/domain_href/migrate/`), but it could equally be a new standalone endpoint that figures out what to do based on the domain it ran in (`/pulp/foo-domain/api/v3/migrate/`). Not sure which approach is best; any suggestions?
Self-referential stuff is always weird... Would `POST /pulp/foo-domain/api/v3/domain/migrate/` work? Where `/pulp/foo-domain/api/v3/domain/` is a self-inspecting domain detail viewset?
Yes, that would work as well. Would you like for me to try it out?
I'd prefer that.
```python
@action(detail=True, methods=["patch"])
def migrate(self, request, **kwargs):
    """
    Migrate the domain's storage backend to a new one.
```
Do we need to make the migrating of data and the updating of storage credentials more atomic? Here's the scenario I worry about:
- the user wants to migrate and starts a /migration/ request
- the user still has content-change operations submitted, e.g. a cron-dispatched sync, which queues and waits while content is migrating
- the content migration finishes and the subsequent content-change task begins to run
- because the credentials aren't updated, that new content is saved in the old backend once again

Before thinking over ideas on how to fix this, am I understanding this PR right that this could happen? If so, what do you all think about this?
Only on the default domain can your scenario happen, which is why I explicitly instruct users to update their config file to the new backend and restart Pulp after the migration (hopefully users of domains won't be migrating the default domain very often). The migration task locks the domain so no other task can run on it, and for all non-default domains the task will also update that domain's storage settings to the new backend. Therefore any queued-up task will run with the new backend after the migration.

Note: the default domain is special since it loads its storage values from the config file, which with dynaconf means they could be stored anywhere, so it was impractical to update these values automatically.
I didn't realize they update the configs after migration, so my concern is not an issue. Regarding the default domain, I don't think we can do any better either. Thanks for explaining this.
Should we let the default domain configure its storage backend in the database too, then?

Or what about this: we disallow using this endpoint on the default domain, and make this migration for the default domain a management command that checks that services are shut down? And on startup, all services check that the storage settings match the DB?
Since the default domain is special and essentially requires a manual switch-over, my recommendation would be to document the necessary steps (two?) and add a warning note making it clear that until those steps are performed, all content, including new content, will be saved/served from the old backend.

And that another migrate task might be needed, in case these steps were omitted, to bring over the newly created diff of content.
@mdellweg you have a valid point
Ok, I've tried to reach a compromise between having the default domain update immediately upon migration while still allowing its settings to be determined by the config file. Here is the new logic:
- All domains have their storage instance built from their `storage_settings` field. For the default domain this field is empty, so the default values from settings are used.
- Upon migration, the domain's `storage_settings` is updated to the new values. This way all domains, including the default domain, will now properly use the new storage backend after the migration.
- If migrating the default domain, a message is put into the migration task to remind the user to update their config with the new storage settings before the next DB migration.
- Upon `pulpcore-manager migrate`, Pulp in a post-migrate hook will update the default domain's saved fields (`storage_class`, `redirect_to_object_storage`, `hide_distributions`) to match the config file. `storage_settings` will be reset back to empty to ensure that the default storage is again built from settings.
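The empty-means-use-settings fallback described above can be sketched in plain Python. Every name here (`CONFIG`, `build_storage_settings`) is an illustrative stand-in, not pulpcore code:

```python
# Minimal sketch of the described fallback: a domain whose
# storage_settings field is empty (the default domain) falls back to
# values taken from the config file. Illustrative names only.

CONFIG = {  # stands in for the dynaconf-backed settings file
    "location": "/var/lib/pulp/media",
}

def build_storage_settings(domain_storage_settings, config=CONFIG):
    """Return the settings a domain's storage backend is built from."""
    if domain_storage_settings:        # migrated or non-default domain
        return domain_storage_settings
    return config                      # empty -> use values from settings

default_domain_settings = build_storage_settings({})
migrated_domain_settings = build_storage_settings({"bucket": "new-bucket"})
```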
Sorry for keeping you busy with this one, but I think we should try to find a bulletproof way.
If I understand your compromise correctly, the admin will run the migration, whereby the default domain will change its config. Now they miss reading the message, and upon running `pulpcore-manager migrate` the system will be borked again. Is this correct?
What if we follow an approach more similar to the allowed-checksums setting?
1. You adjust the config.
2. You restart the services (they may warn you about the mismatch).
3. You kick off the migrate task for the default domain without any parameters (they come from the settings).

   3.1 If the settings fail to validate, it will error out in the view.
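Steps 3 and 3.1 could look roughly like this. This is a pure-Python sketch with made-up names (`validate_storage_config`, `dispatch_default_domain_migrate`); none of them are pulpcore APIs:

```python
# Sketch of "parameters come from the settings, validated in the view".
# Everything here is illustrative, not actual pulpcore code.

REQUIRED_KEYS = {"storage_class", "storage_settings"}

class ValidationError(Exception):
    pass

def validate_storage_config(config):
    """Step 3.1: reject a config that is missing required keys."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValidationError(f"missing settings: {sorted(missing)}")
    return config

def dispatch_default_domain_migrate(config):
    """Step 3: the view takes no body params; it reads the config instead."""
    validated = validate_storage_config(config)
    # In the real feature this would dispatch the migrate task; here we
    # just return the kwargs that would be handed to it.
    return {"new_storage": validated}
```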
Can you explain more of what you mean by 3 and 3.1? I am not following your logic.

EDIT:
> Now they miss reading the message and upon running `pulpcore-manager migrate` the system will be borked again. Is this correct?

Yes, the system will be bricked until they update the config and run `pulpcore-manager migrate`. I've added a check into `settings.py` that logs a warning to the user on mismatched settings.
> Yes, the system will be bricked until they update the config and run `pulpcore-manager migrate`. I've added a check into `settings.py` that logs a warning to the user on mismatched settings.

But by the time this message is shown (and it may even be missed by the admin), Pulp is reconfigured to use the old (probably already outdated) storage, and with the first new artifact there, you end up with two inconsistent storages. That is the situation I try to avoid.

Line of thinking: how much less special can we make the default domain? What if the default domain, like any other, just always carries its storage definition? So when you want to change that, you modify the settings, restart your services, and tell Pulp to rectify the situation. That will be your migrate task, but it should take its parameters from the config now, instead of the request body (or we make it fail when they differ...). They get validated before the task is dispatched (that was 3.1). Now, at all times, the database will be the truth, and the settings are the desired state the database is supposed to match eventually.
pulpcore/app/tasks/migrate.py (outdated)
```python
domain.storage_class = data["storage_class"]
domain.storage_settings = data["storage_settings"]
domain.save(update_fields=["storage_class", "storage_settings"])
pb.increment()
```
How errors are handled is one of my biggest concerns with this feature. If I'm reading this right:
- If an error occurs right away (i.e. the new storage settings are wrong), the error gets thrown and nothing is updated.
- If an error occurs halfway through the migration (or let's say it gets cancelled), some of the artifacts would be migrated to the new storage but some would not, and the domain would not be updated to the new storage, correct? A user would then need to migrate again to finish migrating the rest of the items, or else specify a new third configuration, whereby everything would be migrated to the third storage.

Does that sound accurate?
Yep, that's correct for both scenarios. I'll add a check for whether the artifact is missing on the current backend, and if so tell the user to run the repair command or delete the artifact.
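The copy loop with that missing-artifact check could look roughly like this. The dict-backed "storages" and all names are mock stand-ins, not pulpcore's actual storage API:

```python
# Pure-Python sketch of a migrate loop that records artifacts whose
# files are lost on the current backend instead of failing the task.
# Dicts stand in for the storage backends; illustrative only.

old_storage = {"artifact/aa": b"one", "artifact/bb": b"two"}
new_storage = {}

def migrate_artifacts(paths, src, dst):
    missing = []
    for path in paths:
        if path not in src:
            # File lost on the current backend: record it so the user
            # can run the repair command or delete the artifact.
            missing.append(path)
            continue
        dst[path] = src[path]  # copy file content to the new backend
    return missing

lost = migrate_artifacts(
    ["artifact/aa", "artifact/bb", "artifact/cc"], old_storage, new_storage
)
```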
```python
date = now()
continue
break
```
What happens to the files stored in the old storage? Are they just left behind?
Correct, I don't attempt to delete them. The user can decide what to do with their old backend.
Do we document that?
Also I think we eventually need to provide tools for this. I'd be ok to leave that for later...
It's documented in the endpoint's method comment, which appears in the auto-generated docs. I don't think there is a need for a tool to delete files from the storage. We already entrust users to bring their own storage, so I think they can manage their files on it, but I agree that if we do decide we need a tool, it would be best to add it later.
Force-pushed from 9cf4416 to 53519ea, then from 53519ea to 2553416.
```python
artifacts = Artifact.objects.filter(pulp_domain=domain)
date = now()

with ProgressReport(
```
Now here's a new question coming to my mind: are there settings that can change without it being a new bucket? Thinking about key rotation here...

Is there any way we can identify "this is the same bucket, do not move the artifacts"? Or even "with the old settings you cannot reach the bucket anymore"?
Still need to think about this some more, but I do believe there is a way. I check whether the artifact is present in the new location, but depending on the number of artifacts this could still be slow and do unnecessary work.
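One way to approach the "same bucket, just rotated credentials" case raised above is to compare the storage settings with credential-like keys removed. The key names and logic here are assumptions for illustration, not pulpcore code:

```python
# Sketch: treat two configs as the same backend if they match after
# stripping credential-like keys. Key names are illustrative only.

CREDENTIAL_KEYS = {"access_key", "secret_key", "session_token"}

def same_backend(old_settings, new_settings):
    """True if both configs appear to point at the same bucket/location."""
    def strip(s):
        return {k: v for k, v in s.items() if k not in CREDENTIAL_KEYS}
    return strip(old_settings) == strip(new_settings)

old = {"bucket": "pulp", "access_key": "OLD", "secret_key": "OLD"}
new = {"bucket": "pulp", "access_key": "NEW", "secret_key": "NEW"}
# same_backend(old, new) -> no artifacts need to move, only credentials changed
```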
This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!
I met with @jlsherrill yesterday and he brought this up as a feature their usage will need this year. I'm hoping we can release it upstream and test it some there before putting it into use by @jlsherrill's team. @gerrod3 and @goosemania, could you give some thought to when this could be continued? I'm not sure what else you have going on, so I'm just starting the convo here.
Force-pushed from 2553416 to 6fc7730.
This pull request is no longer marked for closure.
Force-pushed from d2347ad to 651737e.
Ok, I've updated the PR to make the default domain less special. The storage settings for the default domain will now also be saved in the database, accomplished through a data migration (should be ZDU safe). When migrating the default domain, the endpoint will use the values from the config file so that settings remain the final source of truth for the default domain. This allows users to change the values in their settings file and then simply call the migrate endpoint to move their artifacts to the new backend, all without restarting Pulp. I'm still unsure of the best way to inform users if there is a disparity between settings and the default domain. Right now I have a check in the `settings.py` file, but it doesn't feel like the right place.
Force-pushed from 651737e to 1745fcd, then from d96ec2e to aa42810.
Not a full review...
Did not look at the tests yet.
```python
def default_domain_settings(apps, schema_editor):
    Domain = apps.get_model("core", "Domain")
    default_domain = Domain.objects.get(name="default")
    config_settings = StorageSettingsSerializer.get_default_domain_settings(settings)
```
This is a bit dangerous. Maybe it's better at this point to keep a frozen copy of the serializer.
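The danger is that data migrations outlive the code around them: a later refactor of `StorageSettingsSerializer` silently changes what an already-written migration does. The frozen-copy pattern inlines a snapshot of the needed logic into the migration file itself. A generic sketch (the function names and settings keys are made up for illustration):

```python
# Frozen-copy pattern: the migration calls only a snapshot of the
# settings-extraction logic pasted into the migration file, never the
# live serializer. All names and keys here are illustrative.

def _frozen_default_domain_settings(settings):
    """Snapshot of the extraction logic as of this migration."""
    return {
        "storage_class": settings.get("DEFAULT_FILE_STORAGE"),
        "location": settings.get("MEDIA_ROOT"),
    }

def default_domain_settings_forward(settings):
    # Future refactors of application code cannot change this result.
    return _frozen_default_domain_settings(settings)
```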
```diff
- object_parameters = serializers.DictField(default={})
+ object_parameters = serializers.DictField(default=dict())
```
This isn't really a change. Did you intend to change it to `default=dict` to make it use a callable?
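The point about the callable is the classic shared-mutable-default pitfall: `default={}` and `default=dict()` both evaluate once and hand every consumer the same dict, while `default=dict` is called fresh each time. A plain-Python illustration with a tiny stand-in class (not DRF itself):

```python
# default=dict() is evaluated once at definition time, so every caller
# shares one dict; default=dict is invoked per call, giving a new dict.

class Field:
    """Minimal stand-in mimicking a serializer field's default handling."""

    def __init__(self, default):
        self._default = default

    def get_default(self):
        d = self._default
        return d() if callable(d) else d

shared = Field(default=dict())   # one dict, shared by all callers
fresh = Field(default=dict)      # new dict per call

a, b = shared.get_default(), shared.get_default()
a["x"] = 1                       # mutation leaks into b
c, d = fresh.get_default(), fresh.get_default()
c["x"] = 1                       # d stays empty
```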
```python
)
row = cursor.fetchone()
mismatched = []
for i, setting in enumerate(
```
Would

```python
for k, v in zip((DEFAULT_FILE_STORAGE, ...), row):
```

work?
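The suggested `zip` pairs each setting name directly with the corresponding column of the fetched row, avoiding the index bookkeeping of `enumerate`. A standalone comparison (the setting names and row values are placeholders):

```python
# Both versions collect setting names whose fetched value is missing;
# the zip form makes the name/column pairing explicit. Placeholder data.
SETTING_NAMES = ("DEFAULT_FILE_STORAGE", "MEDIA_ROOT")
row = ("pulpcore.app.models.storage.FileSystem", None)

# enumerate-based version: index into the row by position
mismatched_enum = [name for i, name in enumerate(SETTING_NAMES) if row[i] is None]

# zip-based version suggested above: no index needed
mismatched_zip = [name for name, value in zip(SETTING_NAMES, row) if value is None]
```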
fixes: #3358