Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add version information to tasks #3930

Merged
merged 1 commit into from
Jul 12, 2023
Merged

Add version information to tasks #3930

merged 1 commit into from
Jul 12, 2023

Conversation

mdellweg
Copy link
Member

@mdellweg mdellweg commented Jun 16, 2023

Workers will refuse to pick up a task caliming a version of a plugin to be higher than their own. The corresponding plugin an it's version will be inferred from the task import path.

fixes #3369

Comment on lines 1 to 2
Added version information to tasks. Workers will refuse to pickup tasks with a version higher
than their own.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Added version information to tasks. Workers will refuse to pickup tasks with a version higher
than their own.
Added version information to tasks. Workers will refuse to pick up tasks with a version higher
than their own.

@mdellweg mdellweg force-pushed the zdu_tasks branch 2 times, most recently from 21bedfc to 9e296ce Compare July 10, 2023 15:08
@@ -42,19 +42,30 @@ def _dispatch_task(*args, **kwargs):
@pytest.fixture(scope="module")
def task(dispatch_task, monitor_task):
"""Fixture containing a Task."""
task_href = dispatch_task("time.sleep", args=(0,))
task_href = dispatch_task("pulpcore.app.tasks.test.sleep", args=(0,))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly we are losing the ability to run outside python functions as tasks with this change. Are we sure that no plugin has been using this behavior already? (I think this is a better design and we should only allow running pulpcore/plugin code)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, we never made any such statement. But i'd be happy to call it undefined behaviour. Back in the days celery tasks needed to be decorated, and i think that is not the worst design. Actually I would want to be even much stricter about what a task can be. But that is for another day. (sneak preview: #3997)

else:
function_name = func

versions = MODULE_PLUGIN_VERSIONS[function_name.split(".", maxsplit=1)[0]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referring to my previous comment, I feel this should be wrapped in a try/except block and raise a better error stating that the task function must live either inside pulpcore or a plugin!

@ggainey
Copy link
Contributor

ggainey commented Jul 11, 2023

Would it be desirable for the tasking-system to recognize that none of the available tasks can be picked up due to version-incompatibility and log warnings?

Is there any way to have a test-case that places an incompatible-task-entry into the queue and then sees it not-get-picked-up?

@mdellweg
Copy link
Member Author

Would it be desirable for the tasking-system to recognize that none of the available tasks can be picked up due to version-incompatibility and log warnings?

The distributed nature of the tasking system makes it kind of hard (without a lot of overhead including db queries) to identify that no available worker is compatible with the task at hand. Do we want that? This change is designed to help prevent bad to happen in a system that does rolling upgrades. So is suspect that the moment you spin up a new api worker you probable spin up a new worker too (maybe in the same container). And when you don't, you are in a half upgraded system and expect some quirks. It's not that the system is in an undefined state. The task will be picked up once at least one sufficiently recent worker is started.

Is there any way to have a test-case that places an incompatible-task-entry into the queue and then sees it not-get-picked-up?

Possibly. Not quite sure if we want that. The CI would probably not care that there will be a task waiting forever. Other development systems, idk.

@mdellweg mdellweg force-pushed the zdu_tasks branch 2 times, most recently from b930878 to 8b94b40 Compare July 12, 2023 10:09
@mdellweg
Copy link
Member Author

I added a test, and also changed the semantics a little bit. A worker is now still able to cancel a task even if the versions are not met. I believe this is necessary to be able to clean up e.g. when an upgrade needed rollback or a plugin was removed.

@gerrod3 you may want to take a second look.

Workers will refuse to pick up a task claiming a version of a plugin to
be higher than their own. The corresponding plugin and it's version will
be inferred from the task import path.

fixes pulp#3369
or parse_version(self.versions[label]) < parse_version(version)
]
if unmatched_versions:
_logger.info(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using a _logger.warn here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it warn first but wasn't sure. In case of an ongoing rolling upgrade this can get quite verbose. I was even thinking about debug.
We have conflicting suggestions here:
@ggainey > "I'd make it INFO. We def want to leave traces - esp if/when the task-queue is full of nothing but New Tasks but the only workers running are running Olde Code"

@@ -220,6 +221,23 @@ def cancel_abandoned_task(self, task, final_state, reason=None):
self.notify_workers()
return True

def is_compatible(self, task):
unmatched_versions = [
f"task: {label}>={version} worker: {self.versions.get(label)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"task: {label}>={version} worker: {self.versions.get(label)}"
f"task: {label}>={version} worker: {self.versions.get(label, 'None')}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? My experiment (and it is part of the tests too) is that the stringinterpolation works just fine here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think this is needed after checking the string interpolation. f"{None}" == "None".

f"task: {label}>={version} worker: {self.versions.get(label)}"
for label, version in task.versions.items()
if label not in self.versions
or parse_version(self.versions[label]) < parse_version(version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this condition check. I think it should be !=. Maybe if plugin writers passed in the min required version for their tasks then we could use the this check, but if this is filled out by default we have no guarantee that a plugin's task is compatible with its previous version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'd basically be saying "you have to cancel all waiting tasks when you upgrade". A task can't declare future-compatibility, so by definition every time you update core or a plugin, existing tasks could never be handled.

Example: I have 200 waiting tasks issued by core/3.25. I upgrade to core/3.28. Using '!=' means now nothing can ever pick up those tasks.

zero-downtime-upgrade requires that new-task-code continue to be able to handle old-tasks; if we can't do that, we can't have ZDU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the rule is whenever a worker has a version greater or equal than the api dispatching the task, it needs to be able to handle it. You cannot assume there will be an old worker around to handle all the old tasks.
We have that rule already. Plugins must always be able to handle old tasks. This is not even ZDU related.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well then, I guess this is a requirement I need to be cognizant of when reviewing plugin code as I was not aware of this before.

@mdellweg mdellweg merged commit b0743ef into pulp:main Jul 12, 2023
15 checks passed
@mdellweg mdellweg deleted the zdu_tasks branch July 12, 2023 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When running components in different versions, old workers may fail on new tasks
5 participants