Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

o/hookstate/ctlcmd: queue service commands if run from default-configure hook #13960

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ZeyadYasser
Copy link
Collaborator

This PR queues snapctl services' commands to be run after the default-configure similar to what is done for configure hook #4070. This is to avoid a problem where the service doesn't see a new value if it uses snapctl get ..., because it's still not committed by default-configure hook similar to what was described in LP#2047949 .

Fixes: https://bugs.launchpad.net/snapd/+bug/2047949

@github-actions github-actions bot added the Needs Documentation -auto- Label automatically added which indicates the change needs documentation label May 10, 2024
…re hook

Queue "snapctl restart ..." and "snapctl start ..." commands to be run after
default-configure similar to configure hook. This is to avoid a problem where
the service doesn't see a new value if it uses "snapctl get ...", because it's
still not commited by default-configure hook.

Fixes: https://bugs.launchpad.net/snapd/+bug/2047949

Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
@ZeyadYasser ZeyadYasser force-pushed the default-configure-hook-queue-services branch from 153e4fd to 757129b Compare May 10, 2024 12:22
Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Please make sure to link updated documentation on the snapcraft forum.

@ZeyadYasser
Copy link
Collaborator Author

Please make sure to link updated documentation on the snapcraft forum.

This fix just makes the implementation consistent with the documented behavior https://snapcraft.io/docs/supported-snap-hooks#heading--default-configure and the snapctl docs don't mention this behavior at all.

Do you have a specific doc page in mind that would need updating?

@ZeyadYasser ZeyadYasser marked this pull request as draft May 27, 2024 14:06
@ernestl
Copy link
Collaborator

ernestl commented May 27, 2024

It's not clear why the user still wants to start services from within the default configure hook. My understanding is the the reason for this was specific to the order of configure-hook VS starting services as part of the do-install flow which should not be required anymore unless I am missing something.

A secondary more subtle point, it does not seem like a good idea to start/interfere with service commands before do-install start service is done. Longer term we might want explicitly limit snapctl powers depending on how far the install sequence progressed to avoid creative permutations that cause problems.

@ernestl ernestl marked this pull request as ready for review May 27, 2024 15:07
@ernestl
Copy link
Collaborator

ernestl commented May 27, 2024

It's not clear why the user still wants to start services from within the default configure hook. My understanding is the the reason for this was specific to the order of configure-hook VS starting services as part of the do-install flow which should not be required anymore unless I am missing something.

A secondary more subtle point, it does not seem like a good idea to start/interfere with service commands before do-install start service is done. Longer term we might want explicitly limit snapctl powers depending on how far the install sequence progressed to avoid creative permutations that cause problems.

The reason why the user have service disabled still: They use the snap for core and for classic, and in classic there is no concept of seeding configuration with gadget. They rely on the user manually modifying the configuration and starting the service, so it is required to have it disabled by default.

@pedronis pedronis added the Needs Samuele review Needs a review from Samuele before it can land label May 27, 2024
@ernestl ernestl requested a review from pedronis May 27, 2024 18:29
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some question/comments

overlord/hookstate/ctlcmd/services_test.go Outdated Show resolved Hide resolved
@@ -149,7 +149,7 @@ func runServiceCommand(context *hookstate.Context, inst *servicestate.Instructio
return err
}

if !context.IsEphemeral() && context.HookName() == "configure" {
if !context.IsEphemeral() && (context.HookName() == "configure" || context.HookName() == "default-configure") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do the tasks queued after default-configure exactly end up? it's not very transparent from tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are queued at the end after all install tasks

for _, ts := range tts {
for _, t := range tasks {
// queue service command after all tasks, except for final tasks which must come after service commands
if finalTasks[t.Kind()] {
t.WaitAll(ts)
} else {
ts.WaitFor(t)
}
}
change.AddAll(ts)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added clarifying comments in the test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean after the configure hook? does this solve the original issue people were having in ways they expect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after both the configure hook and the health-check hook.

For illustration this is the expected order of tasks for installing a snap which is used in the unit tests:

"prerequisites",
"download-snap",
"validate-snap",
"mount-snap",
"copy-snap-data",
"setup-profiles",
"link-snap",
"auto-connect",
"set-auto-aliases",
"setup-aliases",
"run-hook[install]",
"run-hook[default-configure]",
"start-snap-services",
"run-hook[configure]",
"run-hook[check-health]",
"exec-command",                      <----  task for snapctl command
"service-control",                   <----  task for service operation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we probably want to ask the reporter on the bugs if this is too late for them or ok?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this solve the original issue people were having in ways they expect?

Yes, I tested it locally using the snap they provided. I left a comment there but I am waiting for feedback.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we probably want to ask the reporter on the bugs if this is too late for them or ok?

Good point, I will check.

Copy link

@farshidtz farshidtz May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZeyadYasser for the fix and your explanations.

The use case is from a snap that has a one-shot service doing several firewall and network configuration to set up a Thread bridge (linked in the bug report). As mentioned in #13960 (comment), the services are disabled by default, so that the user (on classic but also a pre-built Core image) gets a chance to configure for e.g. the networking interface before starting the services. However, the same snap needs to start automatically on Core when the config is seeded from a gadget. (A workaround was suggested by @zyga to keep services enabled by default but not run them when the needed config isn't set; this could work well for a limited config set.)

In our case, with the workflow proposed in this PR, it could work as follows:

  1. default-configure hook calls snapctl start if the autostart config option is set to true.
  2. configure hook does some validation (not done in default-configure to avoid redundant checks).
  3. services start.

But I suppose this isn't valid in every use case, as some applications expect the configure hook to execute while the services are running.

In summary, while the proposed order isn't an issue for our use case, I believe starting the services right after the hook succeeds makes this more inline with the snap best practices.

Side note: I wonder why run-hook[check-health] is ordered between configure and exec-command/service-control tasks.

Another solution I can think of is to add the ability to just enable/disable services via snapctl without starting or stopping them. Then, the default-configure hook could simply call for e.g. the snapctl enable --service <service> (snapctl enable does something else) command (without needing to queue) and the following task (start-snap-services) should just take care of the startup. I've already discussed this with @ernestl, and I'm not sure if this is going to work. I have experimented by calling snapctl stop --disable <service> from install and default-configure hooks for an enabled service, and it works as expected: the service gets disabled and never starts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the earliest we can start those services is after snap-start-services, @ZeyadYasser could you look what would be the complexity of queuing commands from default-configure there?

And add comments explaining tasks relative order.

Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
@ZeyadYasser ZeyadYasser requested a review from pedronis May 29, 2024 11:58
Signed-off-by: Zeyad Gouda <zeyad.gouda@canonical.com>
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine with me assuming it solves the original problems as is

@pedronis pedronis requested review from pedronis and removed request for pedronis May 30, 2024 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Documentation -auto- Label automatically added which indicates the change needs documentation Needs Samuele review Needs a review from Samuele before it can land
Projects
None yet
5 participants