Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

notices: Cancel notice waiting on terminating the daemon #13961

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

sergio-costas
Copy link
Contributor

Currently, if there is a program using the notices API and awaiting for notices, and the snapd daemon is killed (for example, because the snapd snap is being updated), the blocked socket keeps the termination process locked until a 25 seconds timeout fires. This delays for 25 seconds the termination of the daemon.

This patch detects when a SIGTERM or a SIGKILL signal is received, and breaks the waiting in the notices thread, thus ensuring that the termination process goes as quick as possible.

Thanks for helping us make a better snapd!
Have you signed the license agreement and read the contribution guide?

@sergio-costas sergio-costas changed the title Cancel notice waiting on terminating the daemon notices: Cancel notice waiting on terminating the daemon May 10, 2024
Currently, if there is a program using the notices API and
awaiting for notices, and the snapd daemon is killed (for
example, because the snapd snap is being updated), the blocked
socket keeps the termination process locked until a 25 seconds
timeout fires. This delays for 25 seconds the termination of the
daemon.

This patch detects when a SIGTERM or a SIGKILL signal is
received, and breaks the waiting in the notices thread, thus
ensuring that the termination process goes as quick as possible.
@sergio-costas sergio-costas force-pushed the close-notifications-connection-on-exit branch from a162480 to b9b4025 Compare May 10, 2024 12:39
@ZeyadYasser ZeyadYasser self-requested a review June 6, 2024 08:13
@pedronis pedronis requested review from bboozzoo and olivercalder and removed request for ZeyadYasser June 6, 2024 08:13
@@ -439,10 +451,37 @@ func (s *State) WaitNotices(ctx context.Context, filter *NoticeFilter) ([]*Notic
})
defer stop()

chKill := make(chan os.Signal, 2)
signal.Notify(chKill, syscall.SIGINT, syscall.SIGTERM)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right, the responsibility for signal handling becomes a part of some random unrelated code.
We have some signal handling code in main which in turn requests daemon Stop() where appropriate. I think we could use this opportunity to refactor that code and ensure that we can have properly propagate cancellation when needed.

For this I would suggest the following:

  • modify daemon.go, daemon.Start() should take a context
  • the context should be created in snapd main(), possibly through signal.NotifyContext()
  • daemon.Start(), should set the http.Server.BaseContext to a function which returns said context, AFAIU it should then be used as a base for all request contexts, so a cancellation from within main() will cancel the incoming requests

Let me know if this is something you can work on, or would should rather someone form the snapd team take this up as a task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: the "signal.Notify" method specifically says that you can connect as many channels as you want, and all will be notified when the signal is emitted. That's why I did the code this way. In fact, both SIGINT and SIGTERM are connected in other parts of the code too, and all receive the notifications.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean that we should build a predictable context cancellation chain such that we can reliably cancel requests/waiting code by calling cancel from the topmost parent context, which in this case would be provided by the daemon (but could also be coming from tests or other places).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... yes, I see your point. But I'm afraid that it is too big for my knowledge of the snapd architecture... 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants