Unable to upgrade an API-Platform/FrankenPHP/Mercure Docker Swarm service without downtime #898

toby-griffiths opened this issue Apr 16, 2024 · 3 comments


I think that this is a Mercure issue, but please correct me if I'm wrong…

We have just deployed an API-Platform based project to a Docker Swarm and it's working nicely. However, when we attempt to update the services, the first update attempt always seems to fail, with the following error appearing in the logs…

```
Error: loading initial config: loading new config: loading http app module: provision http: server srv0: setting up route handlers: route 0: loading handler modules: position 0: loading module 'subroute': provision http.handlers.subroute: setting up subroutes: route 0: loading handler modules: position 4: loading module 'mercure': provision http.handlers.mercure: "bolt:///data/mercure.db?subscriptions=1": invalid transport: timeout
```

If we re-run the same `docker stack update` command, the existing service appears to stop, the API goes offline briefly while the new service starts up, and then everything works again.

Is this caused by some form of locking on the Mercure data store? Is there a way around this?

I've briefly looked at the High Availability docs today, including how you can build a custom transport, but I'm not very familiar with Go, so I wouldn't know where to start with this. Any pointers, if that would help resolve this issue, would be very much appreciated.

Thanks for all your great work on this project.

@toby-griffiths (Author)

Is anyone able to give me any pointers on this one? We're now approaching a production launch, and I'd prefer not to have to do all our deploys out of hours, when a brief outage for the update is acceptable.

Any pointers/thoughts/ideas are very welcome. Thank you.

@dunglas (Owner) commented May 16, 2024

I guess that Docker starts a new container before stopping the existing one. This is an issue when using the Bolt transport because BoltDB relies on a lock. The first container must release the lock for the second one to take it.
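If so, a possible workaround (a sketch, not something verified on your stack; assuming a standard Compose v3 stack file, and the service and image names below are placeholders) is to tell Swarm to stop the old task before starting the replacement, and to give it time to shut down cleanly:

```yaml
services:
  php:                       # placeholder service name
    image: my-app:latest     # placeholder image
    stop_grace_period: 30s   # give FrankenPHP/Mercure time to close the Bolt DB
    deploy:
      update_config:
        order: stop-first    # don't start the replacement until the old task stops
```

The trade-off is a short gap during the update instead of a failed first attempt.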

An option is to upgrade to the (paid) on-premise version, whose Redis transport doesn't have this issue because, unlike Bolt, Redis supports concurrent connections.

Another option would be to check whether Docker sends a signal to the existing container before starting the new one, catch this signal in the Bolt transport, and close the connection to the Bolt DB immediately (that would release the lock).
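Here is a minimal sketch of that idea, assuming the process holds a `*bbolt.DB` handle; the wiring is illustrative only, not Mercure's actual code:

```go
// Sketch: release the BoltDB file lock as soon as SIGTERM arrives,
// so a replacement container can open the database immediately.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("/data/mercure.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		<-sigs
		// Closing the DB releases the exclusive file lock held by this process.
		if err := db.Close(); err != nil {
			log.Printf("closing bolt db: %v", err)
		}
		os.Exit(0)
	}()

	select {} // stand-in for the server's normal work
}
```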

@dunglas (Owner) commented May 16, 2024

This issue seems to confirm this theory: influxdata/influxdb#24320
