Router API cannot connect to Mongo 2.6 #533

Open
huwd opened this issue Dec 3, 2021 · 1 comment
@huwd
Member

huwd commented Dec 3, 2021

We've encountered a chaining problem while looking into how publishing-api tries to put things onto RabbitMQ. We've traced it through the following chain:
Publishing API -> Content Store -> Router API -> Router

The problem seems to be that router-api cannot find a primary MongoDB server.

To replicate:

➜  router-api git:(main) govuk-docker-run bundle exec rails c
docker-compose -f [...] run router-api-lite bundle exec rails c
Creating govuk-docker_router-api-lite_run ... done
Loading development environment (Rails 6.0.3.7)
irb(main):001:0> Route.count
Traceback (most recent call last):
        1: from (irb):1
Mongo::Error::NoServerAvailable (No primary server is available in cluster: #<Cluster topology=Unknown[mongo-2.6:27017] servers=[#<Server address=mongo-2.6:27017 GHOST>]> with timeout=30, LT=0.015)

The mongo container does run and you can watch its logs, but it is stuck in a loop of opening and closing connections, punctuated by the following fairly suspect message:

2021-12-03T16:06:57.809+0000 [rsStart] warning: getaddrinfo("48703775aaf0") failed: Name or service not known
2021-12-03T16:06:57.846+0000 [rsStart] getaddrinfo("48703775aaf0") failed: Name or service not known
2021-12-03T16:06:57.846+0000 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.

@kevindew spotted that if we comment out this line things start working again.

That line seems to have been introduced during work to resolve differences in how rs.status responds between MongoDB 2.6 (which router runs in prod) and more modern versions.

#499

This may have been an attempt to resolve this issue: alphagov/router#210

Question to answer: what was L46 trying to resolve? Does it still serve that purpose? Can we replace it with something that doesn't block local dev, or remove it altogether?

@huwd huwd self-assigned this Dec 3, 2021
@karlbaker02
Contributor

karlbaker02 commented Dec 3, 2021

L46 is necessary as we have been running MongoDB as a replica set since around April 2021, in order to enable the app to be replatformed. Previously, Router API knew about all running Router instances and would, upon a request to update a route, update said route and then call the /reload endpoint on each and every Router instance in order to ensure each instance's routes were up-to-date.
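
To illustrate the previous push-based behaviour described above, here is a minimal Ruby sketch. It is not the actual Router API code: the host names, the port, the use of plain HTTP POST, and the method name `reload_all_routers` are assumptions made purely for illustration. Only the /reload endpoint and the "call it on every instance" pattern come from this thread.

```ruby
require "net/http"
require "uri"

# Hypothetical hardcoded list of Router instances; the real hosts and port
# are not stated in this thread.
ROUTER_NODES = ["router-1.example:8080", "router-2.example:8080"].freeze

# Calls /reload on every known Router instance and returns a hash of
# node => HTTP status code. The `post` callable is injectable so the
# sketch can be exercised without a network.
def reload_all_routers(nodes, post: ->(uri) { Net::HTTP.post(uri, "").code.to_i })
  nodes.each_with_object({}) do |node, results|
    uri = URI("http://#{node}/reload")
    results[node] = post.call(uri)
  end
end
```

The sketch shows why this approach depends on a fixed list of instances: every node has to be known up front, which is the part that did not carry over to Kubernetes.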

Replatforming changed this behaviour. Instead of Router API needing to know about individual Router instances (these were hardcoded, which does not translate to the Kubernetes world we're now moving into), Router instances poll MongoDB for new changes every few seconds. We enabled this through the use of a replica set and the db.stats() method: an instance determines whether it has an up-to-date copy of the current routes by comparing the current optime to its cached optime, and reloads if changes have occurred.
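
The polling approach can be sketched roughly as follows. This is an illustration in Ruby rather than the Router's actual implementation: the class name `RoutePoller`, the `fetch_optime` and `reload` callables, and the 2 second default interval are all assumptions. The optime lookup is deliberately left as an injected callable, since the exact MongoDB query is not shown in this thread.

```ruby
# Sketch of a poller that reloads routes only when the replica set's
# optime differs from the one seen at the last reload.
class RoutePoller
  attr_reader :cached_optime, :reload_count

  # fetch_optime: callable returning the current optime (hypothetical
  #               stand-in for the real MongoDB query).
  # reload:       callable that reloads the routes.
  def initialize(fetch_optime:, reload:)
    @fetch_optime = fetch_optime
    @reload = reload
    @cached_optime = nil
    @reload_count = 0
  end

  # Performs one poll. Returns true if a reload happened, false otherwise.
  def poll_once
    current = @fetch_optime.call
    return false if current == @cached_optime

    @reload.call
    @cached_optime = current
    @reload_count += 1
    true
  end

  # Polls forever, sleeping `interval` seconds between checks.
  def run(interval: 2)
    loop do
      poll_once
      sleep interval
    end
  end
end
```

In this sketch the first poll always reloads, because nothing has been cached yet. It also shows why a replica set matters for the design: without one there is no optime to compare against, so an instance cannot tell whether its routes are stale.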
