
Redis connection errors #211

Open
nikosmichas opened this issue Nov 16, 2021 · 10 comments


nikosmichas commented Nov 16, 2021

Hello!
I increased the number of workers for the webserver and ran the services through docker-compose while sending a lot of API requests.
After some time, I start getting errors like these in the logs of the server:

musicbrainz_1  | 2021-11-16T13:52:22.714329479Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."
musicbrainz_1  | 2021-11-16T13:52:26.429439298Z [error] Caught exception in engine "Could not connect to Redis server at redis:6379: Cannot assign requested address at lib/MusicBrainz/Redis.pm line 24.
musicbrainz_1  | 2021-11-16T13:52:26.429469986Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."
musicbrainz_1  | 2021-11-16T13:52:36.652664437Z [error] Caught exception in MusicBrainz::Server::Controller::WS::2::Work->load "Could not connect to Redis server at redis:6379: Cannot assign requested address at /root/perl5/lib/perl5/Redis.pm line 275.


musicbrainz_1  | 2021-11-16T13:47:21.205178216Z [error] Caught exception in MusicBrainz::Server::Controller::WS::2::Recording->load "Could not connect to Redis server at redis:6379: Cannot assign requested address at /root/perl5/lib/perl5/Redis.pm line 275.
musicbrainz_1  | 2021-11-16T13:47:21.205218285Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."
musicbrainz_1  | 2021-11-16T13:47:22.142083445Z [error] Caught exception in MusicBrainz::Server::Controller::WS::2::Recording->load "Could not connect to Redis server at redis:6379: Cannot assign requested address at /root/perl5/lib/perl5/Redis.pm line 275.
musicbrainz_1  | 2021-11-16T13:47:22.142122078Z 	...propagated at /root/perl5/lib/perl5/Redis.pm line 613, <PKGFILE> line 1."

Any idea/suggestion on how to handle this?


yvanzo commented Nov 18, 2021

Hi!

It might be that the Redis instance is over-solicited. You may need to get your hands dirty tuning the configuration of your Redis instance, which can probably be done by passing options on the command line through a local Docker Compose override file.

Here are the options we pass to our Redis instance for cache at musicbrainz.org:

--maxmemory 1GB --maxmemory-policy allkeys-lru --save ""

To pass these options on the command line, please read the quick how-to I wrote about Docker Compose Overrides and adapt Modify memory settings to your specific needs, which looks like services > redis > command > redis --maxmemory…

@mwiencek: Since you are more knowledgeable than me about Redis use in MusicBrainz, can you please double-check both the reported issue (for a potential bug to be fixed in musicbrainz-server) and my answer (for potential misconceptions)?

@nikosmichas

Thanks @yvanzo for your reply. I will try this.

@nikosmichas

It seems that the issue was not specific to Redis. Even after disabling Redis entirely, I kept getting a similar error for Postgres.
The problem is that, because of the large number of requests I was sending, the OS of the MusicBrainz container could not create new sockets between itself and the other services.

I noticed with netstat that a huge number of connections in the TIME_WAIT state was preventing new connections from being created.
I resolved this by changing tcp_max_tw_buckets in the MusicBrainz docker image, and now the services are able to run with approximately 100 web workers in parallel without "Could not connect" errors.

Ideally, this could be resolved at the application level by reusing the connections it creates (e.g. using connection pooling for Postgres).

More information about the issue: https://www.percona.com/blog/2014/12/08/what-happens-when-your-application-cannot-open-yet-another-connection-to-mysql/

I could open a Pull Request with the change in the docker-compose.yml if you think that this may be useful in other cases.
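For illustration, such a change can be expressed with Compose's sysctls key. This is only a sketch under my own assumptions: the service name assumes the web container from docker-compose.yml, and the value shown is illustrative, not necessarily the one that works best for a given load.

```yaml
# Sketch only: the sysctl value is illustrative, and the service name
# assumes the musicbrainz web container from docker-compose.yml.
services:
  musicbrainz:
    sysctls:
      net.ipv4.tcp_max_tw_buckets: 65536
```

This keeps the change scoped to the one container's network namespace instead of touching the host's kernel settings.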


yvanzo commented Dec 1, 2021

> Ideally, this could be resolved at the application level by reusing the connections it creates (e.g. using connection pooling for Postgres).

In production, we use pgbouncer. Would it be worth including it in musicbrainz-docker too?
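As a rough illustration of what an optional pgbouncer service could look like in a Compose override (the image name, credentials, and variable names here are assumptions for the sketch, not the actual musicbrainz-docker or production setup):

```yaml
# Hypothetical sketch: image, credentials, and variable names are assumptions.
services:
  pgbouncer:
    image: edoburu/pgbouncer
    environment:
      DB_HOST: db               # the existing Postgres service
      DB_USER: musicbrainz
      POOL_MODE: transaction    # reuse server connections across clients
```

The web service would then point its database host at pgbouncer instead of db, so its many short-lived connections are multiplexed over a small, persistent pool to Postgres.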


yvanzo commented Dec 1, 2021

> I resolved this by changing tcp_max_tw_buckets in the MusicBrainz docker image, and now the services are able to run with approximately 100 web workers in parallel without "Could not connect" errors.
> I could open a Pull Request with the change in the docker-compose.yml if you think that this may be useful in other cases.

Thanks, if that would be complementary to Postgres connection pooling, yes.

@nikosmichas

> > Ideally, this could be resolved at the application level by reusing the connections it creates (e.g. using connection pooling for Postgres).
>
> In production, we use pgbouncer. Would it be worth including it in musicbrainz-docker too?

Maybe it would, even as an optional part.

> > I resolved this by changing tcp_max_tw_buckets in the MusicBrainz docker image, and now the services are able to run with approximately 100 web workers in parallel without "Could not connect" errors.
> > I could open a Pull Request with the change in the docker-compose.yml if you think that this may be useful in other cases.
>
> Thanks, if that would be complementary to Postgres connection pooling, yes.

Yes, it can be complementary to the pooling and it will also help avoid issues with Redis. I will open a PR shortly.

@nikosmichas

By the way @yvanzo do you know if you use a non-default value for --max-keepalive-reqs and --keepalive-timeout in Starlet?

Setting them also helped me reduce the number of open sockets a bit.
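For reference, these are regular Starlet options passed when the server starts; a sketch of how they would appear on a plackup command line (the worker count, keepalive values, and app.psgi path are illustrative placeholders, not the musicbrainz-server defaults):

```shell
# Illustrative values only; app.psgi stands in for the real entry point.
plackup -s Starlet --max-workers=100 \
    --max-keepalive-reqs=100 --keepalive-timeout=5 app.psgi
```

Raising max-keepalive-reqs lets a worker serve multiple requests over one client connection before closing it, which reduces connection churn.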


JoshDi commented Aug 9, 2022

@nikosmichas have you figured out how to modify the local/compose/memory-settings.yml file to accomplish this? I believe this is working for me:

```yaml
version: '3.1'
# Description: Customize memory settings
services:
  redis:
    command: redis-server --maxmemory 1GB --maxmemory-policy allkeys-lru --save ""
```


JoshDi commented Aug 17, 2022

Using the values above caused my slave server to sometimes time out on the MQ queue or when checking the index count vs. the DB. I have removed these redis-server modifications and have not had an issue since.


JoshDi commented Aug 18, 2022

Correction: the error I am getting is below, and it started to occur after upgrading to Ubuntu 22.04.1 LTS (x64).

OCI runtime exec failed: exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

From searching the web, it looks like an SELinux issue.
