long running blpop crashes with Error while reading line from the server #74

muayyad-alsadi · 2012-05-27T14:11:06Z

hi,

we used to have an issue with legacy version of Predis, we have upgraded to the latest Predis stable 7.2
and we still face the same issue but it's less frequent

the problem goes like this:
we have a loop on blpop which crashes with the following trace

exception 'Predis\Network\ConnectionException' with message 'Error while reading line from the server' in /opt/proj/jeeran-ii/trunk/application/protected/extensions/redis/Predis/Network/ConnectionBase.php:159
Stack trace:
#0 /opt/proj/jeeran-ii/trunk/application/protected/extensions/redis/Predis/Network/StreamConnection.php(194): Predis\Network\ConnectionBase->onConnectionError('Error while rea...')
#1 /opt/proj/jeeran-ii/trunk/application/protected/extensions/redis/Predis/Network/ConnectionBase.php(142): Predis\Network\StreamConnection->read()
#2 /opt/proj/jeeran-ii/trunk/application/protected/extensions/redis/Predis/Network/ConnectionBase.php(134): Predis\Network\ConnectionBase->readResponse(Object(Predis\Commands\ListPopFirstBlocking))
#3 /opt/proj/jeeran-ii/trunk/application/protected/extensions/redis/Predis/Client.php(219): Predis\Network\ConnectionBase->executeCommand(Object(Predis\Commands\ListPopFirstBlocking))
#4 [internal function]: Predis\Client->__call('blpop', Array)

The text was updated successfully, but these errors were encountered:

nrk · 2012-05-27T14:41:11Z

This is most likely not a bug but just a misconfiguration, please follow these steps to make sure that both Redis and Predis are properly configured for long-idling connections and report back on this ticket if you still have problems.

muayyad-alsadi · 2012-05-27T18:00:36Z

configure Redis to not close idle clients after 300 seconds (you can do this by editing your redis.conf and setting the timeout entry to 0) and add read_write_timeout => 0 to the server parameters array / string passed to the constructor of Predis\Client.

I can do this but I believe we should not do that,
I believe it's very sane to have redis close idle connections
as a sort of garbage collection, on the other hand
I think the client should be smart enough to retry using a new connection

since this is a blocking call we can't do periodic pings to avoid being marked as idle
I suggest having the client do more things when we face a similar scenario,
let me take php-mysql as an example

http://php.net/manual/en/function.mysql-ping.php

Ping a server connection or reconnect if there is no connection

in php-mysql an old mysql connection could be used, that connection could be timed-out or closed for many reasons
doing a ping on it will reconnect

so what Predis should do when it get connection reset or closed
it should set some internal variable let's say $_reconnecting=1
then create a new connection replacing the old one, then ping it
then proceed with blpop

if we face this issue again while $_reconnecting===1 then we throw the error

nrk · 2012-05-27T18:23:23Z

As a matter of fact, starting with Redis 2.6 the timeout configuration in redis.conf will be set to 0 by default.

Anyway, if you still prefer to have a timeout set for idle connections then you can just do a BLPOP using a timeout on the monitored key lesser than the timeout specified in redis.conf (e.g. BLPOP list 200), thus making the command return NULL after 200 seconds if no elements are pushed onto the list. If your BLPOP is inside a loop and it returns NULL you can restart the loop right away to make your script wait for a valid value popped from the list. This approach basically makes explicit pings and auto-reconnections useless and you'll have much more control on the behavior of your script without relying on some magic happening behind the scenes.

PS: Predis won't support auto-reconnections since they are error prone and lead to unexpected behaviors in complex states such as MULTI / EXEC, pubsub or pipelines.

muayyad-alsadi · 2012-05-28T08:28:02Z

As a matter of fact, starting with Redis 2.6 the timeout configuration in redis.conf will be set to 0 by default.

I fear having stall connections on redis server living for ever which might lead to prevent it from accepting new connections because it reached maximum limit (if not in conf could be in kernel limit on max open connections per process)

if you still prefer to have a timeout set for idle connections then you can just do a BLPOP using a timeout on the monitored key lesser than the timeout specified in redis.conf (e.g. BLPOP list 200),

we have another case when we do incremental indexing from our database into solr
we read the last point from redis
and after we do the time-consuming indexing we store the new point in redis
which crash Predis because the long time between read and write

PS: Predis won't support auto-reconnections since they are error prone and lead to unexpected behaviors in complex states such as MULTI / EXEC, pubsub or pipelines.

note the way I suggest it, we don't get into infinite loop, because of the flag $_reconnecting=1,
also the new connection is not considered done unless the new one returns a valid ping

if it's illogical for some commands to reconnect by design then we might call some commands reconnectable
I noticed that you have marked some commands as hashable ...etc.

nrk · 2012-05-28T08:46:05Z

I fear having stall connections on redis server living for ever which might lead to prevent it from accepting new connections because it reached maximum limit (if not in conf could be in kernel limit on max open connections per process)

They can't live forever unless you are doing something wrong or unexpected in your application so you might prefer to check where the problem lies in such cases. Or you might indeed need a whole lot of potentially-idling connections for pub/sub or BRPOP/BLOP scenarios, in which case you might need to rely on master/slave replication and use different slaves for subscribers.

we have another case when we do incremental indexing from our database into solr
we read the last point from redis
and after we do the time-consuming indexing we store the new point in redis
which crash Predis because the long time between read and write

You can do a $client->disconnect(), perform your time-consuming indexing and then do a $client->connect() (or let Predis handle connecting to the server upon first command). Yet again a much more predictable behavior, and it's also in line with your fears of having too many connections open to Redis when they are not really needed.

if it's illogical for some commands to reconnect by design then we might call some commands reconnectable
I noticed that you have marked some commands as hashable ...etc.

This would not work with command pipelining and lead to unpredictable behaviors with MULTI / EXEC contexts (once you get a disconnection, the whole transaction is obviously discarded by the server). Hashing a command, on the other hand, is something that does not depend on the state of a client / context / underlying connection.

muayyad-alsadi · 2012-05-28T10:12:59Z

first I would live you quick response and following the issue

second, I agree that multi/exec should not reconnect

but I really feel that other commands should reconnect (eg. get/set or blpop) in a sane way

we are using Yii's CRedisCache and it uses a getter that caches the connection
and there is no way to make it produce a new one

    public function getRedis()
    {
        if($this->_cache!==null)
            return $this->_cache;
        else{
            // to use the old single file approach comment out the next two lines and uncomment the 3rd
            require_once Yii::getPathOfAlias($this->predisPath).'/Autoloader.php';
            Predis\Autoloader::register();
            //require_once Yii::getPathOfAlias($this->predisPath).'.php'; // old single file approach,no autoloading
            Yii::log('Opening Redis connection',CLogger::LEVEL_TRACE);
            return $this->_cache=new Predis\Client($this->servers);
        }
    }

you might say this is their problem, and you might say it's the problem of our search team to add disconnect/connect/ping ..etc.

but I like to think about separation of concern principle, our search team should never go deep and know how the redis component work, it should just get and set keys

I guess this is part of making Predis bullet-proof.

nrk · 2012-05-28T18:24:33Z

Sorry, but I won't add auto-reconnect features to Predis :-) This is probably one of the few things that have been set in stone since long ago and I still think that your proposal would lead to an hackish implementation with unpredictable behaviors. Specific needs such as yours are up to end-developers to implement, that's the very reason why Predis has been designed to be extremely modular.

In your scenario you can extend the underlying connection class Predis\Network\StreamConnection to make it so that executeCommand() wraps the parent method inside a try ... catch block, in this way you can reconnect and send again a command when the connection to a server is lost. You can also use a whitelist approach to decide which commands should behave like this. You can register your newly created connection class to a client instance as described in the README.

Could this approach work for you?

PS: I'd close this issue and #75, we can move to private email if you need some tips.

muayyad-alsadi · 2012-05-29T07:27:04Z

Sorry, but I won't add auto-reconnect features to Predis :-) This is probably one of the few things that have been set in stone since long ago and I still think that your proposal would lead to an hackish implementation with unpredictable behaviors.

it seems that the author of redis agrees with you

https://twitter.com/antirez/status/207117083790684161

I can't see how this is a risk actually. If the application is leaking connections the fix is not timeouts IMHO.

https://twitter.com/antirez/status/207122021665091584

the user is free to change that default. I think that non-zero default creates more problems than it solves on reconnection.

but before I use 0 timeouts I have a concern from my previous experience with php mysql module

once upon time, our mysql server black listed one of our web server (with max user connections exceeded) because there was a php fatal error prevent php script to reach the line where the script closes the connection, and with many people visiting the web server many connections opened but none closes his connection properly

$n=null;
$c=foo_connect();
$n->do_something() ; // fatal error
foo_close($c);

of course the OS will close the connection when the process end but in many cases the process where php lives is a long-living daemon (eg. could be a preforked apache process, php fast cgi daemon, php-fpm daemon, ..etc) in other words OS won't close those connections because the process did not terminate (only the script terminated)

according to Salvatore Sanfilippo (author of redis)
https://twitter.com/antirez/status/207123059700801537
https://twitter.com/antirez/status/207123035243814912

basically it's up to the client library.
there are two types of PHP sockets which one does Predis uses when a php fatal error happens does the redis connection become a zombie/stall forever or the OS take care of cleanup

which type of those two do does Predis uses

nrk · 2012-05-29T09:41:16Z

but before I use 0 timeouts I have a concern from my previous experience with php mysql module

When using mysql_pconnect() to create persistent connections to MySQL, you must make sure that mysql.max_persistent in php.ini is set to a sane value (lower than the one set for MySQL). If you were just using non-persistent MySQL connections, explicit calls to mysql_close() shouldn't be necessary since resources are freed at the end of the script's execution which means that any fatal error should end up closing the connection anyway (well, aside from some bugs in the mysql module or PHP itself).

That said, Predis can support persistent connections when persistent=1 is specified in the connection parameters but it doesn't use them by default. Non-persistent connections are properly closed since they follow the same rules as other non-persistent resources (they are freed at the end of script's execution), but the __destruct() method of connection classes forcefully closes the underlying socket stream anyway, which means that the socket is closed when a connection object is garbage collected because it goes out of scope with no more active references.

muayyad-alsadi · 2012-05-30T07:56:09Z

thank you for such deep follow up,
now I feel safe setting timeout to 0

muayyad-alsadi · 2012-06-02T18:02:40Z

I would like to thank you for helping me regarding the mysql error too by tuning php.ini

http://dev.mysql.com/doc/refman/5.0/en/blocked-host.html

Host 'host_name' is blocked because of many connection errors.
Unblock with 'mysqladmin flush-hosts'

muayyad-alsadi mentioned this issue May 28, 2012

ping connection in a cluster #75

Closed

muayyad-alsadi closed this as completed May 30, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

long running blpop crashes with Error while reading line from the server #74

long running blpop crashes with Error while reading line from the server #74

muayyad-alsadi commented May 27, 2012

nrk commented May 27, 2012

muayyad-alsadi commented May 27, 2012

nrk commented May 27, 2012

muayyad-alsadi commented May 28, 2012

nrk commented May 28, 2012

muayyad-alsadi commented May 28, 2012

nrk commented May 28, 2012

muayyad-alsadi commented May 29, 2012

nrk commented May 29, 2012

muayyad-alsadi commented May 30, 2012

muayyad-alsadi commented Jun 2, 2012

long running blpop crashes with Error while reading line from the server #74

long running blpop crashes with Error while reading line from the server #74

Comments

muayyad-alsadi commented May 27, 2012

nrk commented May 27, 2012

muayyad-alsadi commented May 27, 2012

nrk commented May 27, 2012

muayyad-alsadi commented May 28, 2012

nrk commented May 28, 2012

muayyad-alsadi commented May 28, 2012

nrk commented May 28, 2012

muayyad-alsadi commented May 29, 2012

nrk commented May 29, 2012

muayyad-alsadi commented May 30, 2012

muayyad-alsadi commented Jun 2, 2012