
Does this driver add latency? #32

Open
erfansahaf opened this issue Sep 18, 2020 · 6 comments

erfansahaf commented Sep 18, 2020

I tried to use this driver in a production setup with 3 Sentinel instances on a 1 master + 2 replica cluster, but I ended up with 20ms of extra latency in the response time.

With a single Redis node, we have a response time of around 47ms. After switching to this driver, the response time increased to about 70ms.

I want to know what causes this latency. Is it the driver, or is it natural behavior for a Redis Sentinel cluster?

Thanks.

@cyrossignol (Member)

Yes, this driver adds latency. A Sentinel-aware application trades a bit of responsiveness for high availability. When the client executes the first Redis command in the request lifecycle, it queries a Sentinel server for the address of a healthy Redis server. This means the client makes two connections: one to a Sentinel, and another to the Redis server that actually executes the commands.
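
For illustration, the discovery step looks roughly like this with a bare Predis client (the host, port, and service name `mymaster` are placeholders for your own deployment):

```php
// Sketch of the two-connection pattern: ask a Sentinel for the current
// master's address, then open a second connection to run commands there.
$sentinel = new Predis\Client('tcp://127.0.0.1:26379');

// Returns e.g. ['10.0.0.5', '6379'] for the monitored service "mymaster".
[$host, $port] = $sentinel->executeRaw(
    ['SENTINEL', 'get-master-addr-by-name', 'mymaster']
);

$redis = new Predis\Client("tcp://{$host}:{$port}");
$redis->set('greeting', 'hello');
```

In normal use, Predis performs this discovery automatically when configured with `'replication' => 'sentinel'`; the snippet just makes the extra round trip visible.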

If needed, we can use several strategies to mitigate the extra overhead. Placing Sentinel servers "physically" closer to the application instances (for example, on the same nodes) reduces the network round-trip delay. See the Sentinel docs for a discussion of the different topologies.

We could also write a custom Redis client that tries to execute commands against known Redis servers directly and defers the Sentinel query to cases where the client detects that a command failed because of a connection problem. This approach gives up some flexibility in return for performance in environments where Redis servers don't fail often. In a Laravel application, such a client might wrap Laravel's standard Redis service along with the service provided by this package, and it would need to know the connection configuration for both the Redis and Sentinel servers from the start.
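
A rough sketch of that idea (the class and property names here are hypothetical, not part of this package):

```php
use Predis\Client;
use Predis\Connection\ConnectionException;

// Hypothetical wrapper: try the last-known Redis server first, and only
// pay the Sentinel round trip when that direct connection fails.
class DirectFirstRedis
{
    public function __construct(
        private Client $direct,        // plain client for a known Redis address
        private Client $sentinelAware  // client configured with 'replication' => 'sentinel'
    ) {
    }

    public function execute(string $command, array $arguments = [])
    {
        try {
            return $this->direct->executeRaw(array_merge([$command], $arguments));
        } catch (ConnectionException $e) {
            // The known server appears unreachable; fall back to discovery.
            return $this->sentinelAware->executeRaw(array_merge([$command], $arguments));
        }
    }
}
```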

Typical PHP environments destroy in-memory application state between requests. For this reason, a Laravel application that depends on this package cannot remember the results of a Sentinel query for the next request in standard FastCGI environments or Apache HTTPD's mod_php. You might consider using an async/event-driven framework like PHP-PM or Swoole to host a Laravel application if minimum latency is critical to your deployment. These run long-lived processes to avoid rebooting the application for each request.

erfansahaf (Author) commented Sep 19, 2020

Thanks for the clarification.

I have a proposal for this problem. How about having two separate environment variables for the Redis information? One for the Sentinel (like SENTINEL_HOST=1.1.1.1) and the other for the active, healthy Redis instance (like REDIS_HOST=2.2.2.2). On the first request, the REDIS_HOST variable is empty and the application doesn't know the address of a healthy Redis node, so it queries the Sentinel to get the address. Once it has the information it needs, it writes it back into the REDIS_HOST and REDIS_PORT variables for the next incoming requests. After the first request, we can guarantee that REDIS_HOST holds the healthy node's address, so the client can reach that address and run commands directly on the node.

Also, in case of failure, we repeat the process: the application queries the Sentinel once again to find the new healthy node and updates these environment variables.

In this scenario, we don't have to query the Sentinel on every request, and we still have the healthy node's address stored in an environment variable, just like with a single Redis instance. Also, there is no dependency on memory to keep the active node's information for us.

What do you think? Will this application-level solution help us decrease the latency?

cyrossignol (Member) commented Sep 19, 2020

You're describing caching of the Sentinel query result to avoid the initial query on subsequent requests. This can work, but keep these points in mind:

In most cases, we cannot write to an environment variable from a PHP process and expect the value to persist between requests. Typical PHP runtimes will reset environment variables for each request, so the value of REDIS_HOST set by a Laravel application will not remain for the next request.

Some might consider dynamically rewriting the .env file or the configuration files in the config/ directory. I don't recommend this approach. The .env file is intended for use in development environments only. In many deployments, these files are read-only, or the application uses an optimized configuration loaded from an aggregate file generated by artisan config:cache.

That said, we can use a local, file-based cache to store the set of Redis servers loaded from Sentinel. In many cases, though, reading from a local filesystem—even one backed by SSDs—is slower than a network request to a nearby server. Sentinels respond very quickly to requests. I've observed that Sentinel queries can finish in less than 1ms in production. Network congestion, routing problems, and DNS resolution can slow these down.
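
A minimal sketch of the file-based variant (the path, cache format, and service name are illustrative):

```php
// Illustrative only: cache the discovered master address in a local file
// so subsequent requests can skip the Sentinel round trip.
$cacheFile = sys_get_temp_dir() . '/redis-master.json';

$address = is_readable($cacheFile)
    ? json_decode(file_get_contents($cacheFile), true)
    : null;

if (!is_array($address)) {
    $sentinel = new Predis\Client('tcp://127.0.0.1:26379');
    $address  = $sentinel->executeRaw(['SENTINEL', 'get-master-addr-by-name', 'mymaster']);
    file_put_contents($cacheFile, json_encode($address), LOCK_EX);
}

$redis = new Predis\Client("tcp://{$address[0]}:{$address[1]}");
```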

For those who really need it, a tmpfs-backed or APCu cache might help to reduce latency by storing Redis server addresses in memory. These approaches aren't really in the scope of this package, since they depend on aspects of the system that a general-purpose library cannot accommodate well for most users.
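
The APCu variant follows the same lookup pattern, with a short TTL so a failover propagates quickly (requires the apcu extension; the key name is illustrative):

```php
// Same pattern as the file cache, using APCu shared memory instead.
$address = apcu_fetch('redis:master-address', $hit);

if (!$hit) {
    $sentinel = new Predis\Client('tcp://127.0.0.1:26379');
    $address  = $sentinel->executeRaw(['SENTINEL', 'get-master-addr-by-name', 'mymaster']);
    apcu_store('redis:master-address', $address, 10); // 10-second TTL bounds staleness
}
```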

More advanced deployments may use orchestration tools or datacenter runtimes that subscribe to Sentinel events and update the PHP application configuration when Redis server availability changes. For example, a background process can SUBSCRIBE to switch-master events and push a change to a service registry that contains the addresses of the healthy Redis servers. That action can then trigger the PHP application hosts to gracefully reload with the new configuration.
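
A minimal sketch of such a watcher process with Predis (`updateServiceRegistry()` is a placeholder for whatever registry or config store you use):

```php
// Long-running watcher: subscribe to Sentinel's +switch-master channel
// and react when a failover promotes a new master. Runs outside the web tier.
$sentinel = new Predis\Client('tcp://127.0.0.1:26379');

$pubsub = $sentinel->pubSubLoop();
$pubsub->subscribe('+switch-master');

foreach ($pubsub as $message) {
    if ($message->kind !== 'message') {
        continue; // skip subscription acknowledgements
    }

    // Payload format: "<master-name> <old-ip> <old-port> <new-ip> <new-port>"
    [$service, , , $newHost, $newPort] = explode(' ', $message->payload);

    // Placeholder: publish the new address to your service registry or
    // config store, then trigger a graceful reload of the app hosts.
    updateServiceRegistry($service, $newHost, $newPort);
}
```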

As a final suggestion, look into running artisan config:cache as part of your deployment process if it doesn't already include it. This package does a lot of dynamic configuration processing for local development that we don't need in production. Caching the configuration removes this overhead.

@erfansahaf (Author)

Do you have any experience working with Sentinel outside of a Laravel application? I want to know how much of the overhead is caused by this package and how much is caused by Predis itself.

Honestly, I'm at a decision point and I want to know whether it's worth switching from Predis to PhpRedis just because of this latency from querying Sentinel. Will we see a big difference in response time (decreasing the latency from 20ms to 5ms) if we don't use this package?

cyrossignol (Member) commented Dec 2, 2020

@erfansahaf Sorry for the delay. My GitHub email notifications were not arriving for this repository.

I use Sentinel in some non-PHP projects as well, and we haven't observed significant differences in latency between platforms. Some early synthetic benchmarks indicated that this library adds a few milliseconds of latency, but not as much as you describe. However, I haven't systematically measured the performance of this package recently; it's about time to collect some new data.

If you did switch from Predis to PHPRedis since then, I'm interested to learn about your findings.

erfansahaf (Author) commented Dec 13, 2020

@cyrossignol Actually, we dropped Sentinel from our stack for now. It wasn't a good tradeoff; response time matters the most to us.

Also, there was no existing PhpRedis implementation (with Sentinel support) for Laravel to switch to. So we preferred to leave the code as it is and, instead, run some slave instances with a manual master-recovery process for when the original master fails.
