Cache Reconnection #1948
mattwiller
started this conversation in
Ideas
Replies: 2 comments
Thanks @mattwiller, this is a great doc. As discussed offline, we may need to consider separating BullMQ out to a separate Redis instance to avoid interfering with ongoing jobs. Overall, this seems like a win for correctness and durability.
After some more offline discussion and research, I think a rough sequencing of work might look like this:
Additionally, a few considerations have been identified:
Goals
Present state
Medplum currently uses a cache-aside caching strategy:
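For context, the cache-aside pattern can be sketched as follows. This is a minimal illustration, not Medplum's actual code: the `readResource`/`writeResource` names are hypothetical, and in-memory `Map`s stand in for Redis and the database.

```typescript
// Illustrative cache-aside sketch; Maps stand in for Redis and Postgres.
type Resource = { id: string; data: string };

const cache = new Map<string, Resource>(); // stand-in for Redis
const database = new Map<string, Resource>([
  ['patient-1', { id: 'patient-1', data: 'Alice' }],
]); // stand-in for the database

function readResource(id: string): Resource | undefined {
  // 1. Try the cache first.
  const cached = cache.get(id);
  if (cached) {
    return cached;
  }
  // 2. On a miss, read from the database.
  const fromDb = database.get(id);
  if (fromDb) {
    // 3. Populate the cache so subsequent reads hit.
    cache.set(id, fromDb);
  }
  return fromDb;
}

function writeResource(resource: Resource): void {
  // Writes go to the database, and the cache entry is refreshed.
  database.set(resource.id, resource);
  cache.set(resource.id, resource);
}
```

The key property for this discussion is that every cache miss falls through to the database, which is why cache unavailability translates directly into database load.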
Cache Reconnection Scenario
Proposed solution
If the cache is unavailable, falling back to reading and writing directly from the database will maintain application availability, but it also has the potential to increase database load to unsustainable levels. To mitigate this risk, service-wide rate limits could be enforced for all read and write operations. In the event that the cache becomes unavailable, we could also consider dynamically reducing those limits.
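A dynamically adjustable limit could be sketched like this. The `AdaptiveRateLimiter` name and the simple in-flight counting scheme are assumptions for illustration, not Medplum's implementation:

```typescript
// Sketch: a service-wide limiter whose ceiling drops while the cache is
// unavailable, protecting the database from unbounded fallback traffic.
class AdaptiveRateLimiter {
  private inFlight = 0;

  constructor(
    private normalLimit: number,
    private degradedLimit: number,
    private cacheAvailable: boolean = true
  ) {}

  // Called from cache connect/disconnect handlers.
  setCacheAvailable(available: boolean): void {
    this.cacheAvailable = available;
  }

  private get limit(): number {
    return this.cacheAvailable ? this.normalLimit : this.degradedLimit;
  }

  // Returns false when over the current limit; the caller would
  // reject the request (e.g. HTTP 429) instead of hitting the database.
  tryAcquire(): boolean {
    if (this.inFlight >= this.limit) {
      return false;
    }
    this.inFlight++;
    return true;
  }

  release(): void {
    this.inFlight--;
  }
}
```

When the cache client reports a disconnect, `setCacheAvailable(false)` shrinks the ceiling immediately; reconnecting restores the normal limit.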
When reconnecting to the cache after a period of unavailability, we cannot assume that the cache is in any particular state: it may have crashed completely and now be empty, or it could have been on the other side of a network partition and be full of potentially stale data. The simplest solution in this case would be to issue a `FLUSHALL SYNC` command to Redis after reconnecting, to ensure that we start from an empty cache. This should prevent the application from reading stale data, with the caveat that multiple server instances will reconnect over a period of time, repeatedly clearing the cache and resulting in degraded performance until all instances are reconnected.