Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis Streams support #173

Closed
wants to merge 9 commits into from
Closed

Conversation

benlangfeld
Copy link
Collaborator

@benlangfeld benlangfeld commented Nov 6, 2018

This change offers an opt-in alternative implementation of MessageBus' Redis backend which uses Redis Streams rather than sorted sets + PubSub.

Redis Streams have constant-time insertion performance, while sorted sets are O(log(N)) in the size of the set. Streams can also trim themselves, rather than requiring us to trim manually using ZREMRANGEBYSCORE, which I hope (though have not confirmed) is similarly more performant. Subscribing to messages pushed onto Streams is also more durable than PubSub's SUBSCRIBE; global subscribers will not miss messages published during momentary connection failure to Redis, which should make delays in message delivery to subscribed MessageBus clients less of a problem.

TODO:

  • Get tests passing in CI by using an appropriate version of Redis.
  • Eliminate Redis PubSub - long-poll the streams themselves.
    • Resolve spec flakyness when timeouts expire
    • Investigate relying on Streams' automatic message IDs to avoid duplication of state with counters and simplify recovery from data loss. (Redis streams native ids powerhome/message_bus#3)
    • Increase blocking period for less regular subscription polling. (Blocked on usage of native IDs)
  • Improve stream trimming performance by being imprecise. In-progress, test failures
  • Improved documentation of mode of operation, not just for Redis streams but for other backends and MessageBus in general. (Fuller interface documentation #184)
  • Evaluate relative performance of each option; performance test suite

Closes #171.

@SamSaffron
Copy link
Member

this looks great to me, I would love to see and extra [ ] for testing performance, I guess we needs some sort of performance suite.

@benlangfeld
Copy link
Collaborator Author

Note that there is some abstraction of streams coming in the redis-rb library. We don't need it for this use case, but could refactor based on it in the future.

@benlangfeld
Copy link
Collaborator Author

Note that there is a bug in released Redis where listening for updates on streams which have been emptied can cause unpredictable results. At this point, I'm not sure how much it would impact message_bus, but it's worth keeping an eye on: redis/redis@c1e9186

@benlangfeld
Copy link
Collaborator Author

benlangfeld commented Nov 30, 2018

The reason that we can't increase the duration for which we block listening for messages on streams is that we rely on regularly polling a separate key which stores our subscription state in order to detect the condition where Redis has been emptied, triggering us to wind back to read the stream from the start again.

There are two possible avenues to resolve this:

  • Redis could inform us that a stream has been deleted such that we know to read from the start of it. I have filed Blocking XREAD on streams can't detect the stream being deleted redis/redis#5625 with this feature request. Even if implemented as requested, this has flaws:
    • A stream could be deleted and re-created in the interval between XREAD calls by a client holding a message pointer, who would then not be able to detect the requirement to roll back.
    • It's plausible that the integer ID keys be deleted while a Stream persists, which would cause publications to fail (streams are append-only).
    • It's likely that deletion of a stream would not trigger deletion of the subscription key, so we would not correctly wind back.
    • Others?
  • We could include a vector clock in the stream message IDs such that our pointers will be valid even if streams get re-created, and we thus never have to explicitly wind back. This solution is more robust, because it doesn't come unstuck in the case where the stream dies and is repopulated in-between blocking XREAD calls, and it also removes the assumption that the stream and the subscription key are maintained or removed atomically. It does, however, require message_bus to abandon the idea of incrementing integer IDs in favour of more complex opaque IDs.

This is the biggest blocker to proceeding with this PR right now.

Implements backlogs using streams rather than sorted sets. Streams do their own length limitation. See https://redis.io/topics/streams-intro.
@benlangfeld benlangfeld mentioned this pull request Dec 5, 2018
@SamSaffron
Copy link
Member

@benlangfeld do you feel we should keep this open? I am totally open to adding more backend implementations its just that we have been sitting on this for way too long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Re-implement Redis backend using Streams (Redis 5.0)
2 participants