storage/redis: reimplement using scripts #472

Open · wants to merge 1 commit into main

Conversation

mrd0ll4r (Member)

This possibly fixes #471, and may also improve performance, but that hasn't been measured.

@mrd0ll4r (Member Author)

PTAL


// SCAN through all of our swarm keys.
for {
values, err := redis.Values(conn.Do("SCAN", cursor, "MATCH", "swarm_*", "COUNT", 50))
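
For orientation, a cursor loop like this is typically completed along the following lines with redigo. This is only a sketch: the function name, the `visit` callback, and the error handling are illustrative, not the PR's exact code.

```go
package main

import "github.com/gomodule/redigo/redis"

// scanSwarmKeys walks every key matching swarm_* and hands each one to visit.
// Sketch only; names and error handling are illustrative.
func scanSwarmKeys(conn redis.Conn, visit func(key string) error) error {
	cursor := int64(0)
	for {
		values, err := redis.Values(conn.Do("SCAN", cursor, "MATCH", "swarm_*", "COUNT", 50))
		if err != nil {
			return err
		}

		// The reply is a two-element array: the next cursor and a batch of keys.
		cursor, err = redis.Int64(values[0], nil)
		if err != nil {
			return err
		}
		keys, err := redis.Strings(values[1], nil)
		if err != nil {
			return err
		}

		for _, key := range keys {
			if err := visit(key); err != nil {
				return err
			}
		}

		// SCAN signals a complete iteration by returning cursor 0.
		if cursor == 0 {
			return nil
		}
	}
}
```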
Member

This is okay as long as there's only one instance of Chihaya. Once multiple instances try to GC the same redis instance/cluster, this strategy is non-ideal. I think calculating a shard size based on the total size of the database, then choosing one shard at random and GCing all of its keys, might be better.

Member

On another note -- is there a reason we don't do this part in the script as well to avoid round trips?

mrd0ll4r (Member Author) · Mar 18, 2020

SCAN is nondeterministic, and Redis doesn't allow scripts to run write commands (such as deletions) after anything nondeterministic, so as not to break replication (see the Redis scripting documentation).
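
Concretely, the split looks roughly like this: the client drives the SCAN loop (as in the excerpt above) and hands each batch of keys to a small, deterministic script that only performs the writes. This is a sketch using redigo's `redis.NewScript`; the script body and helper names are illustrative, not the contents of `scripts.go`.

```go
package main

import "github.com/gomodule/redigo/redis"

// deleteKeysScript deletes exactly the keys it is handed. Because the script
// never calls SCAN itself, it stays deterministic and replication-safe; the
// nondeterministic SCAN runs on the client. Illustrative, not scripts.go.
var deleteKeysScript = redis.NewScript(-1, `
for _, key in ipairs(KEYS) do
	redis.call("DEL", key)
end
return #KEYS
`)

// deleteBatch sends one batch of SCAN results to the script in a single round trip.
func deleteBatch(conn redis.Conn, keys []string) error {
	if len(keys) == 0 {
		return nil
	}
	// With a negative keyCount, redigo expects the key count as the first argument.
	args := make([]interface{}, 0, len(keys)+1)
	args = append(args, len(keys))
	for _, k := range keys {
		args = append(args, k)
	}
	_, err := deleteKeysScript.Do(conn, args...)
	return err
}
```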

I mostly get what you're saying in your first comment: instead of processing everything with a swarm_ prefix, pick one swarm_<single byte> shard at random and GC that. (The first byte is the largest shard we can get; a sketch of such a shard pattern follows the list below.)
I don't like it, because:

  1. As the largest shards are 1/256th of the whole keyspace, we would need to GC 256 times as often, on average, to get the same rate of garbage collection as before (or cover multiple shards in one GC pass).
  2. This rate changes with the number of chihaya instances running.
  3. We could try a counter, volatile keys, or some other mechanism for counting the number of active chihaya instances, but this is non-trivial. (This is an interesting problem and I kind of want to work on it, but not for this PR.)
  4. The added randomness is good for distributed systems like this, but I would argue we don't need it: The SCAN command is already nondeterministic, i.e. it depends on some internal state of the redis instance. There are a bunch of properties around what keys are returned by SCAN, which make me think that this is generally a very elegant solution to our problem.
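
For concreteness, the alternative described above would boil down to something like the following sketch. The swarm_ key layout is assumed from the snippet above, and escaping is needed because a binary shard byte may collide with the glob metacharacters used by MATCH.

```go
package main

import "math/rand"

// randomShardPattern picks one of the 256 possible first infohash bytes at
// random and builds a SCAN MATCH pattern covering only that 1/256th shard.
// Key layout is assumed; *, ?, [ and \ are glob metacharacters and must be
// escaped when the shard byte collides with them.
func randomShardPattern() string {
	b := byte(rand.Intn(256))
	switch b {
	case '*', '?', '[', '\\':
		return "swarm_\\" + string([]byte{b}) + "*"
	default:
		return "swarm_" + string([]byte{b}) + "*"
	}
}
```

The GC pass would then run the same cursor loop as above with this pattern instead of swarm_*, which is exactly why it would have to run roughly 256 times as often (or cover several shards per pass) to keep the collection rate mentioned in point 1.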

Member

> The added randomness is good for distributed systems like this, but I would argue we don't need it: The SCAN command is already nondeterministic, i.e. it depends on some internal state of the redis instance. There are a bunch of properties around what keys are returned by SCAN, which make me think that this is generally a very elegant solution to our problem.

SCAN is nondeterministic because it's implemented as a cursor in a database that isn't MVCC, so parallel operations can change values out from under it while you're iterating. That's all it means; it still iterates in the same order across clients. I've both read the code and tested this locally to confirm that this is the case. The guarantees listed in that doc are exactly the same ones our keyspace iteration in the memory implementation provides.
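
One way to check that locally (a sketch; the address and COUNT are placeholders) is to compare the first SCAN batch seen by two independent connections against a quiescent database:

```go
package main

import (
	"fmt"
	"reflect"

	"github.com/gomodule/redigo/redis"
)

// firstBatch returns the keys from a single SCAN call starting at cursor 0.
func firstBatch(addr string) ([]string, error) {
	conn, err := redis.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	values, err := redis.Values(conn.Do("SCAN", 0, "MATCH", "swarm_*", "COUNT", 50))
	if err != nil {
		return nil, err
	}
	return redis.Strings(values[1], nil)
}

func main() {
	a, err := firstBatch("localhost:6379")
	if err != nil {
		panic(err)
	}
	b, err := firstBatch("localhost:6379")
	if err != nil {
		panic(err)
	}
	// On a quiescent database, both clients see the same batch in the same order.
	fmt.Println("same first batch:", reflect.DeepEqual(a, b))
}
```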

> This rate changes with the number of chihaya instances running.

This is already the case: if I run two instances of chihaya, garbage collection on each instance runs at the same interval, just at different offsets based on when the instance was started. If both were started around the same time, they will basically iterate over the whole database one right after the other, which makes the second instance functionally useless while also being expensive on the database.

In order to maximize the usefulness of each instance's garbage collection, the goal should be a cluster-wide solution. A solution could roughly take one of two shapes: elect a single garbage-collection process to run across the deployment, or devise a strategy for collaboration across instances. If we're clever, that collaboration need not require coordination, which is complicated to implement.

In an unrelated project, we have a means of coordination-free work for backfilling database values. You can keep spinning up more workers and the progress will get faster and faster. For this implementation, the requirements are idempotent work and a uniform keyspace indexed by integers that you can use as bounds.
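
As a rough illustration of that pattern (all names, the keyspace size, and the chunk size here are assumptions, since the project itself isn't described further): each worker repeatedly grabs a random integer range and processes it; because the work is idempotent, overlap between workers is wasted effort rather than a correctness problem, so adding workers only increases throughput.

```go
package main

import "math/rand"

const (
	keyspaceSize = 1 << 20 // assumed size of the integer-indexed keyspace
	chunkSize    = 1024    // assumed number of ids covered per pass
)

// backfillPass processes one randomly chosen chunk of the keyspace.
// processItem must be idempotent, so overlapping passes from other workers
// are harmless and no coordination is needed.
func backfillPass(processItem func(id int) error) error {
	lo := rand.Intn(keyspaceSize - chunkSize)
	for id := lo; id < lo+chunkSize; id++ {
		if err := processItem(id); err != nil {
			return err
		}
	}
	return nil
}
```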

While this is not directly applicable, I believe that thinking about a strategy along these lines could provide a more ideal solution. Thus, I'd rather discuss and explore this train of thought before compromising and doing something like staggering when GC starts or using a global lock to elect the next instance to GC.

@jzelinskie (Member)

Hey, I don't think we should leave this blocked on the GC design.
We can always make it more efficient in a follow-up.

Development

Successfully merging this pull request may close these issues.

infohashes_count goes into negative values