some improvements to stream compare command #352

Open
wants to merge 1 commit into
base: main
from

Conversation

whyrusleeping
Collaborator

This isn't done yet, but I could use some extra eyes on it to think through how we want it to work.
One notable problem that's tricky to deal with is startup: we inevitably have some mismatch in events between the two streams, and it's not entirely clear what the best way to progress from there is.

@ericvolp12
Collaborator

I think it would be neat to pick one stream as an authoritative subset (e.g. a PDS stream) and one as an aggregate superset (e.g. a BGS stream). We want to make sure all events in the subset are included in the superset stream and that they appear in order on a per-repo basis (i.e. monotonically increasing Revs). If we're dealing with commits, we can record the DID and Rev of each commit from the authoritative stream along with a hash of its blocks, then when we see that DID and Rev come out of the other stream we check that the block hash is the same, or something like that maybe?
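
A rough sketch of that commit check in Go, to make the idea concrete. Everything here is hypothetical: `commitEvent`, `commitChecker`, and the field names are made up for illustration and are not the types used in this PR. It also assumes Rev strings sort lexicographically in emit order, and it uses SHA-256 as a stand-in for whatever block hash we would actually pick.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// commitEvent is a hypothetical, trimmed-down commit event.
// The real event type carries more fields than this.
type commitEvent struct {
	DID    string
	Rev    string
	Blocks []byte
}

// commitKey identifies a single commit by repo and revision.
type commitKey struct {
	DID string
	Rev string
}

// commitChecker remembers what the authoritative stream emitted and
// validates events from the aggregate stream against it.
type commitChecker struct {
	expected map[commitKey][32]byte // block hash per (DID, Rev) from the authoritative stream
	lastRev  map[string]string      // last Rev seen per DID on the aggregate stream
}

func newCommitChecker() *commitChecker {
	return &commitChecker{
		expected: make(map[commitKey][32]byte),
		lastRev:  make(map[string]string),
	}
}

// recordAuthoritative stores the block hash of a commit from the subset stream.
func (c *commitChecker) recordAuthoritative(evt commitEvent) {
	c.expected[commitKey{DID: evt.DID, Rev: evt.Rev}] = sha256.Sum256(evt.Blocks)
}

// checkAggregate validates a commit from the superset stream. It reports
// an error if the Rev does not increase for that DID, or if the block
// hash differs from the one recorded for the same (DID, Rev).
func (c *commitChecker) checkAggregate(evt commitEvent) error {
	// Assumption: Rev strings compare lexicographically in emit order.
	if last, ok := c.lastRev[evt.DID]; ok && evt.Rev <= last {
		return fmt.Errorf("out-of-order rev for %s: %s after %s", evt.DID, evt.Rev, last)
	}
	c.lastRev[evt.DID] = evt.Rev

	key := commitKey{DID: evt.DID, Rev: evt.Rev}
	want, ok := c.expected[key]
	if !ok {
		// Not recorded from the authoritative stream. That is expected for
		// repos outside the subset, but it can also mean the aggregate
		// event arrived first; this sketch does not handle that race.
		return nil
	}
	delete(c.expected, key) // confirmed, so drop it
	if sha256.Sum256(evt.Blocks) != want {
		return fmt.Errorf("block hash mismatch for %s at rev %s", evt.DID, evt.Rev)
	}
	return nil
}
```

Whatever is still left in `expected` after a while is a commit that never showed up on the aggregate stream, which is the other half of the check.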

Is the goal here to ensure that the aggregate stream has the same HEAD for all repos as the authoritative stream? If that's the case then we can track heads from the authoritative stream on a per-repo basis and make sure the aggregate stream only produces heads that match what we got from the authoritative stream. The tricky bit is determining if the aggregate stream is "too far behind" somehow I suppose.

Maybe you keep a list of (repo, head, timestamp) tuples for the authoritative stream, and the same thing for the aggregate stream.
When a new event comes in the aggregate stream, you make sure the (repo, head) matches the next one in the list from the authoritative stream (and dump the old values once we've confirmed them from the aggregate stream).

Every 30 seconds or whatever, you can do a sweep to see which heads haven't propagated yet that were emitted some TTL ago and then complain about it with log messages.


2 participants