Events source of truth: db or receipts #11830

Open
Stebalien opened this issue Apr 4, 2024 · 2 comments

@Stebalien
Member

I'd like to resolve a quick design question about event indexing. Should we care about ordering in the database, or should we use the actual events as the source of truth?

I'm asking because the original idea was to index only the keys and values flagged for indexing, not all keys and values. We ended up using the database as the source of truth, which means we also ended up inserting a bunch of fields into the database that technically aren't supposed to be indexed and arguably shouldn't even be queryable.

The alternative would be to find events via the index, but then actually look up the real events from the receipts tree and return those to the user. Unfortunately, that's almost certainly going to have a performance impact and will increase complexity.
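To make the alternative concrete, here is a minimal sketch of "index as locator, receipts as source of truth". Everything in it is hypothetical and for illustration only: `EventRef`, `ReceiptStore`, `BuildIndex`, and `Lookup` are not Lotus types or APIs, and the receipts tree is modelled as a plain in-memory map.

```go
package main

// Entry is one key/value pair of an event. Indexed mirrors the
// "flagged for indexing" bit discussed above.
type Entry struct {
	Key     string
	Value   string
	Indexed bool
}

// Event is the full event as it would be stored in the receipts tree.
type Event struct {
	Entries []Entry
}

// EventRef is a hypothetical index row: it records only where an event
// lives, not its contents.
type EventRef struct {
	MsgCID string // message whose receipt holds the event (assumed key)
	Index  int    // position of the event in that receipt's event list
}

// ReceiptStore stands in for the receipts tree (assumption: a simple map).
type ReceiptStore map[string][]Event

// Index maps "key=value" of a flagged entry to the events carrying it.
type Index map[string][]EventRef

// BuildIndex indexes only the entries flagged for indexing.
func BuildIndex(store ReceiptStore) Index {
	idx := Index{}
	for msg, events := range store {
		for i, ev := range events {
			for _, e := range ev.Entries {
				if e.Indexed {
					k := e.Key + "=" + e.Value
					idx[k] = append(idx[k], EventRef{MsgCID: msg, Index: i})
				}
			}
		}
	}
	return idx
}

// Lookup finds candidates via the index, then returns the real events
// read back from the store, including their non-indexed entries.
func Lookup(idx Index, store ReceiptStore, key, value string) []Event {
	var out []Event
	for _, ref := range idx[key+"="+value] {
		events, ok := store[ref.MsgCID]
		if !ok || ref.Index >= len(events) {
			continue // stale index row: the store no longer has this event
		}
		out = append(out, events[ref.Index])
	}
	return out
}
```

In this shape, non-indexed fields are still returned to the user but cannot be queried on, at the cost of one extra read per matching event.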

@rvagg
Member

rvagg commented May 2, 2024

I just discovered a case where this is important: I made a boo boo on my mainnet node and decided to start again from a snapshot. The snapshot resets the splitstore but not the database, which should be fine, except that when my node got messed up it may not have been on the canonical chain. A snapshot resync may give me new events (tbh I'm not convinced it's doing this properly, but that's another story), and the duplicate checking on inserts means it won't duplicate existing events. But it won't give me any of the reverts that I should have.

I've been thinking about this in the context of this: #11770 (comment) - if we make the APIs "give me events since this tipset", and walk from that tipset to the current one, we should be able to see where we go backward and call those reverts regardless of what the database says.
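A rough sketch of that walk, assuming a simplified chain where each tipset has a single parent key and a height. `Tipset`, `Chain`, and `PathBetween` are illustrative names, not the actual Lotus types. The idea is to step whichever side is higher back toward its parent until both sides meet at a common ancestor: everything stepped over on the "since" side is a revert, everything on the head side is an apply.

```go
package main

import "errors"

// Tipset is a simplified stand-in: one parent key and a height.
type Tipset struct {
	Key    string
	Parent string
	Height int
}

// Chain is a hypothetical lookup of known tipsets by key.
type Chain map[string]Tipset

// PathBetween returns the tipsets to revert (newest first) and to apply
// (oldest first) when moving from tipset `from` to tipset `to`.
func PathBetween(c Chain, from, to string) (revert, apply []Tipset, err error) {
	a, okA := c[from]
	b, okB := c[to]
	if !okA || !okB {
		return nil, nil, errors.New("unknown tipset")
	}
	for a.Key != b.Key {
		// Compare the heights as they were at the start of this step.
		ah, bh := a.Height, b.Height
		var ok bool
		if ah >= bh {
			// `a` is not an ancestor of `b`: it was reverted.
			revert = append(revert, a)
			if a, ok = c[a.Parent]; !ok {
				return nil, nil, errors.New("no common ancestor found")
			}
		}
		if bh >= ah {
			// `b` is on the new chain: it must be applied.
			apply = append(apply, b)
			if b, ok = c[b.Parent]; !ok {
				return nil, nil, errors.New("no common ancestor found")
			}
		}
	}
	// Applies were collected newest first; flip them to chain order.
	for i, j := 0, len(apply)-1; i < j; i, j = i+1, j-1 {
		apply[i], apply[j] = apply[j], apply[i]
	}
	return revert, apply, nil
}
```

If `from` is an ancestor of `to`, `revert` comes back empty and `apply` is just the forward path, so the same call covers both the fork and the no-fork case.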

It doesn't help with the case of "give me all events from height X", however, because we're currently just going to query the database and give them whatever shows up, including possibly some events from the same height but different tipset because they haven't been marked as reverts. In that case, we may want to do the walk of the tipsets ourselves and only collect events per-tipset and collate them for the user rather than collecting them as a whole batch with a single query. Then we just have to ask whether it'd be nearly as efficient to just read from the AMT instead of the database (probably not, but maybe it's close).
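For the "all events from height X" case, the per-tipset collection could look roughly like the following. This is a sketch under assumptions: `EventRow` is a made-up shape for a database row, the chain is a simple parent-linked map, and `CanonicalEventsFrom` is a hypothetical helper. It walks back from the current head, then keeps only rows whose tipset lies on that path, so rows from a same-height tipset on another fork are dropped even if they were never marked as reverted.

```go
package main

import "errors"

// Tipset is a simplified stand-in: one parent key and a height.
type Tipset struct {
	Key    string
	Parent string
	Height int
}

// EventRow is a hypothetical database row for one indexed event.
type EventRow struct {
	TipsetKey string
	Height    int
	Data      string
}

// CanonicalEventsFrom returns rows at or above minHeight whose tipset is
// an ancestor of (or equal to) head, ordered oldest tipset first.
func CanonicalEventsFrom(chain map[string]Tipset, head string, rows []EventRow, minHeight int) ([]EventRow, error) {
	// Group rows by tipset so each tipset's events can be collected as a unit.
	byTipset := map[string][]EventRow{}
	for _, r := range rows {
		byTipset[r.TipsetKey] = append(byTipset[r.TipsetKey], r)
	}

	cur, ok := chain[head]
	if !ok {
		return nil, errors.New("unknown head tipset")
	}

	// Walk back from head, recording the canonical path down to minHeight.
	var path []Tipset
	for cur.Height >= minHeight {
		path = append(path, cur)
		if cur.Parent == "" {
			break // reached genesis
		}
		next, ok := chain[cur.Parent]
		if !ok {
			return nil, errors.New("missing parent tipset: " + cur.Parent)
		}
		cur = next
	}

	// Emit events in chain order (the path was collected newest first).
	var out []EventRow
	for i := len(path) - 1; i >= 0; i-- {
		out = append(out, byTipset[path[i].Key]...)
	}
	return out, nil
}
```

Whether the per-tipset rows come from the database, as here, or from the AMT is exactly the open question; the walk itself is the same either way.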

@Stebalien
Member Author

Oh... IMO, that's closer to #11640. I.e., when we restart, we need to process all applies/reverts between the last tipset we processed and the current tipset.

Handling snapshots may be a bit tricky...
