Events source of truth: db or receipts #11830

Stebalien · 2024-04-04T15:40:02Z

I'd like to resolve a quick design question with respect to event indexing. Should we care about ordering in the database? Or should we be using the actual events as the source of truth?

I'm asking because the original idea was to index keys and values flagged for indexing, not all keys and values. We ended up using the database as the source of truth, but this also means that we ended up inserting a bunch of fields into the database that technically aren't supposed to be indexed and technically maybe should not even be queryable.

The alternative would be to find events via the index, but then actually look up the real events from the receipts tree and return those to the user. Unfortunately, that's almost certainly going to have a performance impact and will increase complexity.
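A minimal sketch of that alternative, under stated assumptions: the index stores only entries flagged for indexing and is used purely to locate events, while the events returned to the user are read back from the receipts. Everything here (`Entry`, `Event`, `build_index`, `find_events`, and the dict standing in for the receipts tree) is hypothetical and is not the Lotus API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str
    value: bytes
    indexed: bool  # hypothetical stand-in for the "flagged for indexing" bit


@dataclass(frozen=True)
class Event:
    emitter: int
    entries: tuple  # entries in the order they were emitted


def build_index(receipts):
    """Index only flagged entries: (key, value) -> set of event locations.

    `receipts` is a hypothetical stand-in for the receipts tree:
    a dict mapping (tipset, message index) -> ordered list of events.
    """
    index = {}
    for (tipset, msg_idx), events in receipts.items():
        for ev_idx, ev in enumerate(events):
            for entry in ev.entries:
                if entry.indexed:
                    loc = (tipset, msg_idx, ev_idx)
                    index.setdefault((entry.key, entry.value), set()).add(loc)
    return index


def find_events(index, receipts, key, value):
    """Locate matches via the index, then return the real events from receipts."""
    locations = sorted(index.get((key, value), ()))
    return [receipts[(tipset, msg_idx)][ev_idx] for tipset, msg_idx, ev_idx in locations]
```

With this split, entry ordering and non-indexed fields come straight from the receipts, so the index never has to store them. The cost is one extra lookup per matched event, which is the performance concern raised above.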

rvagg · 2024-05-02T07:57:12Z

I just discovered a case where this is important: I made a boo boo on my mainnet node and decided to start again from a snapshot. The snapshot will reset the splitstore but not the database, which should be fine, except for the fact that when my node got messed up it may not have been on the canonical chain. So even though a snapshot resync may give me new events (tbh I'm not convinced that it's doing this properly, that's another story though) and not duplicate existing events because of the duplicate checking on inserts, it won't give me any reverts that I should have.

I've been thinking about this in the context of this: #11770 (comment) - if we make the APIs "give me events since this tipset", and walk from that tipset to the current one, we should be able to see where we go backward and call those reverts regardless of what the database says.
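Roughly, the walk could look like the sketch below. It assumes each tipset records its height and a single parent key, and that the two tipsets share a common ancestor; `TipSet` and `chain_path` are hypothetical names for illustration, not existing Lotus types or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TipSet:
    key: str
    height: int
    parent: Optional[str]  # None only for genesis


def chain_path(tipsets, since, head):
    """Return (reverts, applies) needed to move from `since` to `head`.

    `tipsets` maps key -> TipSet. Reverts are listed newest first,
    applies oldest first. Assumes both tipsets share an ancestor.
    """
    reverts, applies = [], []
    a, b = tipsets[since], tipsets[head]
    while a.key != b.key:
        if a.height > b.height:
            # `since` side is ahead: step it back and record a revert.
            reverts.append(a.key)
            a = tipsets[a.parent]
        elif b.height > a.height:
            # `head` side is ahead: step it back and record an apply.
            applies.append(b.key)
            b = tipsets[b.parent]
        else:
            # Same height, different tipsets: both sides are on a fork.
            reverts.append(a.key)
            applies.append(b.key)
            a, b = tipsets[a.parent], tipsets[b.parent]
    applies.reverse()
    return reverts, applies
```

If `since` is an ancestor of `head`, the revert list comes back empty. Otherwise the reverts are derived from the chain itself, regardless of what the database says.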

It doesn't help with the case of "give me all events from height X", however, because we're currently just going to query the database and give them whatever shows up, including possibly some events from the same height but different tipset because they haven't been marked as reverts. In that case, we may want to do the walk of the tipsets ourselves and only collect events per-tipset and collate them for the user rather than collecting them as a whole batch with a single query. Then we just have to ask whether it'd be nearly as efficient to just read from the AMT instead of the database (probably not, but maybe it's close).
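To make the per-tipset idea concrete, here is a small sketch using SQLite. The `event` table, its columns, and the `chain` dict (tipset key -> `(height, parent key)`) are assumptions made up for this example and do not reflect the actual Lotus schema.

```python
import sqlite3


def canonical_keys(chain, head, min_height):
    """Walk parent links from `head` down to `min_height`; return oldest first."""
    keys = []
    cur = head
    while cur is not None and chain[cur][0] >= min_height:
        keys.append(cur)
        cur = chain[cur][1]
    keys.reverse()
    return keys


def events_from_height(db, chain, head, min_height):
    """Collect events tipset by tipset along the canonical chain.

    Rows belonging to tipsets that are not on the path to `head` are
    never queried, so they are skipped even if the database has not
    marked them as reverted.
    """
    out = []
    for key in canonical_keys(chain, head, min_height):
        rows = db.execute(
            "SELECT tipset_key, seq, payload FROM event "
            "WHERE tipset_key = ? ORDER BY seq",
            (key,),
        ).fetchall()
        out.extend(rows)
    return out
```

This trades one batch query for one query per tipset, which is what prompts the comparison with reading from the AMT directly.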

Stebalien · 2024-05-02T16:48:47Z

Oh... IMO, that's closer to #11640. I.e., when we restart, we need to process all applies/reverts between the last tipset we processed and the current tipset.

Handling snapshots may be a bit tricky...

Stebalien added the kind/discussion, area/api, and area/events labels on Apr 4, 2024

Stebalien mentioned this issue on Apr 4, 2024: "Actor event entries do not retain order when passing through event index" #11823 (closed)
