You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'd like resolve a quick design question with respect to event indexing. Should we care about ordering in the database? Or should we be using the actual events as the source of truth?
I'm asking because the original idea was to index keys and values flagged for indexing, not all keys and values. We ended up using the database as the source of truth, but this also means that we ended up inserting a bunch of fields into the database that technically aren't supposed to be indexed and technically maybe should not even be queryable.
The alternative would be to find events via the index, but then actually look up the real events from the receipts tree and return those to the user. Unfortunately, that's almost certainly going to have a performance impact and will increase complexity.
The text was updated successfully, but these errors were encountered:
I just discovered a case where this is important: I made a boo boo on my mainnet node and decided to start again from a snapshot. The snapshot will reset the splitstore but not the database, which should be fine, except for the fact that when my node got messed up it may not have been on the canonical chain. So even though a snapshot resync may give me new events (tbh I'm not convinced that it's doing this properly, that's another story though) and not duplicate existing events because of the duplicate checking on inserts, it won't give me any reverts that I should have.
I've been thinking about this in the context of this: #11770 (comment) - if we make the APIs "give me events since this tipset", and walk from that tipset to the current one, we should be able to see where we go backward and call those reverts regardless of what the database says.
It doesn't help with the case of "give me all events from height X", however, because we're currently just going to query the database and give them whatever shows up, including possibly some events from the same height but different tipset because they haven't been marked as reverts. In that case, we may want to do the walk of the tipsets ourselves and only collect events per-tipset and collate them for the user rather than collecting them as a whole batch with a single query. Then we just have to ask whether it'd be nearly as efficient to just read from the AMT instead of the database (probably not, but maybe it's close).
Oh... IMO, that's closer to #11640. I.e., when we restart, we need to process all applies/reverts between the last tipset we processed and the current tipset.
I'd like resolve a quick design question with respect to event indexing. Should we care about ordering in the database? Or should we be using the actual events as the source of truth?
I'm asking because the original idea was to index keys and values flagged for indexing, not all keys and values. We ended up using the database as the source of truth, but this also means that we ended up inserting a bunch of fields into the database that technically aren't supposed to be indexed and technically maybe should not even be queryable.
The alternative would be to find events via the index, but then actually look up the real events from the receipts tree and return those to the user. Unfortunately, that's almost certainly going to have a performance impact and will increase complexity.
The text was updated successfully, but these errors were encountered: