
Events source of truth: db or receipts #11830

Open
Stebalien opened this issue Apr 4, 2024 · 6 comments

@Stebalien
Member

I'd like to resolve a quick design question with respect to event indexing. Should we care about ordering in the database? Or should we be using the actual events as the source of truth?

I'm asking because the original idea was to index only the keys and values flagged for indexing, not all of them. We ended up using the database as the source of truth, which means we also ended up inserting a bunch of fields into the database that technically aren't supposed to be indexed and maybe shouldn't even be queryable.

The alternative would be to find events via the index, but then actually look up the real events from the receipts tree and return those to the user. Unfortunately, that's almost certainly going to have a performance impact and will increase complexity.
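
As a sketch of that alternative, here's a minimal Go model where the index stores only pointers and the event bodies are always read back from the receipts tree. All type and field names here are hypothetical stand-ins, not the real Lotus types, and the map-backed "tree" stands in for the AMT-backed receipts store:

```go
package main

import "fmt"

// Event is an illustrative stand-in for a native event.
type Event struct {
	Emitter uint64
	Entries map[string][]byte
}

// EventPointer is what the index would store: a location, not a payload.
type EventPointer struct {
	TipsetKey string
	MsgIdx    int
	EventIdx  int
}

// receiptsTree stands in for the AMT-backed receipts store:
// tipset -> per-message event lists.
type receiptsTree map[string][][]Event

// resolveEvents looks up pointers via the index, then loads the authoritative
// event bodies from the receipts tree, so the chain stays the source of truth.
func resolveEvents(tree receiptsTree, ptrs []EventPointer) ([]Event, error) {
	var out []Event
	for _, p := range ptrs {
		msgs, ok := tree[p.TipsetKey]
		if !ok {
			return nil, fmt.Errorf("tipset %s not in store", p.TipsetKey)
		}
		if p.MsgIdx >= len(msgs) || p.EventIdx >= len(msgs[p.MsgIdx]) {
			return nil, fmt.Errorf("stale pointer %+v", p)
		}
		out = append(out, msgs[p.MsgIdx][p.EventIdx])
	}
	return out, nil
}

func main() {
	tree := receiptsTree{
		"ts1": {{{Emitter: 1001, Entries: map[string][]byte{"k": []byte("v")}}}},
	}
	evs, err := resolveEvents(tree, []EventPointer{{TipsetKey: "ts1"}})
	fmt.Println(len(evs), err == nil) // prints "1 true"
}
```

The extra hop (index lookup, then a receipts-tree read per event) is exactly where the performance cost mentioned above would come from.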

@rvagg
Member

rvagg commented May 2, 2024

I just discovered a case where this is important: I made a boo boo on my mainnet node and decided to start again from a snapshot. The snapshot resets the splitstore but not the database, which should be fine, except that when my node got messed up it may not have been on the canonical chain. So even though a snapshot resync may give me new events (tbh I'm not convinced it's doing this properly, but that's another story) and won't duplicate existing events thanks to the duplicate checking on inserts, it won't give me any of the reverts that I should have.

I've been thinking about this in the context of #11770 (comment): if we make the APIs "give me events since this tipset" and walk from that tipset to the current one, we should be able to see where we went backward and call those reverts regardless of what the database says.

It doesn't help with the "give me all events from height X" case, however, because currently we just query the database and return whatever shows up, possibly including events from the same height but a different tipset that haven't been marked as reverts. In that case, we may want to walk the tipsets ourselves, collect events per tipset, and collate them for the user rather than fetching the whole batch with a single query. Then we just have to ask whether it would be nearly as efficient to read from the AMT instead of the database (probably not, but maybe it's close).
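
A minimal sketch of that walk, using toy tipset types rather than the real Lotus ones: derive the apply-order path from the chain itself, and treat any tipset that isn't an ancestor of the head as reverted, regardless of what the database's revert flags say:

```go
package main

import "fmt"

// Tipset is a toy stand-in: a key plus its parent's key.
type Tipset struct {
	Key    string
	Parent string
}

// pathFrom walks parent links from head back to `from` and returns the
// apply-order path. If `from` is not an ancestor of head, it is off the
// canonical chain, so any events observed there must be treated as reverted.
func pathFrom(byKey map[string]Tipset, from, head string) ([]string, bool) {
	var rev []string
	for k := head; k != ""; k = byKey[k].Parent {
		rev = append(rev, k)
		if k == from {
			for i, j := 0, len(rev)-1; i < j; i, j = i+1, j-1 {
				rev[i], rev[j] = rev[j], rev[i] // reverse into apply order
			}
			return rev, true
		}
	}
	return nil, false
}

func main() {
	chain := map[string]Tipset{
		"a": {Key: "a"},
		"b": {Key: "b", Parent: "a"},
		"c": {Key: "c", Parent: "b"},
		"x": {Key: "x", Parent: "a"}, // a fork that lost: its events need reverting
	}
	path, ok := pathFrom(chain, "a", "c")
	fmt.Println(path, ok) // prints "[a b c] true"
	_, onChain := pathFrom(chain, "x", "c")
	fmt.Println(onChain) // prints "false"
}
```

Collecting events per tipset along the returned path, rather than in one flat height-range query, is what keeps same-height-different-tipset events out of the answer.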

@Stebalien
Member Author

Oh... IMO, that's closer to #11640. I.e., when we restart, we need to process all applies/reverts between the last tipset we processed and the current tipset.

Handling snapshots may be a bit tricky...

@raulk
Member

raulk commented Jun 5, 2024

Chiming in with a bird's-eye view, hoping to provide some perspective from recent battles in the field.

I think it's helpful to break down the access patterns and query operations to move this convo forward.

  • eth_getTransactionReceipt: takes a tx hash and returns the receipt, including logs emitted during execution.
  • eth_getLogs: returns only logs, no receipts, in a range of epochs, defaulting to latest if the user omits them in the query.
  • eth_subscribe: takes a filter definition, and streams events via WSS.
  • eth_getFilterChanges: takes a filter definition, and returns new matches since the last poll.

Clients expect consistency across methods in terms of data availability. For example, taking a transaction hash returned by eth_getLogs and feeding it to eth_getTransactionReceipt (e.g. to get the return data) must return a valid receipt that embeds the logs seen in the first call. Similarly, the logs returned inside a receipt object must match the logs that would be returned for the same tx or block range through the other log-returning methods.
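
That cross-method invariant can be expressed as a mechanical check. The sketch below uses simplified stand-in types (the real Lotus Eth types differ): every log surfaced by eth_getLogs must reappear, unchanged, inside the receipt that eth_getTransactionReceipt returns for its transaction:

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins for the Eth RPC surface; method names mirror the
// JSON-RPC methods but the types are simplified assumptions.
type Log struct {
	TxHash string
	Data   string
}

type Receipt struct {
	TxHash string
	Logs   []Log
}

type EthAPI interface {
	GetLogs(fromEpoch, toEpoch int) []Log
	GetTransactionReceipt(txHash string) (Receipt, bool)
}

// checkConsistency enforces the invariant: logs grouped by tx from
// eth_getLogs must match the logs embedded in that tx's receipt.
func checkConsistency(api EthAPI, from, to int) error {
	byTx := map[string][]Log{}
	var order []string
	for _, l := range api.GetLogs(from, to) {
		if _, seen := byTx[l.TxHash]; !seen {
			order = append(order, l.TxHash)
		}
		byTx[l.TxHash] = append(byTx[l.TxHash], l)
	}
	for _, tx := range order {
		rcpt, ok := api.GetTransactionReceipt(tx)
		if !ok {
			return fmt.Errorf("tx %s has logs but no receipt", tx)
		}
		if !reflect.DeepEqual(rcpt.Logs, byTx[tx]) {
			return fmt.Errorf("tx %s: receipt logs disagree with eth_getLogs", tx)
		}
	}
	return nil
}

// fakeAPI is a toy in-memory backend used to exercise the check.
type fakeAPI struct {
	logs     []Log
	receipts map[string]Receipt
}

func (f fakeAPI) GetLogs(from, to int) []Log { return f.logs }
func (f fakeAPI) GetTransactionReceipt(h string) (Receipt, bool) {
	r, ok := f.receipts[h]
	return r, ok
}

func main() {
	log := Log{TxHash: "0xabc", Data: "d1"}
	api := fakeAPI{
		logs:     []Log{log},
		receipts: map[string]Receipt{"0xabc": {TxHash: "0xabc", Logs: []Log{log}}},
	}
	fmt.Println(checkConsistency(api, 0, 10)) // prints "<nil>"
}
```

A check like this could run against a node as part of the consistency tooling proposed below.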

I agree with @Stebalien that we have some rough edges stemming from an unclear/blurry technical design. Here's what I propose concretely. Bear with me if some of these points are already implemented; I have been somewhat disconnected from Lotus development, and I'm trying to reason from first principles.

  • We introduce an "execution outputs" blockstore; this is a virtual store that physically can map to the same monolithic store as the chain and state stores. It stores receipts and native events (by storing the AMTs and child blocks). It knows nothing about Eth or concretely about events.
  • We preserve the current "event index" as a dedicated database acting like a lens over event data, and the "receipts index" which maps message CIDs to receipt CIDs, and records inclusion/execution tipsets.
  • Lotus ensures that the two are 100% consistent with each other: all execution outputs must have indexed events, and all epochs indexed are guaranteed to have their receipts stored in the execution outputs blockstore.
  • Lotus tracks metadata about the data availability of each store, i.e. which epoch range it holds. Looks like Record processed epochs / tipsets in events index #11640 covers this.
  • When the user attempts to run queries for epochs that aren't available, we reject them (based on the above metadata), instead of returning empty events (this is the major pain point today).
  • ChainGetEvents operates from the
  • We can expose a dev_ Eth JSON-RPC operation to query the data availability of stores and their configured retention period, so that users can interrogate a node about the data range it holds.
  • Ideally: each store records an audit trail within the store, adding an entry per epoch processed with some rollup metrics to aid diagnostics / stats, e.g. the ingestion timestamp and number of entries.
  • Perhaps: consider placing the "execution outputs" blockstore on a relational database and leveraging foreign keys with relevant CASCADE statements to preserve consistency with the event index.
  • Snapshots should become modular bundles that also include these databases for node operators to import optionally.
  • Commands should exist to:
    • recompute a chain portion and populate execution outputs and event index (already exists AFAIK)
    • populate the event index from execution outputs
    • perform consistency checks across the chainstore, statestore, execution outputs store, and event index
  • May need to reengineer some configuration knobs and may need to adjust garbage collection processes.

The main problems I'm interested in solving are:

  • RPCs returning an error when data is requested for non-available epochs, instead of returning an empty answer (misleading).
  • Enable hosted node operators to import snapshots and be ready to go, instead of having to index events before being useful to Eth users. This, combined with the previous point, has caused recent grief. (Something goes wrong in infra, the node operator recycles their deployment by importing a new snapshot, clients begin querying the node, the node returns empty events due to data unavailability, and clients assume no events were emitted for those epochs and move on.)
  • Ensure consistency across all stores/indices on a continuous basis.
  • Expose an operation for users to interrogate the data availability of a node.

@Stebalien
Member Author

I'm not seeing the need for all this complexity. What we need to ensure is:

Lotus ensures that the two are 100% consistent with each other: all execution outputs must have indexed events, and all epochs indexed are guaranteed to have their receipts stored in the execution outputs blockstore.

Once we have that, we should be good to go. We can get the consistency by:

  1. Lotus tracks metadata about the data availability of each store, i.e. which epoch range it holds. Looks like Record processed epochs / tipsets in events index #11640 covers this.
  2. Pruning the events DB when pruning the splitstore (as described in this issue).
  3. Automatic backfilling (Option to back-fill events by re-execution of messages #11744). Personally, I'm not a fan of our "let the user decide if/when to backfill" approach. IMO, if you've enabled event indexing, everything should "just work" without a bunch of knobs.

Once we have that, we can fix our current API endpoints so that, instead of making event handling "best effort", we always return events when we should be indexing/storing them and never return events otherwise.

That should cover everything, IMO.

@Stebalien
Member Author

To be clear, there are other good ideas in here (querying node for the supported state-range, etc.). But this issue is trying to address the specific issue where different commands have different "views" of the state.

@rvagg
Member

rvagg commented Jun 6, 2024

As I use the events for builtin actors I've been bumping into the consistency problem in odd ways, so this is something I'd really like to address properly. I've managed to get my node into an inconsistent state a couple of times, mainly because of "advanced" tinkering, but it's the kind of thing I can easily imagine others replicating.

We also have the problem of ballooning events db growth (although we have some WAL-related tuning to do, which I've been playing with, but it's being stubborn).

Here's my current proposal for how we might re-engineer some of it to at least make it nicer from where I stand:

  1. Some kind of pruning of the events db to keep it roughly in sync, probably just coupling it with the splitstore prune and doing delete where epoch <= X.
  2. Regular consistency checks: perhaps we do this at each epoch minus finality—each epoch, we check the consistency of the tipset 900 epochs back. This could be tightened with F3.
  3. Implement the tipset from->to walk algorithm discussed in Exclude reverted events in the new events APIs #11770 for range queries so we don't have to rely on revert information coming out of the db. This is nice for our raw events API but probably not acceptable to the eth_ variants, so we still have revert hassles there.
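
Point 2 could look roughly like this sketch. Everything here is a toy assumption: the sources are modeled as per-epoch event counters, where a real check would compare event bodies and revert flags between the index and the receipts:

```go
package main

import "fmt"

const finality = 900 // epochs; F3 finality would allow a much smaller lag

// counter returns the number of events one source records for an epoch.
// A real check would compare full event bodies, not just counts.
type counter func(epoch int64) (int, error)

// checkEpoch compares the event index against the receipts tree for the
// epoch that just fell out of the finality window.
func checkEpoch(fromIndex, fromReceipts counter, head int64) error {
	epoch := head - finality
	if epoch < 0 {
		return nil // nothing finalized yet
	}
	idx, err := fromIndex(epoch)
	if err != nil {
		return err
	}
	rcp, err := fromReceipts(epoch)
	if err != nil {
		return err
	}
	if idx != rcp {
		return fmt.Errorf("epoch %d: index has %d events, receipts have %d", epoch, idx, rcp)
	}
	return nil
}

func main() {
	indexed := map[int64]int{100: 3}
	idx := func(e int64) (int, error) { return indexed[e], nil }
	rcpts := func(e int64) (int, error) { return 3, nil } // receipts agree
	fmt.Println(checkEpoch(idx, rcpts, 1000))             // prints "<nil>"
}
```

Running this once per incoming epoch keeps the cost constant while guaranteeing any divergence is caught within the finality window.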
