Orphan blocks not removed from Block table #114

Closed
erikd opened this issue May 29, 2020 · 10 comments
Labels: bug (Something isn't working), priority high (issues/PRs that MUST be addressed; the release can't happen without this)

Comments

erikd (Contributor) commented May 29, 2020

The validate program reports:

All transactions for blocks in epoch 195 are present: Failed on block no 4224831: 
expected tx count of 2 but got 3

and a simple query:

select * from block where block_no = 4224831 ; 

results in two rows where only one was expected.

# select id, slot_no, block_no,previous,tx_count,time from block where 
    block_no = 4224831 ; 
   id    | slot_no | block_no | previous | tx_count |        time         
---------+---------+----------+----------+----------+---------------------
 4265744 | 4226980 |  4224831 |  4265742 |        2 | 2020-05-29 08:58:11
 4265743 | 4226979 |  4224831 |  4265742 |        3 | 2020-05-29 08:57:51

The block with the lower slot number was obviously orphaned, but the fact that no later block extended that chain was not detected, and hence the block was not removed from the database.

Blocks that are orphaned like this should either be removed from the database, or avoided in the validate process.
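For illustration, a query along these lines (hypothetical, using only the columns shown above) would flag such rows: blocks that no later block lists as its previous, excluding the current tip:

-- Hypothetical orphan check: blocks that no other block extends, excluding the tip.
SELECT b.id, b.slot_no, b.block_no
  FROM block b
 WHERE NOT EXISTS (SELECT 1 FROM block c WHERE c.previous = b.id)
   AND b.block_no < (SELECT max(block_no) FROM block);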

erikd (Contributor, Author) commented May 29, 2020

The simplest solution to this is to remove blocks which have been orphaned like this.

As an alternative to removing them, we could move them to a separate orphaned_blocks table, but then we need to decide what we want to do with the txs attached to that block. My initial feeling is they should just be ignored/removed.

Having an orphaned_block table may be useful when we switch to Praos, but it can easily be implemented even now, while db-sync really only supports Byron.
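A minimal sketch of what such a table might look like (hypothetical schema, mirroring a subset of the existing block table's columns):

-- Hypothetical orphaned_block table; columns mirror a subset of the block table.
CREATE TABLE orphaned_block (
  id       bigserial PRIMARY KEY,
  hash     bytea      NOT NULL,
  slot_no  bigint,
  block_no bigint,
  tx_count bigint     NOT NULL,
  "time"   timestamp  NOT NULL
);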

gufmar (Contributor) commented May 30, 2020

My very first thought was also: clean the orphaned block out, but keep it somewhere.

It could even become an option for the node to store all detected forks, including all blocks from the abandoned branch, in this table. Purging after x slots is simple, and diagnostics and tracing of what happened become much easier.

gufmar (Contributor) commented May 30, 2020

The question about lost TXs is interesting. As a quick idea, without knowing if it's cryptographically possible and 100% safe:

If a (db-sync enabled) node detects TXs from orphaned blocks, it could try to reinsert them into its mempool: not broadcast them further, but only process them when it is its turn to produce a block.
(Broadcasting shouldn't be necessary, because many other pools will detect the orphaned block with its TXs and reinsert them into their mempools from that source.)

If the UTxO owner has meanwhile (before the fork is resolved and the orphaned block detected) submitted a second attempt, the previous TX becomes invalid. If instead the TX is still valid, it will automatically appear again in a new valid block.

Wallets then need to be able to handle a situation where they see their TX in a block which is later orphaned, and some blocks later the same TX is placed in another, hopefully valid, block.

erikd (Contributor, Author) commented May 30, 2020

If a (db-sync enabled) node detects TXs from orphaned blocks, it could try to reinsert them into its mempool: not broadcast them further, but only process them when it is its turn to produce a block.

It's important to note that db-sync is separate from the node and simply follows the node's chain.
Specifically, db-sync does not maintain its own copy of the mempool. In the case I detected, the transactions in the orphaned block were almost certainly included in later non-orphaned blocks, so db-sync should just drop them.

The wallet of course has different behavior from db-sync.

gufmar (Contributor) commented May 30, 2020

If the TXs are almost certainly processed by other block producers, then of course there is no need to "re-inject" them from the db into the node's mempool with some tool.
The question is whether this is the case in all slot/height battle variants. If not, the next question is whether it's useful (intended) to let one of the nodes that detects unprocessed TXs in an orphaned block re-insert them into its mempool, based on the db-sync database.

erikd (Contributor, Author) commented May 30, 2020

db-sync should really only be used to view what has happened in the past. The node itself will pull transactions out of orphaned blocks and put them back in its own mempool. db-sync is not intended to be part of that.

However, maintaining a list of blocks orphaned within the last 100 or so slots may be useful for debugging.

dcoutts (Contributor) commented May 30, 2020

I think the block should be removed on rollback, and we should use cascading deletes to delete everything that is logically within the deleted blocks.

There is no point trying to preserve blocks that have been rolled back. This is not a tool for monitoring block propagation within the network: it observes at only one place, and as a node client, so it would be very unreliable when used in that role.
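A sketch of that idea, assuming a tx table whose block column references block (id) (the constraint name and rollback point are illustrative):

-- Illustrative: make child rows follow their block when it is deleted.
ALTER TABLE tx
  ADD CONSTRAINT tx_block_fkey
  FOREIGN KEY (block) REFERENCES block (id) ON DELETE CASCADE;

-- A rollback then only needs to delete the blocks past the rollback point
-- (here, the slot just before the orphaned block shown above).
DELETE FROM block WHERE slot_no > 4226978;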

papacarp commented:

I agree we should keep db-sync as pure a reflection of the actual chain state as possible.

However, it would greatly simplify my life if ALL blocks made it into db-sync and were then deleted on rollbacks. That way I can trigger on the postgres delete and do what I want with the data.
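For example, a downstream consumer could capture deleted rows with a trigger along these lines (the side table and function names here are hypothetical):

-- Hypothetical: copy deleted block rows into a side table before they disappear.
CREATE TABLE IF NOT EXISTS rolled_back_block (LIKE block);

CREATE OR REPLACE FUNCTION log_rolled_back_block() RETURNS trigger AS $$
BEGIN
  INSERT INTO rolled_back_block SELECT OLD.*;
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER block_delete_log
  BEFORE DELETE ON block
  FOR EACH ROW EXECUTE PROCEDURE log_rolled_back_block();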

@kevinhammond added the bug (Something isn't working) and priority high (issues/PRs that MUST be addressed; the release can't happen without this) labels Jun 11, 2020
kevinhammond (Contributor) commented:

This is needed by other clients and Sam says it is critical for any db-sync release, including Testnet.

kevinhammond (Contributor) commented:

It seems like rollbacks aren't working as expected. We see multiple blocks in the DB with the same block height. Here's an example (on mainnet):

cexplorer=# SELECT * FROM block WHERE block_no=4224831;
id | hash | slot_no | block_no | previous | merkel_root | slot_leader | size | epoch_no | time | tx_count
---------+--------------------------------------------------------------------+---------+----------+----------+--------------------------------------------------------------------+-------------+------+----------+---------------------+----------
4225044 | \x941a71094c7b10243845a82e53dc45959d8fde5f6d87e463efe660aa9611b965 | 4226979 | 4224831 | 4225043 | \x95549f5fcfc370eb4b24c981a6053f909bdef6418ce896828c6be18fa31ab406 | 6 | 1731 | 195 | 2020-05-29 08:57:51 | 3
4225045 | \x275ee8b9ae441e6e32e1f7f36290e6a048722619d91195773a07b06682418209 | 4226980 | 4224831 | 4225043 | \x6eb04497431d77657a94b0eee70dae59e7403980d4fc9d06ba5295436b8f0f54 | 7 | 1418 | 195 | 2020-05-29 08:58:11 | 2
(2 rows)
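A quick way to list every affected height (same block table, nothing else assumed) is something like:

-- Block heights that appear more than once; NULL heights (if any) are skipped.
SELECT block_no, count(*) AS occurrences
  FROM block
 WHERE block_no IS NOT NULL
 GROUP BY block_no
HAVING count(*) > 1
 ORDER BY block_no;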

@erikd changed the title from "Validate failure due to invalid tx_count" to "Orphan blocks not removed from Block table" Jun 12, 2020
erikd added a commit that referenced this issue Jun 12, 2020
Rollback messages from chainsync were not being handled correctly which
meant that orphaned blocks were not removed. When orphaned blocks were
not removed from the block table that table could end up with two or
more rows with the same block number.

Closes: #114
erikd added further commits with the same message that referenced this issue on Jun 12, Jun 15, and Jun 16, 2020.
@erikd closed this as completed in e2af23c Jun 16, 2020