Orphan blocks not removed from Block table #114

Closed
erikd opened this issue May 29, 2020 · 10 comments
Labels: bug (Something isn't working), priority high (issues/PRs that MUST be addressed; the release can't happen without this)

Comments

erikd (Contributor) commented May 29, 2020

The validate program reports:

All transactions for blocks in epoch 195 are present: Failed on block no 4224831: 
expected tx count of 2 but got 3

and a simple query:

select * from block where block_no = 4224831 ; 

results in two rows where only one was expected.

# select id, slot_no, block_no,previous,tx_count,time from block where 
    block_no = 4224831 ; 
   id    | slot_no | block_no | previous | tx_count |        time         
---------+---------+----------+----------+----------+---------------------
 4265744 | 4226980 |  4224831 |  4265742 |        2 | 2020-05-29 08:58:11
 4265743 | 4226979 |  4224831 |  4265742 |        3 | 2020-05-29 08:57:51

The block with the lower slot number was obviously orphaned, but the fact that no later block extended that chain was not detected, and hence the block was not removed from the database.

Blocks that are orphaned like this should either be removed from the database, or avoided in the validate process.
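For illustration, a query along these lines (hypothetical, using only the columns shown above) would flag such rows: blocks that no later block lists as its previous, excluding the current tip:

-- Hypothetical orphan check: blocks that no other block extends, excluding the tip.
SELECT b.id, b.slot_no, b.block_no
  FROM block b
 WHERE NOT EXISTS (SELECT 1 FROM block c WHERE c.previous = b.id)
   AND b.block_no < (SELECT max(block_no) FROM block);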

erikd (Contributor, Author) commented May 29, 2020

The simplest solution to this is to remove blocks which have been orphaned like this.

As an alternative to removing them, we could move them to a separate orphaned_blocks table, but then we need to decide what we want to do with the txs attached to that block. My initial feeling is they should just be ignored/removed.

Having an orphaned_block table may be useful when we switch to Praos, but it can easily be implemented even now, while db-sync really only supports Byron.
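A minimal sketch of what such a table might look like (hypothetical schema, mirroring a subset of the existing block table's columns):

-- Hypothetical orphaned_block table; columns mirror a subset of the block table.
CREATE TABLE orphaned_block (
  id       bigserial PRIMARY KEY,
  hash     bytea      NOT NULL,
  slot_no  bigint,
  block_no bigint,
  tx_count bigint     NOT NULL,
  "time"   timestamp  NOT NULL
);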

gufmar (Contributor) commented May 30, 2020

My very first thought was also: clean the orphaned block out, but keep it somewhere.

It could even become an option for the node to store all detected forks, including all blocks from the abandoned branch, in this table. Purging after x slots is simple, and diagnostics and tracing of what happened become much easier.

gufmar (Contributor) commented May 30, 2020

The question about lost TXs is interesting. As a quick idea, without knowing if it's cryptographically possible and 100% safe:

If a (db-sync enabled) node detects TXs from orphaned blocks, it could try to reinsert them into its mempool: not broadcast them further, but only process them when it is its turn to produce a block.
(Broadcasting shouldn't be necessary, because many other pools will detect the orphaned block with its TXs and reinsert them into their mempools from that source.)

If the UTxO owner has meanwhile (before the fork is resolved and the orphaned block detected) submitted a second attempt, the previous TX becomes invalid. If instead the TX is still valid, it will automatically appear again in a new valid block.

Wallets then need to be able to handle a situation where they see their TX in a block which is later orphaned, and some blocks later the same TX is placed in another, hopefully valid, block.

erikd (Contributor, Author) commented May 30, 2020

If a (db-sync enabled) node detects TXs from orphaned blocks, it could try to reinsert them into its mempool: not broadcast them further, but only process them when it is its turn to produce a block.

It's important to note that db-sync is separate from the node and simply follows the node's chain.
Specifically, db-sync does not maintain its own copy of the mempool. In the case I detected, the transactions in the orphaned block were almost certainly included in later non-orphaned blocks, so db-sync should just drop them.

The wallet of course has different behavior from db-sync.

gufmar (Contributor) commented May 30, 2020

If the TXs are almost certainly processed by other block producers, then of course there is no need to "re-inject" them from the db into the node's mempool with some tool.
The question is whether this is the case in all slot/height battle variants. If not, the next question is whether it's useful (intended) to let one of the nodes that detects unprocessed TXs in an orphaned block re-insert them into its mempool, based on the db-sync database.

erikd (Contributor, Author) commented May 30, 2020

db-sync should really only be used to view what has happened in the past. The node itself will pull transactions out of orphaned blocks and put them back in its own mempool. db-sync is not intended to be part of that.

However, maintaining a list of blocks orphaned within the last 100 or so slots may be useful for debugging.

dcoutts (Contributor) commented May 30, 2020

I think the block should be removed on rollback, and we should use cascading deletes to delete everything that is logically within the deleted blocks.

There is no point trying to preserve blocks that have been rolled back. This is not a tool for monitoring block propagation within the network: it observes at only one place, and as a node client, so it would be very unreliable when used in that role.
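A sketch of that idea, assuming a tx table whose block column references block (id) (the constraint name and rollback point are illustrative):

-- Illustrative: make child rows follow their block when it is deleted.
ALTER TABLE tx
  ADD CONSTRAINT tx_block_fkey
  FOREIGN KEY (block) REFERENCES block (id) ON DELETE CASCADE;

-- A rollback then only needs to delete the blocks past the rollback point
-- (here, the slot just before the orphaned block shown above).
DELETE FROM block WHERE slot_no > 4226978;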

papacarp commented:

I agree we should keep db-sync as pure a reflection of the actual chain state as possible.

However, it would greatly simplify my life if ALL blocks made it into db-sync and were then deleted on rollbacks. That way I can trigger on the postgres delete and do what I want with the data.
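For example, a downstream consumer could capture deleted rows with a trigger along these lines (the side table and function names here are hypothetical):

-- Hypothetical: copy deleted block rows into a side table before they disappear.
CREATE TABLE IF NOT EXISTS rolled_back_block (LIKE block);

CREATE OR REPLACE FUNCTION log_rolled_back_block() RETURNS trigger AS $$
BEGIN
  INSERT INTO rolled_back_block SELECT OLD.*;
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER block_delete_log
  BEFORE DELETE ON block
  FOR EACH ROW EXECUTE PROCEDURE log_rolled_back_block();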

@kevinhammond added the bug (Something isn't working) and priority high (issues/PRs that MUST be addressed; the release can't happen without this) labels Jun 11, 2020
kevinhammond (Contributor) commented:

This is needed by other clients and Sam says it is critical for any db-sync release, including Testnet.

kevinhammond (Contributor) commented:

It seems like rollbacks aren't working as expected. We see multiple blocks in the DB with the same block height. Here's an example (on mainnet):

cexplorer=# SELECT * FROM block WHERE block_no=4224831;
id | hash | slot_no | block_no | previous | merkel_root | slot_leader | size | epoch_no | time | tx_count
---------+--------------------------------------------------------------------+---------+----------+----------+--------------------------------------------------------------------+-------------+------+----------+---------------------+----------
4225044 | \x941a71094c7b10243845a82e53dc45959d8fde5f6d87e463efe660aa9611b965 | 4226979 | 4224831 | 4225043 | \x95549f5fcfc370eb4b24c981a6053f909bdef6418ce896828c6be18fa31ab406 | 6 | 1731 | 195 | 2020-05-29 08:57:51 | 3
4225045 | \x275ee8b9ae441e6e32e1f7f36290e6a048722619d91195773a07b06682418209 | 4226980 | 4224831 | 4225043 | \x6eb04497431d77657a94b0eee70dae59e7403980d4fc9d06ba5295436b8f0f54 | 7 | 1418 | 195 | 2020-05-29 08:58:11 | 2
(2 rows)
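A quick way to list every affected height (same block table, nothing else assumed) is something like:

-- Block heights that appear more than once; NULL heights (if any) are skipped.
SELECT block_no, count(*) AS occurrences
  FROM block
 WHERE block_no IS NOT NULL
 GROUP BY block_no
HAVING count(*) > 1
 ORDER BY block_no;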

@erikd changed the title from "Validate failure due to invalid tx_count" to "Orphan blocks not removed from Block table" Jun 12, 2020
erikd added a commit that referenced this issue Jun 12, 2020
Rollback messages from chainsync were not being handled correctly which
meant that orphaned blocks were not removed. When orphaned blocks were
not removed from the block table that table could end up with two or
more rows with the same block number.

Closes: #114
erikd added further commits with the same message that referenced this issue on Jun 12, Jun 15, and Jun 16, 2020.
@erikd closed this as completed in e2af23c Jun 16, 2020