Fix 'value out of range' RangeDefect in block witness generation. #2072

Merged
merged 4 commits into master from fix-witness-building-during-dos on Mar 19, 2024

Conversation

@web3-developer (Contributor) commented Mar 13, 2024

Fix 'value out of range' RangeDefect caused by large/expensive blocks/transactions during DOS period.

See here:
INF 2024-03-13 15:58:13.545+08:00 REQUEST HEADER tid=125352 blockNumber=2306432 txs=32
/home/user/development/status-im/nimbus-eth1/premix/persist.nim(139) persist
/home/user/development/status-im/nimbus-eth1/premix/persist.nim(109) main
/home/user/development/status-im/nimbus-eth1/nimbus/core/chain/persist_blocks.nim(148) persistBlocksImpl
/home/user/development/status-im/nimbus-eth1/nimbus/evm/state.nim(305) buildWitness
/home/user/development/status-im/nimbus-eth1/stateless/witness_from_tree.nim(363) buildWitness
/home/user/development/status-im/nimbus-eth1/stateless/multi_keys.nim(112) initGroup
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(48) raiseRangeErrorI
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(54) sysFatal
Error: unhandled exception: value out of range: 52105 notin -32768 .. 32767 [RangeDefect]

Update:
It turns out that this problem actually occurred because the witness data in the account cache was not being cleared between blocks, even when clear cache was set to true, and so the data structure kept growing when persisting multiple blocks at once.

To reproduce, run the persist tool with the --numcommits flag set to a large number, such as 1000.
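For context, here is a minimal Nim sketch (illustrative only, not the project's code) of the failure mode: the raiseRangeErrorI frame in the trace above comes from a range-checked conversion into an int16 field once an index exceeds int16.high (32767), which is exactly what the Group change below widens away.

# Illustrative sketch only: assigning a value above int16.high to an int16
# field raises a RangeDefect at runtime, the failure mode hit once the
# witness key count grows past 32767.
type
  NarrowGroup = object
    first, last: int16   # overflows once a group spans more than 32767 keys
  WideGroup = object
    first, last: int     # the fix: a plain int has plenty of headroom

proc initNarrow(first, last: int): NarrowGroup =
  # the int -> int16 conversion is range checked; 52105 is outside -32768 .. 32767
  NarrowGroup(first: int16(first), last: int16(last))

proc initWide(first, last: int): WideGroup =
  WideGroup(first: first, last: last)

when isMainModule:
  discard initWide(0, 52105)        # fine
  try:
    discard initNarrow(0, 52105)    # raises RangeDefect
  except RangeDefect as e:          # catchable with the default --panics:off
    echo "caught: ", e.msg          # value out of range: 52105 notin -32768 .. 32767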

@web3-developer web3-developer changed the title Fix 'value out of range' RangeDefect caused by large/expensive block. Fix 'value out of range' RangeDefect in block witness generation. Mar 13, 2024
@@ -32,7 +32,7 @@ type
   MultiKeysRef* = ref MultiKeys

   Group* = object
-    first*, last*: int16
+    first*, last*: int
Contributor

So even with the clearing of the cache, would it still be possible to hit int16.high?

Contributor Author

No, actually in the scenario I tested we don't hit the limit when processing only a single block or when clearing the cache between blocks. Having said that, I think it's worth leaving this fix in so that we don't need to worry about it in the future. 32767 isn't that large, and I wonder if there might be any large blocks which could still trigger the issue when processing only a single block.

@@ -117,7 +117,7 @@ proc procBlkEpilogue(vmState: BaseVMState;
     if vmState.generateWitness:
       db.collectWitnessData()
     let clearEmptyAccount = vmState.determineFork >= FkSpurious
-    db.persist(clearEmptyAccount, ClearCache in vmState.flags)
+    db.persist(clearEmptyAccount, clearCache = true)
Contributor Author

@jangko Can you please review this change and let me know if this might cause any issues? I noticed that the AccountCache was not being cleared between blocks when persisting multiple blocks at once, because the vmState.flags never had the ClearCache flag set.

@jangko (Contributor) commented Mar 15, 2024

When syncing contiguous blocks, clearing the cache will degrade performance if the contiguous blocks contain a similar set of active accounts. It looks like after several rounds of refactoring the flag somehow never gets set, so it's a regression.

Contributor Author

I see, that makes sense. In that case, what is the suggested fix? Should I set the vm flag for the last block when processing a list of blocks in the persistBlocks function?

Contributor

In my opinion, the flag should be set by the synchronizer (snap, full, or beacon sync), because one of its responsibilities is to trigger block execution. It's just that we have not made it work that way yet. Perhaps you can open an issue so we don't forget to fix this.

Contributor Author

Ok, I'll revert that part of my change and create an issue for clearing the cache.

Contributor Author

I've created an issue for it. See here: #2083

@@ -694,6 +694,10 @@ proc makeMultiKeys*(ac: AccountsCache): MultiKeysRef =
     result.add(k, v.codeTouched, multiKeys(v.storageKeys))
   result.sort()

+  # reset the witness cache after collecting the witness data
+  # so it is clean before executing the next block
+  ac.witnessCache.clear()
Contributor Author

We need to clear the witnessData just after collecting the keys, not during AccountsCache.persist, because otherwise the data would be cleared before being accessed.
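A minimal sketch of that ordering constraint, using hypothetical names rather than the real AccountsCache API: the cache has to be read before it is reset, so the reset belongs at the end of the collection step rather than in persist, which runs earlier.

# Minimal sketch of the ordering (hypothetical names, not the real
# AccountsCache API): read the witness cache first, then reset it, so the
# next block starts from a clean slate. Clearing it earlier, e.g. inside
# persist, would wipe the data before it is ever collected.
import std/tables

type
  WitnessCache = Table[string, seq[string]]   # stand-in for the real cache type

proc collectWitnessKeys(cache: var WitnessCache): seq[string] =
  # gather everything we need from the cache ...
  for account in cache.keys:
    result.add account
  # ... and only then reset it
  cache.clear()

when isMainModule:
  var cache: WitnessCache
  cache["0xabc"] = @["slot0", "slot1"]
  echo collectWitnessKeys(cache)   # @["0xabc"]
  echo cache.len                   # 0 -- clean before the next block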

@web3-developer merged commit a147ff7 into master on Mar 19, 2024
16 checks passed
@web3-developer deleted the fix-witness-building-during-dos branch on March 19, 2024 at 07:41