Fix 'value out of range' RangeDefect in block witness generation. #2072
Conversation
@@ -32,7 +32,7 @@ type
   MultiKeysRef* = ref MultiKeys

   Group* = object
-    first*, last*: int16
+    first*, last*: int
So even with the clearing of the cache, would it still be possible to hit int16.high?
No, actually in the scenario I tested we don't hit the limit when processing only a single block or when clearing the cache between blocks. Having said that, I think it's worth leaving this fix in so that we don't need to worry about it in the future. 32767 isn't that large, and I wonder if there might be some large blocks that could still trigger the issue even when processing only a single block.
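To make the failure mode concrete: the codebase is Nim, but the following is an illustrative Python sketch of the range check that fired. The witness group's `first`/`last` fields were `int16`, so any batch that accumulates more than 32767 keys overflows, exactly as the logged failure with 52105 shows.

```python
# Illustrative sketch (Python, not the actual Nim code): Nim raises a
# RangeDefect when a value is assigned outside an int16's range.
INT16_MIN, INT16_MAX = -32768, 32767

def check_int16(value):
    # mirrors Nim's runtime range check for an int16 field
    if not (INT16_MIN <= value <= INT16_MAX):
        raise OverflowError(
            f"value out of range: {value} notin {INT16_MIN} .. {INT16_MAX}")
    return value

try:
    check_int16(52105)  # the key count from the logged failure
except OverflowError as e:
    print(e)
```

Widening the fields to `int` (64-bit on the platforms in question) removes this ceiling entirely.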
@@ -117,7 +117,7 @@ proc procBlkEpilogue(vmState: BaseVMState;
   if vmState.generateWitness:
     db.collectWitnessData()
   let clearEmptyAccount = vmState.determineFork >= FkSpurious
-  db.persist(clearEmptyAccount, ClearCache in vmState.flags)
+  db.persist(clearEmptyAccount, clearCache = true)
@jangko Can you please review this change and let me know if this might cause any issues? I noticed that the AccountCache was not being cleared between blocks when persisting multiple blocks at once, because the vmState.flags never had the ClearCache flag set.
When syncing contiguous blocks, clearing the cache will degrade performance if the contiguous blocks contain a similar set of active accounts. It looks like, after several rounds of refactoring, the flag was somehow never set. So it's a regression.
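The performance point can be sketched as follows (an illustrative Python model with hypothetical names, not the Nim `AccountsCache`): when consecutive blocks touch the same accounts, a retained cache turns repeat lookups into hits, while clearing between blocks forces a DB read for every access.

```python
# Illustrative sketch: count simulated DB reads with a cache retained
# across two blocks that touch the same accounts.
cache = {}
db_reads = 0

def get_account(addr):
    global db_reads
    if addr not in cache:
        db_reads += 1            # miss: simulate an expensive DB read
        cache[addr] = {"balance": 0}
    return cache[addr]

for block in (["0xaa", "0xbb"], ["0xaa", "0xbb"]):  # similar active accounts
    for addr in block:
        get_account(addr)

print(db_reads)  # 2 with the cache retained; would be 4 if cleared per block
```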
I see, that makes sense. In that case, what is the suggested fix? Should I set the VM flag for the last block when processing a list of blocks in the persistBlocks function?
In my opinion, the flag should be set by the synchronizer (snap, full, or beacon sync), because one of their responsibilities is to trigger block execution. It's just that we haven't made it work that way yet. Perhaps you could open an issue so we don't forget to fix this.
Ok, I'll revert that part of my change and create an issue for clearing the cache.
I've created an issue for it. See here: #2083
@@ -694,6 +694,10 @@ proc makeMultiKeys*(ac: AccountsCache): MultiKeysRef =
     result.add(k, v.codeTouched, multiKeys(v.storageKeys))
   result.sort()

+  # reset the witness cache after collecting the witness data
+  # so it is clean before executing the next block
+  ac.witnessCache.clear()
We need to clear the witnessData just after collecting the keys, not during AccountsCache.persist, because otherwise the data would be cleared before being accessed.
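The collect-then-clear ordering can be sketched like this (an illustrative Python model with hypothetical names; the real code is Nim's `makeMultiKeys` over `ac.witnessCache`): snapshot the witness keys for the current block first, then clear the cache so it doesn't accumulate entries across blocks.

```python
# Illustrative sketch: take the keys for this block's witness, then
# reset the cache so the next block starts clean.
witness_cache = {"0xaa": ["slot1"], "0xbb": ["slot2"]}

def make_multi_keys(cache):
    keys = sorted(cache.keys())  # collect witness keys for this block
    cache.clear()                # clear AFTER collecting, never before
    return keys

print(make_multi_keys(witness_cache))  # ['0xaa', '0xbb']
print(len(witness_cache))              # 0
```

Clearing inside `persist` instead would wipe the data before the witness builder ever reads it, which is exactly the ordering bug the comment warns about.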
Fix 'value out of range' RangeDefect caused by large/expensive blocks/transactions during DOS period.
See here:
INF 2024-03-13 15:58:13.545+08:00 REQUEST HEADER tid=125352 blockNumber=2306432 txs=32
/home/user/development/status-im/nimbus-eth1/premix/persist.nim(139) persist
/home/user/development/status-im/nimbus-eth1/premix/persist.nim(109) main
/home/user/development/status-im/nimbus-eth1/nimbus/core/chain/persist_blocks.nim(148) persistBlocksImpl
/home/user/development/status-im/nimbus-eth1/nimbus/evm/state.nim(305) buildWitness
/home/user/development/status-im/nimbus-eth1/stateless/witness_from_tree.nim(363) buildWitness
/home/user/development/status-im/nimbus-eth1/stateless/multi_keys.nim(112) initGroup
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(48) raiseRangeErrorI
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(54) sysFatal
Error: unhandled exception: value out of range: 52105 notin -32768 .. 32767 [RangeDefect]
Update:
It turns out that this problem actually occurred because the witness data in the account cache was not being cleared between blocks, even when clearCache was set to true, so the data structure kept growing when persisting multiple blocks at once.
To reproduce we can run the persist tool with the --numcommits flag set to a large number such as 1000.