
[DRAFT] Implementing support for emptyBlockPeriodSeconds in QBFT (Issue #3810) #6965

Draft
wants to merge 8 commits into base: main

Conversation

amsmota

@amsmota amsmota commented Apr 18, 2024

PR description

Implementing emptyblockperiodseconds for producing empty blocks at a specific interval, independently of the value of the existing blockperiodseconds setting.
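For reference, the new option sits alongside blockperiodseconds in the genesis file's qbft section. An illustrative fragment (values taken from the tests discussed later in this thread):

```json
"qbft" : {
  "blockperiodseconds" : 5,
  "emptyblockperiodseconds" : 30,
  "epochlength" : 30000,
  "requesttimeoutseconds" : 4
}
```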

Fixed Issue(s)

#3810

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@amsmota
Author

amsmota commented Apr 19, 2024

Hi @siladu and @matthew1001, can you please review this one?

Cheers.

Contributor

@matthew1001 matthew1001 left a comment


Left a few more comments. Was there any follow-up to my earlier comment about what the behaviour should be if a new TX arrives during an empty block interval? I'm not sure waiting the entire remaining empty block interval to include the TX in a block is the right thing to do.

@matthew1001
Contributor

Hi @siladu and @matthew1001, can you please review this one?

Cheers.

Thanks for updating the PR @amsmota. I've left a few more comments, the main one being about what the behaviour is when a new TX arrives during the empty block period.

@amsmota
Author

amsmota commented Apr 22, 2024

Left a few more comments. Was there any follow-up to my earlier comment about what the behaviour should be if a new TX arrives during an empty block interval? I'm not sure waiting the entire remaining empty block interval to include the TX in a block is the right thing to do.

Hi @matthew1001, thanks for that. In relation to that comment above, I did reply in the previous PR; as per my tests everything looks OK. Please see the test log there.

#6944 (comment)

I'll review your new comments ASAP too.

Cheers.

Signed-off-by: amsmota <amsmota@gmail.com>
@macfarla macfarla mentioned this pull request Apr 24, 2024
Signed-off-by: amsmota <amsmota@gmail.com>
@matthew1001
Contributor

I've just pulled your latest commits on the PR @amsmota and I'm still seeing the behaviour of a new block not being produced if a TX arrives during the empty block period. E.g. with a 2-second block interval and a 30-second empty block interval, the behaviour is:

  1. Block 1 - empty
    • (30 seconds)
  2. Block 2 - empty
    • (30 seconds)
  3. Block 3 - empty
    • (10 seconds pass, then I submit a new TX)
    • (20 more seconds until...)
  4. Block 4 - 1 TX

I'm not sure that's the right behaviour. I think it probably ought to be something like:

  1. Block 1 - empty
    • (30 seconds)
  2. Block 2 - empty
    • (30 seconds)
  3. Block 3 - empty
    • (10 seconds pass, then I submit a new TX)
    • block period timer of 2 seconds starts
    • (2 more seconds until...).
  4. Block 4 - 1 TX

It would be good to see what others think though. @siladu any thoughts?
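The difference between the two timelines above can be sketched as a tiny timing model. This is an illustrative simplification, not Besu code: the function names and the reduction to fixed offsets are invented for clarity.

```python
# Toy model of the two behaviours, using the 2s/30s settings above.

BLOCK_PERIOD = 2          # blockperiodseconds
EMPTY_BLOCK_PERIOD = 30   # emptyblockperiodseconds

def next_block_observed(last_block_t, tx_arrival_t):
    """Observed: the TX still waits out the full empty-block interval."""
    return last_block_t + EMPTY_BLOCK_PERIOD

def next_block_proposed(last_block_t, tx_arrival_t):
    """Proposed: a TX arrival starts the short block-period timer."""
    return tx_arrival_t + BLOCK_PERIOD

# Block 3 at t=0, TX submitted 10 seconds later:
print(next_block_observed(0, 10))   # 30 -> block 4 only 30s after block 3
print(next_block_proposed(0, 10))   # 12 -> block 4 ~2s after the TX
```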

@amsmota
Author

amsmota commented Apr 24, 2024

It is not the behaviour I see in my tests. Did you see the comment I mentioned above?

#3810 (comment)

2024-04-13 **15:10:57**.174+01:00 | QbftBesuControllerBuilder | Imported empty block #49 / 0 tx / 3 pending / 0 (0.0%) gas / (0x0ee851152f1a6faaa199d3f595234141c050966f7be46a65241f01908ce34d07)
2024-04-13 **15:11:27**.233+01:00 | PeerDiscoveryAgent | Writing node record to disk. NodeRecord{seq=2, publicKey=0x027c19f5309e268264c9a986704a2f24e01f466a541ef910f68cc831324ba1ee2c, udpAddress=Optional[/127.0.0.1:30303], tcpAddress=Optional[/127.0.0.1:30303], asBase64=-Je4QKHhLqyEBF13upUKqj-i_cALkD-_-h3yPWZgkdFdnsjZZZCyzpd63wDaWlVWriBpGu_JPciWtEhwgqy-XSiQp0YCg2V0aMfGhOcD3GaAgmlkgnY0gmlwhH8AAAGJc2VjcDI1NmsxoQJ8GfUwniaCZMmphnBKLyTgH0ZqVB75EPaMyDEyS6HuLIN0Y3CCdl-DdWRwgnZf, nodeId=0xc92329278474852a32bd72b721afbac4d309b7fa6e40862eaee847cc8f544045, customFields={tcp=30303, udp=30303, ip=0x7f000001, eth=[[0xe703dc66, 0x]], id=V4, secp256k1=0x027c19f5309e268264c9a986704a2f24e01f466a541ef910f68cc831324ba1ee2c}}
2024-04-13 **15:11:27**.258+01:00 | QbftBesuControllerBuilder | Imported empty block #50 / 0 tx / 3 pending / 0 (0.0%) gas / (0x4a43cebe16dc6a3c341cb92b109404580bb1bbe756cd2c855ecdc826d14ea9c1)
2024-04-13 **15:11:57**.359+01:00 | QbftBesuControllerBuilder | Imported empty block #51 / 0 tx / 3 pending / 0 (0.0%) gas / (0x02b83a1b6d33dcbafed7c6ceae9bd32cd757874151fb9232124681ac41befba1)
2024-04-13 **15:12:02**.377+01:00 | QbftBesuControllerBuilder | **Produced block #52** / 1 tx / 3 pending / 3,000,000 (63.8%) gas / (0x32dd8d703933b1de951c7f99d3a5e50b66e573494ecf8eafe003df578df1e204)
2024-04-13 **15:12:32**.236+01:00 | QbftBesuControllerBuilder | Imported empty block #53 / 0 tx / 3 pending / 0 (0.0%) gas / (0xc51bba3cc60545422f6f920827790ed5ba75306608e494501e970ae1cd84fbe9)

Look at non-empty block 52: it was produced 5 seconds after the last empty block, and the next one (an empty block) came 30 seconds after. This was with

"blockperiodseconds" : 5,
"emptyblockperiodseconds" : 30

Can you show me the genesis config you used? And if you can post some logs please do.

Anyway, that test was 2 weeks ago, I'll repeat the test asap...


Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
@matthew1001
Contributor

matthew1001 commented Apr 25, 2024

Sorry I couldn't find your previous comment, but that's great if that's the current behaviour. I think I pulled and re-built with your latest changes but I'll check. Here's the QBFT config I used:

"qbft" : {
      "blockperiodseconds" : 2,
      "emptyblockperiodseconds" : 30,
      "epochlength" : 30000,
      "requesttimeoutseconds" : 4
    }

I was using a single node to test though. Is it possible that just using a single node gives different behaviour?

@amsmota
Author

amsmota commented Apr 25, 2024

Sorry I couldn't find your previous comment, but that's great if that's the current behaviour. I think I pulled and re-built with your latest changes but I'll check. Here's the QBFT config I used:

"qbft" : {
      "blockperiodseconds" : 2,
      "emptyblockperiodseconds" : 30,
      "epochlength" : 30000,
      "requesttimeoutseconds" : 4
    }

I was using a single node to test though. Is it possible that just using a single node gives different behaviour?

Please note that I'm not dismissing your findings; as I said, the tests I did (my original post is in the Issue itself) were done 2 weeks ago, and there have been some merges and commits since then. I will test again as soon as I can, and I will test the single-node scenario. I suspect that won't behave as intended, at least considering that having 1 incorrectly configured node in my 4-node network gave me all those issues I mentioned above.

My local network of 4 nodes was created following this page.

The rest of your comments are now all implemented too.

I see below those 4 workflows awaiting approval; what should I do to be able to run them?

@amsmota
Author

amsmota commented Apr 25, 2024

I've just tested the empty/non-empty block times again, and it seems good to me. It's based on

  "blockperiodseconds": 10,
  "emptyblockperiodseconds": 30,

The block that contains 1 tx is on line 5, so the flow is:

1: --> 30s --> 2: --> 30s --> 3: --> 30s --> 4: --> 10s --> 5: --> 30s --> 6: --> 30s --> 7: --> 30s --> 8:

1: 2024-04-25 16:22:10.096+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #88 / 0 tx / 0 pending / 0 (0.0%) gas / (0x77729663d055235bb84c8bd80425010dbce8dba6a348956d21352b9bfe30f3d8)
2: 2024-04-25 16:22:40.154+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #89 / 0 tx / 1 pending / 0 (0.0%) gas / (0x7224a81a9d0a53a23660d7103a859cbf09970a72f257ea4ce494ebe010e32e03)
3: 2024-04-25 16:23:10.148+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #90 / 0 tx / 0 om / 0 (0.0%) gas / (0x5a4d0eb71ff7ea9c31fb49df659a7b0c97a7ff4d8e62202a40de98ff52e01df3) in 0.005s. Peers: 3
4: 2024-04-25 16:23:40.123+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #91 / 0 tx / 0 om / 0 (0.0%) gas / (0xe11cf442d69007de17b954ddceaa527c1adad32ca35743d35ffd5ca120ef77af) in 0.001s. Peers: 3
5: 2024-04-25 16:23:50.138+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #92 / 1 tx / 0 om / 3,000,000 (63.8%) gas / (0x7825a0381ee19519db2880f0546af325b15fd417a777052e53b83dd9bd865ebc) in 0.009s. Peers: 3
6: 2024-04-25 16:24:20.189+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #93 / 0 tx / 0 pending / 0 (0.0%) gas / (0x3a2f5bdc9d1976d97165b453925d9635d55e0165138ec77be9286dae671b937e)
7: 2024-04-25 16:24:50.183+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #94 / 0 tx / 0 pending / 0 (0.0%) gas / (0x032cdc8694057113331d3a6e4a2e743dcb4bfe77f2c9702725e8036b28836bea)
8: 2024-04-25 16:25:20.230+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #95 / 0 tx / 0 pending / 0 (0.0%) gas / (0x6cf6facb3fd887b490ab140cf76335b92eb7b503029c3c8543908123a2634ccc)

Anyway, to be sure, it's better if somebody else confirms this data.

@amsmota
Author

amsmota commented Apr 25, 2024

One thing I noticed in those logs above: in the previous logs I copied from tests, all (or most) blocks, produced or imported, appeared in the QbftBesuControllerBuilder output, which is not the case now:

2024-04-25 16:22:10.096+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #88 / 0 tx / 0 pending / 0 (0.0%) gas / (0x77729663d055235bb84c8bd80425010dbce8dba6a348956d21352b9bfe30f3d8)
2024-04-25 16:22:40.154+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #89 / 0 tx / 1 pending / 0 (0.0%) gas / (0x7224a81a9d0a53a23660d7103a859cbf09970a72f257ea4ce494ebe010e32e03)
2024-04-25 16:23:10.148+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #90 / 0 tx / 0 om / 0 (0.0%) gas / (0x5a4d0eb71ff7ea9c31fb49df659a7b0c97a7ff4d8e62202a40de98ff52e01df3) in 0.005s. Peers: 3
2024-04-25 16:23:40.123+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #91 / 0 tx / 0 om / 0 (0.0%) gas / (0xe11cf442d69007de17b954ddceaa527c1adad32ca35743d35ffd5ca120ef77af) in 0.001s. Peers:  3

I wonder if this is expected or if it may indicate some issue...

I actually added some breakpoints in QbftBesuControllerBuilder and PersistBlockTask to check the flow, but I couldn't see any logic behind the difference...

@matthew1001
Contributor

I see below those 4 workflows awaiting approval; what should I do to be able to run them?

I've just approved them to run, so they're all going now.

@matthew1001
Contributor

Anyway, to be sure, it's better if somebody else confirms this data.

I'll pull your latest commits today and try with a 1-node and a 4-node network, but given your findings I expect it's working fine now.

@amsmota
Author

amsmota commented Apr 29, 2024

Anyway, to be sure, it's better if somebody else confirms this data.

I'll pull your latest commits today and try with a 1-node and a 4-node network, but given your findings I expect it's working fine now.

I'm afraid I found an issue when testing with one node only, but unfortunately I couldn't replicate it again yet. It seems to happen with 2 non-empty blocks in a row:

...2024-04-29 13:04:30.025+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #72 / 0 tx / 0 pending / 0 (0.0%) gas / (0x0ed61b79ce98b122aca8c20088d514c0e2923d1047844fedabb031e6c2dc19f6)
...2024-04-29 13:05:00.030+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #73 / 0 tx / 0 pending / 0 (0.0%) gas / (0x76ca568e1d313d4f0e4b2e46493b0bf05948a0c18b7f01445e6dbcaf211ca9c2)
***2024-04-29 13:05:10.047+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #74 / 1 tx / 0 pending / 3,000,000 (63.8%) gas / (0xa81a1f4d165a8f13c70e2837e29b04b2434027ca4606e5d740ea9035e3006494)
***2024-04-29 13:05:40.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #75 / 1 tx / 0 pending / 3,000,000 (63.8%) gas / (0x1fda1fa7735207af755385f4a6d3686b33154e0621c754427e401ec58f8c847d)
...2024-04-29 13:06:10.034+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #76 / 0 tx / 0 pending / 0 (0.0%) gas / (0x5313c0d4ab90ef06698758268d9e5d1f08fbea769c86f7901733410b03e34c3b)
...2024-04-29 13:06:40.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #77 / 0 tx / 0 pending / 0 (0.0%) gas / (0xcc5b47a2528df351db18a8500c30a8a554a45fa97170db1ae5034ce8f254ff19)

Block 74 is created correctly but the next one is not: it should have taken 10 seconds and it took 30... Is this what you observed, @matthew1001?

I'm testing with Remix and MetaMask, and it's hard to send 2 transactions in a row; is there a quicker way to do it?

Cheers.

Signed-off-by: amsmota <amsmota@gmail.com>
@matthew1001
Contributor

matthew1001 commented Apr 29, 2024

I've just pulled your latest changes and re-tested, and I do see the same issue still. I've captured some log snippets of Besu with timestamps, and a hardhat script with timestamps so you can see the timings. This is a 4-validator network with a 2-second block period and a 30-second empty block period.

Here are the Besu logs:

2024-04-29 17:20:05.062+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #644 / 0 tx / 0 pending / 0 (0.0%) gas / (0xedba0be8a01e17273fb5f452e8098db74d0e503edc264615b2db58a1b64602e5)
2024-04-29 17:20:35.084+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #645 / 0 tx / 0 pending / 0 (0.0%) gas / (0xc56079c814d91d7098394036a7a4dd6d55353627b7b4f442e69b418d108e5d06)
2024-04-29 17:21:05.281+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #646 / 10 tx / 0 pending / 19,048,280 (0.0%) gas / (0x11e38bc403d73d7657580365d334f90560c277768924067f14ce3a36695f7e45)

Here is the hardhat script which deploys 10 contracts and prints the timestamp before and after:

Mon Apr 29 17:20:39 BST 2024
Next nonce (latest):  351
Deployed contract. Address = 0xf7ED3b17A04A3bD2335b52D27821199C1B4e597E
Deployed contract. Address = 0xcF2D1dD61af473529ee0b539D7d0b14290Eb50Ad
Deployed contract. Address = 0x05B59cdc7244e378E9d75b0aF3BbB211ffC3d9D9
Deployed contract. Address = 0x1B4f02a84534a6576844607b3c8E4e60aF311B38
Deployed contract. Address = 0x66EA9eF99fc1E3650712c4194090B0C65128A58A
Deployed contract. Address = 0x6AF6D200c6047dab80420ABfab8bb09004A8C1A8
Deployed contract. Address = 0x4de391161FaCe36eAb1135368267b74490119329
Deployed contract. Address = 0x19f86338cE087E7D4cE087Cc25B8B07ED810CEF2
Deployed contract. Address = 0x3849a4B696Ea143D59DFbcC4AaF32ad47Fa59986
Deployed contract. Address = 0x3a273e5976409598ABc79c2154d78b62259cB438
Next nonce (latest):  351
Next nonce (pending):  361
Mon Apr 29 17:20:44 BST 2024

So by 17:20:44 all 10 transactions are in the TX pool, but the next block isn't until 17:21:05.

@amsmota
Author

amsmota commented Apr 30, 2024

Hi @matthew1001, that seems different from what I saw in my one-node test, but it's hard for me to test correctly with only Remix and MetaMask. I'm not very familiar with hardhat, even though I'm installing it right now. Can you send me that script so I can test the exact same way as you?

Thanks.


@amsmota
Author

amsmota commented May 1, 2024

Here's a test I did today with a one-node network based on 5/10 second intervals. Block 3 (the first non-empty) took 10 seconds when it should have taken 5. However, all the other non-empty blocks seem OK.

2024-05-01 19:44:41.377+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #1 / 0 tx / 0 pending /
2024-05-01 19:44:51.022+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #2 / 0 tx / 0 pending /
2024-05-01 19:45:01.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #3 / 2 tx / 0 pending /
2024-05-01 19:45:06.019+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #4 / 1 tx / 0 pending /
2024-05-01 19:45:11.027+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #5 / 1 tx / 0 pending /
2024-05-01 19:45:16.042+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6 / 1 tx / 0 pending /
2024-05-01 19:45:26.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #7 / 0 tx / 0 pending /
2024-05-01 19:45:36.031+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #8 / 0 tx / 0 pending /
2024-05-01 19:45:46.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #9 / 0 tx / 0 pending /
2024-05-01 19:45:51.037+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #10 / 2 tx / 0 pending /
2024-05-01 19:45:56.039+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #11 / 1 tx / 0 pending /
2024-05-01 19:46:06.026+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #12 / 0 tx / 0 pending /
2024-05-01 19:46:16.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #13 / 0 tx / 0 pending /
2024-05-01 19:46:26.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #14 / 0 tx / 0 pending /

I'll try again to check whether that 1st non-empty block behaviour is reproducible...

@amsmota
Author

amsmota commented May 2, 2024

Here's my latest test with 5/10 seconds, one-node network. I logged some messages to try and understand the issue.

2024-05-02 18:37:46.387+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #1 / 0 tx / 0 pending /

2024-05-02 18:37:46.402+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:37:51.018+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=2, Round=0}
2024-05-02 18:37:51.018+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:37:56.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has no transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=2, Round=0}
2024-05-02 18:37:56.044+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #2 / 0 tx / 0 pending /
CORRECT

2024-05-02 18:37:56.064+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:01.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=3, Round=0}
2024-05-02 18:38:01.063+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #3 / 3 tx / 0 pending /
CORRECT

2024-05-02 18:38:01.079+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:06.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=4, Round=0}
2024-05-02 18:38:06.028+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #4 / 2 tx / 0 pending /
CORRECT

2024-05-02 18:38:06.041+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:11.016+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=5, Round=0}
2024-05-02 18:38:11.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:16.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has no transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=5, Round=0}
2024-05-02 18:38:16.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #5 / 0 tx / 0 pending /
CORRECT

2024-05-02 18:38:16.034+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:26.009+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:26.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6 / 1 tx / 0 pending /
INCORRECT

2024-05-02 18:38:26.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:31.016+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:31.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:36.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:36.037+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #7 / 1 tx / 0 pending /
INCORRECT

2024-05-02 18:38:36.045+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:41.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=8, Round=0}
2024-05-02 18:38:41.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:46.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has no transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=8, Round=0}
2024-05-02 18:38:46.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #8 / 0 tx / 0 pending /
CORRECT

You'll notice that the 2 INCORRECT ones have one thing in common: of the two QbftBlockHeightManager lines, the first says Block has NO transactions and the second says Block has transactions!

2024-05-02 18:38:16.034+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:26.009+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:26.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6 / 1 tx / 0 pending /
INCORRECT

2024-05-02 18:38:26.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:31.016+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:31.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:36.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:36.037+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #7 / 1 tx / 0 pending /
INCORRECT

So after staring at the screen and occasionally banging my head on the table, I remembered the phrase of that Vulcan guy (or was it Sherlock Holmes?): "When you eliminate the impossible, whatever remains, however improbable, must be the truth", and the only explanation I can find is this:

AFAICS the way this works is:

  1. A BlockTimer is created
  2. When that timer expires, handleBlockTimerExpiry kicks in
  3. If the node is a Proposer, it sends a ProposalMessage to the other nodes, and at some point the block is added to the chain

The above is what's happening at the moment, with no emptyBlockPeriodSeconds... With it, the flow will be:

  1. A BlockTimer is created
  2. When that timer expires, handleBlockTimerExpiry kicks in
  3. If the node is a Proposer AND has TXs, it sends a ProposalMessage to the other nodes, and at some point the block is added to the chain
  4. If the block HAS NO TXs
    • it checks whether the emptyBlockPeriodSeconds has also expired
    • if yes, it sends a ProposalMessage to the other nodes, and at some point the block is added to the chain
    • if not, it sets the BlockTimer to the emptyBlockPeriodSeconds
    • at some point the BlockTimer with the new period expires
    • and it gets back to 1.
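The flow above can be sketched as a toy model. This is an illustrative Python simplification of the described logic, not the actual Besu Java code; the class and method names are invented.

```python
BLOCK_PERIOD = 5            # blockperiodseconds
EMPTY_BLOCK_PERIOD = 10     # emptyblockperiodseconds

class BlockTimerModel:
    """Toy model of the described BlockTimer / expiry handling."""

    def __init__(self):
        self.last_block_time = 0.0
        self.pending_txs = []

    def handle_block_timer_expiry(self, now):
        """Returns the produced block as a (time, tx_count) tuple, or
        None if the timer was merely extended to the empty-block period."""
        if self.pending_txs:
            # Proposer has TXs: propose a block now.
            block = (now, len(self.pending_txs))
            self.pending_txs = []
            self.last_block_time = now
            return block
        if now - self.last_block_time >= EMPTY_BLOCK_PERIOD:
            # Empty-block period also expired: propose an empty block.
            self.last_block_time = now
            return (now, 0)
        # No TXs and empty period not yet expired: extend the timer.
        return None

# A TX arriving between the 5s and 10s marks is only included when
# the extended timer fires at t=10:
m = BlockTimerModel()
m.handle_block_timer_expiry(5)          # no TXs yet: timer extended
m.pending_txs.append("tx")              # TX arrives during the wait
print(m.handle_block_timer_expiry(10))  # (10, 1): took 10s, not 5s
```

This reproduces the INCORRECT cases above: once the timer has been extended to the empty-block period, a TX arriving mid-wait still has to wait for that longer expiry.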

So, the only explanation for those INCORRECT blocks is that they got their transaction between the 5th and the 10th second after the previous block... It's only logical...

If this explanation is correct, then it's quite bad, because there is no way to correct it: well, there's nothing to correct, it's just the way it works... There may be some solution, but nothing comes to mind now... Or I may be wrong, of course...

What do you guys think?

P.S. I wonder if we can create a PendingTransactionAddedListener in the QbftBlockHeightManager, that may do the trick...
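That listener idea might look roughly like this. This is a hypothetical sketch: only the listener concept comes from the comment above, and the rescheduling logic and all names here are invented. The idea is that when a pending TX arrives during the extended empty-block wait, the deadline is pulled back to one block period from now.

```python
class TimerRescheduler:
    """Toy model: shorten the empty-block wait when a TX arrives."""

    def __init__(self, block_period, empty_block_period):
        self.block_period = block_period
        self.empty_block_period = empty_block_period
        self.deadline = None

    def start_empty_wait(self, now):
        # No TXs at expiry: wait out the empty-block period.
        self.deadline = now + self.empty_block_period

    def on_pending_transaction_added(self, now):
        # Listener callback: restart the short block-period timer,
        # but never push the deadline later than it already is.
        self.deadline = min(self.deadline, now + self.block_period)

timer = TimerRescheduler(block_period=2, empty_block_period=30)
timer.start_empty_wait(0)               # deadline at t=30
timer.on_pending_transaction_added(10)  # TX arrives mid-wait
print(timer.deadline)                   # 12: block ~2s after the TX
```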

@amsmota
Author

amsmota commented May 7, 2024

Hi @siladu and @matthew1001 and everybody, I need some help here related to my last comment. At the moment I think this PR is working correctly (if the findings in my last post are correct). I have the idea of adding a mempool listener in the QbftBlockHeightManager, but I'm afraid that even if that works it'll add too much complexity.

Just to recall: the only explanation for the INCORRECT blocks being created after 10 seconds instead of 5 is that they got their transaction(s) between the 5th and the 10th second after the previous block...

Comment PLEASE...

@matthew1001
Contributor

Thanks for doing some extra testing @amsmota. At the moment I'm going to find it hard to put in the time to diagnose things in detail. I'm wondering if we want to consider putting the feature in, documented as experimental, since the basic behaviour seems to work and the exact behaviour in certain situations just needs sorting out. The main thing would be to ensure that we're confident nothing has regressed in the basic case (i.e. when the user hasn't set emptyblockperiodseconds).

@siladu what are your thoughts? @non-fungible-nelson do you have a view on whether this could be released as experimental to get more people trying it out and more focus over the coming releases on ironing out any remaining issues?

@amsmota
Author

amsmota commented May 8, 2024

Thanks for doing some extra testing @amsmota At the moment I'm going to find it hard to put the time in to diagnose things in detail.

Yeah, I totally understand that, and I thank you for your valuable time. 💯

I'm wondering if we want to consider putting the feature in documented as experimental since the basic behaviour seems to work and the exact behaviour in certain situations just needs sorting out.

Well, I don't know exactly what being "experimental" implies, but I agree with it...

The main thing would be to ensure that we're confident nothing has regressed in the basic case (i.e. when the user hasn't set emptyblockperiodseconds).

I'm pretty sure that there is no problem there, but looking at the code I see that there is no provision or validation for that. I may change it to make that explicit, and in any case I'm going to test that scenario more exhaustively.

@siladu what are your thoughts? @non-fungible-nelson do you have a view on whether this could be released as experimental to get more people trying it out and more focus over the coming releases on ironing out any remaining issues?

That part about having more people trying it out is, I think, crucial. Being tested by only a couple of people is too limited; we really need broader test coverage...

Anyway, I'm going to keep on testing: I'm going to create a "standard" test that sends transactions in a predefined way, not randomly like I'm doing now, and improve my 4-node network, because it's very difficult to aggregate meaningful info from all the nodes...

Thank you all for your support.

@amsmota
Author

amsmota commented May 9, 2024

In the test below, with 5/10 seconds, I sent a bunch of transactions with delays of 1 and 8 seconds to the one-node network, with these results:

[image: table of block production timestamps]

As it shows, all blocks with txs sent with 1 second delay are created with the 5 seconds period, while all the ones with the 8 seconds delay are created with the 10 seconds period (ignore the first 2 blocks, it was probably due to some overhead in my script).

This seems to confirm what I said before. If a transaction is sent to the pool between 0-5 secs it will be correctly added; if it arrives between 5-10 seconds it will only be added 10 seconds after the last block. This seems OK, except that with 5/60 seconds a transaction sent at the 6th second would only be added to a block 54 seconds later, which is too much time to wait...

I think the problem here is that the code is putting the cart before the horse, meaning that only after the block is created does the code know whether it's empty or not. So the correct way to do it would be to first peek into the mempool to check if there is any transaction, and then decide what to do...

So, considering that I never touched that part of the code, my question is, how can I inspect the Mempool from the QbftBlockHeightManager?

@amsmota
Author

amsmota commented May 9, 2024

Here's what GoQuorum says about their implementation...

"transitions": [{
  "block": ...,
  "blockPeriodSeconds": 2
  "emptyBlockPeriodSeconds": 60
}]

In the preceding example, a block is produced every two seconds if there are transactions pending to be mined. If no transactions are pending, then blocks are not produced until after 60 seconds.

And then there are some examples that, from block 14 onwards, don't make much sense to me...

https://docs.goquorum.consensys.io/configure-and-manage/configure/consensus-protocols/qbft#configure-empty-block-period

I did look at their code, but I'm afraid my Go is a little rust... (pun intended)

@siladu
Contributor

siladu commented May 10, 2024

@siladu what are your thoughts? @non-fungible-nelson do you have a view on whether this could be released as experimental to get more people trying it out and more focus over the coming releases on ironing out any remaining issues?

@matthew1001 I am generally in favour of putting things behind an experimental feature, even if they aren't complete.

As you mentioned, that is contingent on no regressions when the feature is disabled:

The main thing would be to ensure that we're confident nothing has regressed in the basic case (i.e. when the user hasn't set emptyblockperiodseconds).

It would be good to have someone who is currently using QBFT in anger test this out and confirm the regression case at least. At the very least, we should test a multi-node network under adversarial conditions, e.g. bringing nodes down to cause a round change, making sure the round change timer is still working as expected (when this feature is disabled).


@amsmota here's an example of an experimental feature:

private static final String CHAIN_PRUNING_ENABLED_FLAG = "--Xchain-pruning-enabled";

Not much to it, basically just a feature flag with an X in front of it and hidden=true in the CLI help

I'm not sure whether we have an example of an experimental genesis config key. Perhaps it is cleaner to use a CLI flag that you can completely remove later, rather than say XemptyBlockPeriodSeconds that you then have to promote to a production-ready version without the X.

@siladu
Contributor

siladu commented May 10, 2024

@amsmota I've started the process of testing this as well. I've had a go at writing some acceptance criteria and started working them through here: https://hackmd.io/M2NOd3QKTfCN8Nb_wUNiow

@matthew1001

I'm still seeing the behaviour of a new block not being produced if a TX arrives during the empty block period.

I am also seeing this and concur that I don't think it's expected. From my tests (5 second blockPeriod, 30 second emptyBlockPeriod)...

  • [PASS] EmptyBlock -> tx @ < 5 seconds, produces block after 5 seconds
  • [???] EmptyBlock -> tx @ 6 seconds, produces block after 30 seconds (24 seconds after tx sent)
  • [PASS] EmptyBlock -> tx @ 29 seconds, produces block after 30 seconds (1 second after tx sent)

I think Besu should behave more like GoQuorum in this respect, i.e. once there is a tx in the pool, produce a block no later than the 5 second (non-empty) block period.
I haven't tested it recently, but I believe GoQuorum has a block building thread that is constantly running hence why the transaction appears to get built into a block as soon as it arrives, whereas Besu is relying on the timers.

I do remember this coming up during my draft implementation before, but don't remember what the outcome was, would need to retest my old code to see if and how I avoided it.

And then some examples that from block 14 onwards it dosen't make much sense to me...
docs.goquorum.consensys.io/configure-and-manage/configure/consensus-protocols/qbft#configure-empty-block-period

I believe the block 14 onwards timestamps are illustrating this behaviour, i.e. a block is produced as soon as the tx is received, due to GoQuorum's thread that runs a tight tx "mining" loop.

Things are starting to come back to me... I think a compromise in my code was to still run on the blockperiodseconds timer, but skip proposing empty blocks until emptyblockperiodseconds (where emptyblockperiodseconds > blockperiodseconds) has elapsed. So if you were to log it out, in the empty-block scenario you'd see a "skipping" log every 5 seconds and then a block produced at 30.
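As a rough illustration, the "skip" compromise described here could be simulated like this. This is a sketch only: the names and structure are illustrative, not Besu's actual timer code:

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of the "skip" compromise: the block timer keeps firing every
// blockPeriod seconds; empty proposals are skipped until emptyBlockPeriod
// seconds have elapsed since the last block. Illustrative names only.
public class SkipTimerSketch {

    static List<String> simulate(long blockPeriod, long emptyBlockPeriod,
                                 long lastBlockTs, boolean hasTxs, long untilTs) {
        List<String> events = new ArrayList<>();
        for (long now = lastBlockTs + blockPeriod; now <= untilTs; now += blockPeriod) {
            if (hasTxs) {
                events.add(now + ": propose block");
                break;
            }
            if (now - lastBlockTs >= emptyBlockPeriod) {
                events.add(now + ": propose empty block");
                break;
            }
            events.add(now + ": skipping (empty, period not reached)");
        }
        return events;
    }

    public static void main(String[] args) {
        // 5s block period, 30s empty-block period, no pending txs:
        // a "skipping" event at 5, 10, 15, 20, 25, then an empty block at 30.
        simulate(5, 30, 0, false, 30).forEach(System.out::println);
    }
}
```

The key property is that the timer never waits longer than blockPeriod, so a tx arriving mid-interval is picked up at the next 5-second tick rather than after the full empty-block period.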

@amsmota
Author

amsmota commented May 10, 2024

Things are starting to come back to me... I think a compromise in my code was to still run on the blockperiodseconds timer, but skip proposing empty blocks until emptyblockperiodseconds (where emptyblockperiodseconds > blockperiodseconds) has elapsed.

I've tried that approach: with 5/30 second periods, the blockTimer expires after 5 secs, then after 5 secs again, and again, until 30 secs have gone by, and then it creates an empty block.

The problem is that every 5 seconds it should create a block if there are txs, but as it is now there is no such check (inspecting the mempool for txs); what it does now is just create the block and then check whether that new block is empty or not. So what happened was that it created empty blocks every 5 seconds.

So if I knew how to do that – to inspect the mempool for txs – I think I could implement it that way. But I don't know how to do it...

@siladu
Contributor

siladu commented May 10, 2024

Things are starting to come back to me... I think a compromise in my code was to still run on the blockperiodseconds timer, but skip proposing empty blocks until emptyblockperiodseconds (where emptyblockperiodseconds > blockperiodseconds) has elapsed.

I've tried that approach: with 5/30 second periods, the blockTimer expires after 5 secs, then after 5 secs again, and again, until 30 secs have gone by, and then it creates an empty block.

So if we are able to check for any txs using the 5 second blockperiodseconds, I would expect

EmptyBlock -> tx @ 6 seconds, produces block after 10 seconds (4 seconds after tx sent)

@amsmota
Author

amsmota commented May 10, 2024

So if we are able to check for any txs using the 5 second blockperiodseconds, I would expect

EmptyBlock -> tx @ 6 seconds, produces block after 10 seconds (4 seconds after tx sent)

Sorry, I updated my comment above before I saw yours, I'm on my mobile only and it's hard to update things and check new stuff...

Yes, that is what I would expect too... 😁

@matthew1001
Contributor

It would be good to have someone who is currently using QBFT in anger test this out and confirm the regression case at least. At the very least, we should test a multi-node network under adversarial conditions, e.g. bringing nodes down to cause a round change, making sure the round change timer is still working as expected (when this feature is disabled).

Could be a good use for #7023?

@siladu
Contributor

siladu commented May 10, 2024

It would be good to have someone who is currently using QBFT in anger test this out and confirm the regression case at least. At the very least, we should test a multi-node network under adversarial conditions, e.g. bringing nodes down to cause a round change, making sure the round change timer is still working as expected (when this feature is disabled).

Could be a good use for #7023?

For sure! Does that exercise the round change timer when it brings nodes down? I think that's probably the most complicated part of the consensus mechanism so a good one to ensure is still working early on.

@siladu
Contributor

siladu commented May 10, 2024

@amsmota

So if I knew how to do that – to inspect the mempool for txs – I think I can implement it that way. But I don't know how to do it...

I've just retested #4634 and indeed it does produce a block no later than blockperiodseconds after receiving the transaction. I don't inspect the mempool; rather, I build a block every blockperiodseconds, but only propose it if blockHasTransactions || currentTimestamp >= lastBlockTimestampPlusEmptyBlockPeriodSeconds.

IIRC, the problems I ran into with my implementation were to do with the round change timer in multi-node networks.
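The condition described here can be captured in a small predicate. This is a sketch with illustrative names, not the actual code from #4634:

```java
// Predicate for the approach described above: build a block every
// blockPeriodSeconds, but only propose it when it has transactions or when
// the empty-block period has elapsed since the last block. Illustrative names.
public class ProposeDecisionSketch {

    static boolean shouldPropose(boolean blockHasTransactions,
                                 long currentTimestamp,
                                 long lastBlockTimestamp,
                                 long emptyBlockPeriodSeconds) {
        return blockHasTransactions
            || currentTimestamp >= lastBlockTimestamp + emptyBlockPeriodSeconds;
    }

    public static void main(String[] args) {
        // last block at t = 100, emptyBlockPeriodSeconds = 30
        System.out.println(shouldPropose(true, 105, 100, 30));  // true: txs present
        System.out.println(shouldPropose(false, 110, 100, 30)); // false: skip empty
        System.out.println(shouldPropose(false, 130, 100, 30)); // true: period elapsed
    }
}
```

Because block building still happens every blockperiodseconds, a tx that arrives mid-interval is proposed at the next block-period tick rather than waiting out the full empty-block period.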

Comment on lines +234 to +236
case NOT_IMPORTED:
LOG.info("Block {} was not imported", block.toLogString());
break;
Contributor

NOT_IMPORTED doesn't appear to be referenced anywhere else?

Author

I added it just to show in the logs... I wondered if it ever happened... It can be removed, I think...

roundIdentifier,
qbftRound.getRoundIdentifier());
finalState.getRoundTimer().cancelTimer();
finalState.getBlockTimer().startEmptyBlockTimer(roundIdentifier, parentHeader);
Contributor

@siladu siladu May 10, 2024

This case is executed following the blockperiodseconds block timer expiry being handled, so in the context of my test, 5+ seconds after the last block.
Here you are setting the block timer to emptyBlockPeriodSeconds instead of the blockPeriodSeconds, so despite having transactions in the pool, no block building (nor proposal) is attempted until emptyBlockPeriodSeconds (30 seconds) has passed since the last block, at the earliest.

Contributor

@siladu siladu May 10, 2024

As mentioned above, GoQuorum QBFT fundamentally differs because of how GoQuorum builds blocks: it follows geth's approach of having a separate thread with a tight block building loop that quickly picks up new transactions in the pool.
Besu QBFT does not have this block building loop running in the background, instead it is event-driven using timers.

It would be a large change to Besu QBFT to support a tight block building loop, but I think for better liveness, ensuring we do something every blockperiodseconds, even if it's just skipping an empty block, is a better approach than waiting the full emptyBlockPeriodSeconds.

Contributor

For example, consider if a user were to configure blockPeriodSeconds: 1 along with emptyBlockPeriodSeconds: 600 - I think the user would complain that they had to wait 10 minutes for their transactions to go through

Author

@amsmota amsmota May 10, 2024

Yes, we already reached that conclusion. Please read my last comment. It was in reply to your earlier message...

@matthew1001
Contributor

For sure! Does that exercise the round change timer when it brings nodes down? I think that's probably the most complicated part of the consensus mechanism so a good one to ensure is still working early on.

It definitely stops 1 of the 4 nodes for long enough that block production slows down due to the round expiring for the stopped node, and it also stalls the chain entirely by stopping another node before resuming both and checking that the chain continues once all are back up and running.

@amsmota
Author

amsmota commented May 10, 2024

But that's what this PR's code does; what it shouldn't do is the startEmptyBlockTimer, which, as we already found out, causes the undesirable delay...

Yesterday I changed that to the "normal" startTimer, but that caused a succession of empty blocks being created. I think I know why. I can't check it now (I don't have my laptop with me), but I'll check again as soon as I get home... I think the solution is to use a new BlockTimer, or else manipulate the existing one, actually by reusing the startEmptyBlockTimer...

@siladu
Contributor

siladu commented May 10, 2024

But that's what this PR's code does; what it shouldn't do is the startEmptyBlockTimer, which, as we already found out, causes the undesirable delay...

Yesterday I changed that to the "normal" startTimer, but that caused a succession of empty blocks being created. I think I know why. I can't check it now (I don't have my laptop with me), but I'll check again as soon as I get home... I think the solution is to use a new BlockTimer, or else manipulate the existing one, actually by reusing the startEmptyBlockTimer...

I would encourage you to check https://github.com/hyperledger/besu/pull/4634/files again since it already implemented the behaviour we're discussing here...maybe run through your tests with it to compare? (Note, I hard-coded blockPeriodSeconds: 5 and emptyBlockPeriodSeconds: 60).

I'll admit I've only performed today's tests on a single-node network, so maybe I'm missing something for multi-node, but I feel it should work well on a single node before moving on to multi-node testing.

As I mentioned above, when I originally tested my PR, I ran into trouble once I got into the more complex test cases such as round changes.

@amsmota
Author

amsmota commented May 10, 2024

I would encourage you to check https://github.com/hyperledger/besu/pull/4634/files again since it already implemented the behaviour we're discussing here...

My code is based on your code; I just moved some stuff to other places, like the BlockTimer. The calculations you did are in my code too (adding the parent timestamp to the emptyblockperiod). It was basically a copy & paste with some renaming, and in some cases code moved to other parts.

It doesn't behave as I initially expected because of the way the startEmptyBlockTimer is implemented. I didn't catch that issue before because I always tested with very small empty periods, always between 10 and 20 seconds, and because I was sending transactions individually, not in a script like I do now. That made the issue less apparent because there were fewer chances of it happening.

I'm pretty confident that after changing the startEmptyBlockTimer things will behave as expected.

@amsmota
Author

amsmota commented May 10, 2024

Here are the results of the first tests with 5/30 seconds on a one-node network, using the script I had before, sending Txs alternating between 1 and 10 second intervals. The left column shows the timestamps at which the Txs were sent.

                                 2024-05-10 19:42:46.729+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #1
                                 2024-05-10 19:43:17.111+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #2
10-05-2024T18:43:33.632053800Z   2024-05-10 19:43:37.104+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #3         
10-05-2024T18:43:47.405594300Z   2024-05-10 19:43:52.074+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #4         
10-05-2024T18:43:53.564259900Z   2024-05-10 19:43:57.042+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #5         
10-05-2024T18:44:07.179650600Z   2024-05-10 19:44:12.071+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6         
10-05-2024T18:44:13.319288800Z   2024-05-10 19:44:17.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #7         
10-05-2024T18:44:27.529839300Z   2024-05-10 19:44:32.058+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #8         
10-05-2024T18:44:33.259423900Z   2024-05-10 19:44:37.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #9         
10-05-2024T18:44:47.370699100Z   2024-05-10 19:44:52.054+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #10        
10-05-2024T18:44:53.481811200Z   2024-05-10 19:44:57.028+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #11        
10-05-2024T18:45:07.591676200Z   2024-05-10 19:45:12.067+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #12        
                                 2024-05-10 19:45:42.115+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #13
                                 2024-05-10 19:46:12.110+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #14

It looks good: all the blocks with Txs were created within the first 5-second frame after the Txs were sent.

Tomorrow I'll do some tests that mix empty and non-empty blocks, clean the code and commit.

@amsmota
Author

amsmota commented May 11, 2024

Same test as before, but with periods of 5/30 and the script sending Txs after a random number of seconds between 0 and 60:

                                            |  2024-05-11 12:03:56.466+01:00 | QbftBesuControllerBuilder | Produced empty block #1 / 0 tx / 0 pending
                                            |  2024-05-11 12:04:26.107+01:00 | QbftBesuControllerBuilder | Produced empty block #2 / 0 tx / 0 pending
11-05-2024 11:04:33:::43 	Block:  3   |  2024-05-11 12:04:36.075+01:00 | QbftBesuControllerBuilder | Produced       block #3 / 1 tx / 0 pending
                                            |  2024-05-11 12:05:06.122+01:00 | QbftBesuControllerBuilder | Produced empty block #4 / 0 tx / 0 pending
11-05-2024 11:05:19:::35 	Block:  5   |  2024-05-11 12:05:21.044+01:00 | QbftBesuControllerBuilder | Produced       block #5 / 1 tx / 0 pending
                                            |  2024-05-11 12:05:51.086+01:00 | QbftBesuControllerBuilder | Produced empty block #6 / 0 tx / 0 pending
11-05-2024 11:05:56:::8 	Block:  7   |  2024-05-11 12:06:01.044+01:00 | QbftBesuControllerBuilder | Produced       block #7 / 1 tx / 0 pending
11-05-2024 11:06:09:::51 	Block:  8   |  2024-05-11 12:06:16.066+01:00 | QbftBesuControllerBuilder | Produced       block #8 / 1 tx / 0 pending
                                            |  2024-05-11 12:06:46.114+01:00 | QbftBesuControllerBuilder | Produced empty block #9 / 0 tx / 0 pending
11-05-2024 11:07:07:::56 	Block:  10  |  2024-05-11 12:07:11.093+01:00 | QbftBesuControllerBuilder | Produced       block #10 / 1 tx / 0 pending
                                            |  2024-05-11 12:07:41.079+01:00 | QbftBesuControllerBuilder | Produced empty block #11 / 0 tx / 0 pending
11-05-2024 11:08:07:::35 	Block:  12  |  2024-05-11 12:08:11.107+01:00 | QbftBesuControllerBuilder | Produced       block #12 / 1 tx / 0 pending
                                            |  2024-05-11 12:08:41.085+01:00 | QbftBesuControllerBuilder | Produced empty block #13 / 0 tx / 0 pending
11-05-2024 11:08:46:::8 	Block:  14  |  2024-05-11 12:08:51.029+01:00 | QbftBesuControllerBuilder | Produced       block #14 / 1 tx / 0 pending
11-05-2024 11:08:59:::18 	Block:  15  |  2024-05-11 12:09:06.062+01:00 | QbftBesuControllerBuilder | Produced       block #15 / 1 tx / 0 pending /
11-05-2024 11:09:24:::36 	Block:  16  |  2024-05-11 12:09:31.075+01:00 | QbftBesuControllerBuilder | Produced       block #16 / 1 tx / 0 pending
                                            |  2024-05-11 12:10:01.088+01:00 | QbftBesuControllerBuilder | Produced empty block #17 / 0 tx / 0 pending
11-05-2024 11:10:07:::26 	Block:  18  |  2024-05-11 12:10:11.043+01:00 | QbftBesuControllerBuilder | Produced       block #18 / 1 tx / 0 pending
                                            |  2024-05-11 12:10:41.085+01:00 | QbftBesuControllerBuilder | Produced empty block #19 / 0 tx / 0 pending
                                            |  2024-05-11 12:11:11.106+01:00 | QbftBesuControllerBuilder | Produced empty block #20 / 0 tx / 0 pending

On the left is the timestamp of each Tx being sent, and the number after ::: is the number of seconds it waited before sending the next. Block: X is the number of the block returned by the deploy script.

The numbers seem right: every time the wait period is > 30 a new empty block is created; otherwise a non-empty block is created in the next 5-second frame after the Tx is sent (well, give or take, because I'm not counting the time the Txs take to be created).

The script could be better written but... it is what it is! I may write it again to get more accurate times...

for (( i=0; i<10; ++i)); do
    nwait=$((RANDOM % 60))
    echo $(date -u '+%d-%m-%Y %H:%M:%S'):::$nwait
    npx hardhat run scripts/deploy.js --network localhost
    sleep $((nwait))
done

@amsmota
Author

amsmota commented May 12, 2024

And a few more tests, this time with a transition at block 10. It all seems OK to me... Next steps: tweak a few logs, clean up the code, and commit and push.

=== 5/10 ===
2024-05-12 10:56:47.407+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:56:47.493+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #1 / 0 tx / 0 pending /
2024-05-12 10:56:52.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:56:57.041+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:56:57.056+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #2 / 0 tx / 0 pending /
2024-05-12 10:57:02.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:57:07.073+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #3 / 1 tx / 0 pending /
2024-05-12 10:57:12.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:57:17.028+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:57:17.044+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #4 / 0 tx / 0 pending /
2024-05-12 10:57:22.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:57:27.033+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:57:27.049+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #5 / 0 tx / 0 pending /
2024-05-12 10:57:32.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:57:37.020+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:57:37.035+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #6 / 0 tx / 0 pending /
2024-05-12 10:57:42.031+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #7 / 1 tx / 0 pending /
2024-05-12 10:57:47.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:57:52.025+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:57:52.041+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #8 / 0 tx / 0 pending /
2024-05-12 10:57:57.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:58:02.045+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #9 / 1 tx / 0 pending /
2024-05-12 10:58:12.013+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
=== 10/20 ===
2024-05-12 10:58:22.043+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #10 / 1 tx / 0 pending /
2024-05-12 10:58:32.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:58:42.025+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:58:42.040+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #11 / 0 tx / 0 pending /
2024-05-12 10:58:52.029+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #12 / 1 tx / 0 pending /
2024-05-12 10:59:02.005+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:59:12.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:59:12.033+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #13 / 0 tx / 0 pending /
2024-05-12 10:59:22.036+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #14 / 1 tx / 0 pending /
2024-05-12 10:59:32.014+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 10:59:42.022+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 10:59:42.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #15 / 0 tx / 0 pending /
2024-05-12 10:59:52.028+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #16 / 1 tx / 0 pending /
2024-05-12 11:00:02.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 11:00:12.029+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #17 / 1 tx / 0 pending /
2024-05-12 11:00:22.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 11:00:32.048+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #18 / 1 tx / 0 pending /
2024-05-12 11:00:42.019+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #19 / 1 tx / 0 pending /
2024-05-12 11:00:52.028+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #20 / 1 tx / 0 pending /
2024-05-12 11:01:02.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 11:01:12.043+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #21 / 1 tx / 0 pending /
2024-05-12 11:01:22.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block NOT expired
2024-05-12 11:01:32.026+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | Empty Block expired
2024-05-12 11:01:32.041+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #22 / 0 tx / 0 pending /

I left the BlockTimer log there so it is visible that it appears every blockperiodseconds (5 or 10) until it gets to the emptyblockperiodseconds when it creates the empty block.

These are the results from the script:

>>>>> 12-05-2024 09:57:03:::29
Block:  3
<<<<< 12-05-2024 09:57:07:::29

>>>>> 12-05-2024 09:57:36:::18
Block:  7
<<<<< 12-05-2024 09:57:42:::18

>>>>> 12-05-2024 09:58:00:::9
Block:  9
<<<<< 12-05-2024 09:58:02:::9

>>>>> 12-05-2024 09:58:11:::22
Block:  10
<<<<< 12-05-2024 09:58:22:::22

>>>>> 12-05-2024 09:58:44:::21
Block:  12
<<<<< 12-05-2024 09:58:52:::21

>>>>> 12-05-2024 09:59:13:::24
Block:  14
<<<<< 12-05-2024 09:59:22:::24

>>>>> 12-05-2024 09:59:46:::8
Block:  16
<<<<< 12-05-2024 09:59:52:::8

>>>>> 12-05-2024 10:00:00:::18
Block:  17
<<<<< 12-05-2024 10:00:12:::18

>>>>> 12-05-2024 10:00:30:::1
Block:  18
<<<<< 12-05-2024 10:00:32:::1

>>>>> 12-05-2024 10:00:33:::3
Block:  19
<<<<< 12-05-2024 10:00:42:::3

>>>>> 12-05-2024 10:00:45:::15
Block:  20
<<<<< 12-05-2024 10:00:52:::15
