el_manager initial refactor. #6228

Merged
merged 7 commits into from May 14, 2024

cheatfate
Contributor

Goals

  1. Replace all usages of "helpers" with proper primitives.
  2. Add more error handlers and more error reporting (mostly at DEBUG level).
  3. Adopt asyncraises usage.

github-actions bot commented Apr 22, 2024

Unit Test Results

         9 files  ±0    1 319 suites  ±0   25m 27s ⏱️ - 8m 12s
  4 982 tests ±0    4 634 ✔️ ±0  348 💤 ±0  0 ±0 
20 802 runs  ±0  20 398 ✔️ ±0  404 💤 ±0  0 ±0 

Results for commit 6def75b. ± Comparison against base commit 8ca537c.

♻️ This comment has been updated with latest results.

beacon_chain/el/el_manager.nim
asyncSpawn connection.close()
connection.web3 = none[Web3]()
of Degraded:
await connection.close()
Contributor

Could this increase the delay until a fallback client is used? close may run for up to 30s, and with await no progress is made during that time.

The new version at least tracks what is going on more cleanly; I guess the old version could run into a situation where multiple close processes were running at the same time.

Contributor Author

Any async task could run for 30s; that's not how we should protect the code from running for 30s.

Contributor Author

Procedures should not spawn a close and then skip waiting for it to actually complete. As we have seen many times before, this is bad practice that leads to leaks and UB (when the FD of a just-closed transport is reused by the OS, the process ends up with two transports sharing the same FD: one still closing and another just opened).
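
The difference between the two styles discussed above can be sketched roughly as follows. This is a minimal illustration assuming Chronos; `FakeConnection`, `detachedClose` and `awaitedClose` are hypothetical names, not part of el_manager.nim, and the 50 ms delay merely stands in for a slow close.

```nim
import chronos

type
  FakeConnection = ref object   # hypothetical stand-in for a real connection
    closed: bool

proc close(c: FakeConnection) {.async.} =
  # Simulate a close that takes some time to finish.
  await sleepAsync(50.milliseconds)
  c.closed = true

proc detachedClose(c: FakeConnection): Future[bool] {.async.} =
  # Fire-and-forget: control returns while the close is still in flight.
  asyncSpawn c.close()
  return c.closed

proc awaitedClose(c: FakeConnection): Future[bool] {.async.} =
  # The caller resumes only after the close has actually completed.
  await c.close()
  return c.closed

when isMainModule:
  doAssert not waitFor(detachedClose(FakeConnection()))
  doAssert waitFor(awaitedClose(FakeConnection()))
```

With the detached variant, the caller can go on to open a new connection while the old one is still closing, which is the situation the comment above warns about.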

@cheatfate cheatfate force-pushed the el-refactoring branch 2 times, most recently from 80229c5 to 3baa56b Compare April 26, 2024 11:52

# TODO can't be defined within exchangeConfigWithSingleEL
func `==`(x, y: Quantity): bool {.borrow.}

proc exchangeConfigWithSingleEL(m: ELManager, connection: ELConnection) {.async.} =
proc exchangeConfigWithSingleEL(
Contributor

This is a bit of a misnomer now -- it hasn't really been exchangeConfigWithSingleEL since #5585 and #5889

Contributor Author

What name do you propose for it? It still performs the network_id check.

Contributor

sure, so checkNetworkIdWithSingleEL, say, or checkChainIdWithSingleEL

@@ -1763,18 +1949,27 @@ func hasProperlyConfiguredConnection*(m: ELManager): bool =

false

proc startExchangeTransitionConfigurationLoop(m: ELManager) {.async.} =
proc startExchangeTransitionConfigurationLoop(
Contributor

Also a misnomer since #5585 and #5889, along with the debug log message, etc.

It checks the chain ID now; it does not exchange transition configuration.

pending.add(m.chainSyncingLoopFut.cancelAndWait())
if not(m.exchangeTransitionConfigurationLoopFut.isNil()) and
not(m.exchangeTransitionConfigurationLoopFut.finished()):
pending.add(m.exchangeTransitionConfigurationLoopFut.cancelAndWait())
Contributor

Do the waits here delay clean database closing in case of stuck connections, or at least allow the database to close cleanly?

Contributor Author

These waits are proper cancellation; you should wait until the loops have actually been cancelled.
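
A rough sketch of the pattern in the snippet above: cancel each running loop, then wait for all cancellations to finish so every loop gets to run its own cleanup. It assumes Chronos; `Worker`, `run` and `stop` are hypothetical names for illustration and do not reproduce the actual ELManager code.

```nim
import chronos

type
  Worker = ref object   # hypothetical, for illustration only
    cleanedUp: bool

proc run(w: Worker) {.async.} =
  try:
    while true:
      await sleepAsync(10.milliseconds)
  except CancelledError as exc:
    # Cleanup runs here, before the cancellation propagates to the caller.
    w.cleanedUp = true
    raise exc

proc stop(loopFuts: seq[Future[void]]) {.async.} =
  var pending: seq[Future[void]]
  for fut in loopFuts:
    if not(fut.isNil()) and not(fut.finished()):
      pending.add(fut.cancelAndWait())
  # Returns only once every loop has finished handling its cancellation.
  await allFutures(pending)

when isMainModule:
  let
    w = Worker()
    loopFut = w.run()
  waitFor stop(@[loopFut])
  doAssert loopFut.cancelled()
  doAssert w.cleanedUp
```

Because `stop` awaits the cancellations, how long it takes depends on how quickly each loop reacts to being cancelled, which is the trade-off raised in the question above.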

Contributor

@tersec tersec Apr 26, 2024

Ideally, yes. My question is whether the database cleanup still happens first in the case where the loops take arbitrarily long to finish. If it doesn't, the user is left in a less than great position. Ultimately the Chronos state is ephemeral, and the state which has to remain intact for the next run is in the database.

Contributor Author

If you want to close the database first, do it first. If you want to work properly in the async world, you should signal your workers and wait for them to complete their own cleanup. The overall construction of the exit procedure in BN is incorrect, because it does not allow tasks to perform any cleanup. You are working with the database in a non-async way; if you think that your first task is to close the database, do it before signaling the async tasks to finish.

Contributor

Fair enough, and out of scope of this PR.

@tersec tersec merged commit e6b9bfc into unstable May 14, 2024
14 checks passed
@tersec tersec deleted the el-refactoring branch May 14, 2024 18:03

4 participants