Lodestar Planning & Standup Meetings

Phil Ngo edited this page Apr 30, 2024 · 38 revisions

The Lodestar team hosts planning/standup meetings weekly on Tuesdays at 2:00pm Coordinated Universal Time (UTC). These meetings allow the team to conduct release planning, prioritise tasks, sync on current issues and implementations, and provide status updates on the Lodestar roadmap.

Note that these notes are transcribed and summarized by AI language models and may not accurately reflect the context discussed.


April 23, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6678

Planning and Discussions

v1.18 Concerns and libp2p TCP Upgrade

  • The primary concern discussed was the beacon attestation performance issue related to subnet subscription, which appears to be a consequence of upgrading libp2p TCP.
  • A reverted version is currently deployed on feature one for comparison against the Release Candidate (RC) running on CIP validators.
  • Need more data to make a decision as the changes have only been live for 12 hours.
  • Matthew has deployed nodes with and without the upgraded libp2p TCP to compare performance. Noted differences in performance indicating potential issues caused by the upgrade.
  • Concerns about other libraries updated in the yarn.lock file which might also affect performance.

100 Peer Count Discussion

  • Discussion on whether to revert the TCP fix and the increase to a 100 peer count, because the memory leak was not resolved by the updated libp2p TCP.
  • Despite the memory leak, the increased peer count appears to improve the inclusion of beacon aggregates and general network performance.
  • A proposal to keep the 100 peer count based on better performance metrics and feedback from validators, although there's an ongoing concern about memory leaks potentially causing node crashes.

Rocketpool Validators

  • We are now running Rocketpool validators for the team, which have been performing well.

Grant Applications

  • Lodestar decided to apply for the libp2p RetroPGF Grant.
  • Lodestar is also participating in Gitcoin Grants Round 20, featured in the Infrastructure category.

Pectra Devnet-0 Updates:

EIP-7549 Implementation:

  • Status: Approximately halfway completed.
  • Details:
    • Beacon node implementations are complete.
    • P2P/Gossip implementations are nearly complete.
    • Validator-related implementations are pending, expected to take an additional 3-4 days before starting specification tests.

EIP-6110:

  • Status: Implementation is complete and included in Devnet-0.
  • Details:
    • Future changes may involve the reuse of the MaxEB deposit queue for processing deposits, but these are not part of the initial DevNet testing.

EIP-7002:

  • Potential Changes: Inclusion of partial withdrawals by MaxEB.
  • Details:
    • Implementing partial withdrawals associated with MaxEB is considered straightforward and can be added quickly depending on the scope decided for MaxEB in the DevNet.

EIP-7251 MaxEB Implementation:

  • Waiting for a new release of SSZ (Simple Serialize) needed for MaxEB implementations.
  • Spec tests to start following minor code adjustments post-SSZ update.

Devnet-0 Readiness:

  • Lodestar is nearing readiness to integrate with an Execution Layer (EL) client that has implemented all necessary features, although it's unclear which EL client (potentially Geth or EthereumJS) is also devnet-ready.

Updates

Matt:

Review and Revision Process

  • Extensive Review: Matt's PR has gone through 17 rounds of review from Nico, and Matt appreciated the thoroughness.
  • Final Touches: The discussions are down to final considerations, such as whether to throw an error or use a console warning in the apply process discussed earlier.
  • Detailed Collaboration: A significant review session was conducted with Cam, including a couple of hours on the phone, reviewing this and other repositories to ensure all aspects were covered.
  • Completion: With the last comment now addressed on GitHub, the PR is ready to be approved and merged.

NC:

  • EIP-7549: NC's primary focus has been on understanding and implementing this EIP.
  • Attestation Mechanics: He is studying the current caching mechanisms used for attestations, including the attestation seen cache and the base64 encoding of attestation data.
  • Code Familiarization: NC needs to spend more time understanding the public code base to continue effectively implementing the EIP.
  • Consensus Spec Fixes: While implementing EIP-7549, NC has also made some minor fixes to the consensus specifications.

Tuyen:

  • Historical State Issues:
    • Initial Problem: Encountered an issue with n-historical state when configuring zero checkpoint state and one single block state, which led to difficulties in reaching a head state due to assumptions in the reload process.
    • Resolution: The issue was resolved, and further improvements were made to the state caching mechanisms.
    • New Caching Strategy: Renamed and restructured caches into BlockState cache and CheckpointState cache to enhance clarity and efficiency.
  • Persisted State Metrics: Implemented a new metric related to persisted states. However, Tuyen expressed a preference to avoid testing this n-historical state in CIP nodes for the upcoming release, opting instead for a two-week stability period without issues before proceeding.
  • Single Instruction, Multiple Data: The SIMD PR is ready and awaits review. Tuyen had discussions with Gajinder regarding the naming within the PR, signaling it's prepared for evaluation by Cayman or Gajinder.
  • Merkle Tree Hash Optimization:
    • Inspiration: Tuyen was inspired by a presentation at Devcon about the optimal methods for hashing Merkle trees, particularly the advantages of batch hashing for flat array structures.
    • Current Approach: Previously, Tuyen considered a different approach involving grouping hash computations by tree level from the root to the leaf nodes.
    • New Potential Approach: After learning about batch hashing, Tuyen is considering grouping hash computations by level before committing, which could enhance performance and efficiency but requires closer examination due to its complexity.
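
The level-by-level batching idea from the Merkle Tree Hash Optimization notes above can be sketched roughly as follows. This is a minimal illustration, not code from the SSZ repository: `sha256Pair`, `hashLevel`, and `merkleRootByLevel` are hypothetical names, and the leaf count is assumed to be a power of two. Each pass hashes every pair at one level of a flat array, which is the point where a batch hasher could process all pairs at once.

```typescript
import { createHash } from "node:crypto";

// Hash two child nodes into their parent (SHA-256 over left || right).
function sha256Pair(left: Uint8Array, right: Uint8Array): Uint8Array {
  return createHash("sha256").update(left).update(right).digest();
}

// Hypothetical batch step: takes all nodes of one level as a flat array
// and returns all parents. A real batch hasher could replace this loop.
function hashLevel(nodes: Uint8Array[]): Uint8Array[] {
  const parents: Uint8Array[] = [];
  for (let i = 0; i < nodes.length; i += 2) {
    parents.push(sha256Pair(nodes[i], nodes[i + 1]));
  }
  return parents;
}

// Compute the root by hashing one whole level at a time, bottom-up.
// Assumption for this sketch: leaves.length is a power of two.
function merkleRootByLevel(leaves: Uint8Array[]): Uint8Array {
  const n = leaves.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("leaf count must be a non-zero power of two");
  }
  let level = leaves;
  while (level.length > 1) {
    level = hashLevel(level);
  }
  return level[0];
}
```

With four leaves this performs two batch calls (four nodes, then two) instead of three separate per-node hash calls, so the number of calls grows with tree depth rather than node count.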

Nico:

  • v1.18 Finalization: Nico has been finalizing the necessary components for the v1.18 release.
  • REST API Performance Investigation: Noted outliers in performance metrics for some REST API calls, indicating a need for further data and analysis to identify the causes of delayed response times.
  • Data Conversion: Identified an issue related to how buffers are converted in multiple areas, leading to opening a review issue to address this properly.
  • Nimbus Compatibility: Most compatibility issues have been resolved. However, there remains a problem with Nimbus, partly due to an issue on their side and partly because Lodestar does not support SSZ for the request body. More debugging is needed to resolve these issues fully.
  • SSZ Refactor:
    • Nico plans to pick up the SSZ refactor branch, aiming to prepare it for a pull request and ensure it's build-ready.

Gajinder:

PeerDAS Development

  • Generalization of Block Input:
    • Identified a need to make block input handling more generic and fork-aware to accommodate the requirements of PeerDAS and potential future needs in ePBS/IL.
    • Implemented changes through a series of pull requests:
      • The first PR includes the addition of blob metrics and some cleanup.
      • A subsequent PR generalizes the block data to be fork-aware.
  • Gajinder plans to continue his development work on PeerDAS.
  • Electra Readiness:
    • Alongside ongoing projects, Gajinder will also focus on ensuring that systems and implementations are ready for the Electra upgrade.

Cayman:

  • BLS PR Collaboration:
    • Worked closely with Matt on the BLS PR, conducting an in-depth review and collaboration session.
  • Spent time catching up on notifications and conducting a number of smaller code reviews throughout the week.
  • Yamux Branch:
    • Re-pushed the yamux branch to feature 3, expecting to gather relevant data soon.
    • Highlighted an urgency due to potential deprecation of Mplex, stressing the need for expedited resolution and implementation.
  • Electra EIPs Engagement:
    • Started to delve deeper into the Electra EIPs to better understand the upcoming changes and enhancements.
    • Expressed a desire to become more actively involved in specification discussions and PRs related to Electra.
    • Aims to be well-prepared and hands-on for upcoming interop events, knowing the importance of being closely familiar with the code.
  • Dependabot PRs:
    • Managed to merge several dependabot PRs after resolving a permissions issue related to the code coverage token, which is managed separately for dependabot in the repository.

April 9, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6653

Planning and Discussions

  • Discussion on the progress of v1.18, nearing completion, awaiting final updates.

BLST Integration Update

  • Matt/Tuyen updated on the BLST integration:
    • Workflow issues resolved with recent commits addressing bugs in the publishing flow.
    • A small PR is up for approval to finalize testing code integration.
    • Target to merge and move to beta testing phase soon.

API Updates

  • Finalized Property to API Responses:
    • Nico has implemented suggested fixes and is awaiting further review (from Tuyen or NC).
  • POST Methods for State Validators and Balances:
    • Discussion on implementing POST methods to address Teku incompatibility issues.
    • This includes a fallback mechanism for 404 errors which Teku handles incorrectly with the current setup.
    • Possible integration of these methods pending review of Beacon API PR 440 and consensus on implementation.

Release Planning

  • The team discussed pushing the current build to beta as an RC1 soon, even if some features (like POST methods) are added later.
  • Considerations for including the builder boost factor in the release:
    • The current setup will proceed with default boost factor settings unless explicitly configured otherwise.
    • A warning might be added for users setting the boost factor flag without using the max profit setting, as it will be overridden.

Heap Memory Snapshots Issue

  • Nico was thanked for obtaining the heap snapshot, which is critical for diagnosing memory issues.

  • Tuyen suggested the need for specific infrastructure to effectively manage heap dumps.

  • The team is using Hetzner AX-11 servers, which are currently unable to handle the heap dump requirements.

  • Discussion centered on the errors encountered during the heap dump process:

    • The main issue identified was related to memory requirements when writing the heap dump to the file system, leading to a system crash due to out-of-memory errors.
    • The process requires doubling the memory temporarily, which was observed in metrics.
  • There was confusion about the term "swap" memory, clarified as a method to extend memory capacity using disk space, which is not currently configured on the servers.

  • Proposed Solutions

    • Suggestions to adjust server configuration to include swap memory or to utilize streaming the heap dump to disk to avoid high memory consumption all at once.
    • The feasibility of using API functions that allow heap dumps to be written as streams rather than a single large block was discussed.
    • Matt will review the code to identify potential changes in the function calls used for generating heap dumps.
    • Plans to collaborate with Faith to explore necessary infrastructure adjustments to facilitate ongoing heap capture without system overloads.
  • The Ethereum Foundation (EF) has released a specification for Pectra Devnet-0, which includes several EIPs.

  • Gajinder and NC are focusing on implementing EIP-7002 and MaxEB respectively, as part of the preparations for an upcoming interop event.
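
For the heap dump streaming idea raised under Proposed Solutions above, a minimal sketch using Node.js's built-in `v8.getHeapSnapshot()`, which returns a readable stream, might look like the following. The function name `streamHeapSnapshot` and the file path are assumptions for illustration, and this is not necessarily how Lodestar generates its heap dumps. Streaming avoids holding the whole serialized snapshot in a single buffer, but per the Node.js documentation V8 can still need memory of roughly twice the heap size while building the snapshot, so this may reduce the out-of-memory risk rather than remove it.

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import * as v8 from "node:v8";

// Hypothetical helper: stream a V8 heap snapshot to disk chunk by chunk
// and resolve with the number of bytes written.
async function streamHeapSnapshot(filePath: string): Promise<number> {
  // getHeapSnapshot() returns a Readable that yields the JSON snapshot.
  const snapshot = v8.getHeapSnapshot();
  const out = createWriteStream(filePath);
  // pipeline() handles backpressure and rejects if either stream errors.
  await pipeline(snapshot, out);
  return out.bytesWritten;
}
```

A caller could run `await streamHeapSnapshot("/tmp/node.heapsnapshot")` (path is an example) and load the resulting file in Chrome DevTools.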

Pectra-Devnet-0

  • Gajinder's Progress:

    • EIP-7002 is mostly completed; remaining tasks include testing with an execution layer that supports EIP-7002.
    • Current work involves fixing types and resolving any failing tests.
    • The implementation is expected to be pushed for further progress within the week.
  • NC asked if the current submission of EIP-7002 is ready for review, to which Gajinder confirmed its readiness and highlighted the simplicity of the required changes, mainly processing exits.

  • Gajinder suggested that if NC handles the attestation updates, he could take on the PeerDAS implementation.

  • Inclusion Lists: Technical Challenges and Solutions

  • Reorganizations (Reorgs) and IL Construction: The necessity of constructing an IL during a blockchain reorganization (reorg) was highlighted if one is not already available. This is particularly significant when proposing new blocks after a reorg.

  • ILs and Blobs: Treating ILs similarly to blobs (binary large objects) could simplify their synchronization and maintenance, ensuring they are readily available as needed.

  • EIP-3074 Complications: Gajinder addressed complexities related to EIP-3074, which includes handling transactions that could drain wallets and those that are interdependent. These complexities affect the dynamics of transaction processing and the potential for balance-draining transactions that alter the conditions under which subsequent transactions are processed.

  • Synchronization of ILs: By always synchronizing ILs, the need to reconstruct them is eliminated, simplifying the handling of reorgs.

  • Validity Conditions Update: Validity checks for new blocks with respect to ILs have been adjusted to consider scenarios where balances might be drained by EIP-3074 compliant transactions. This update reflects changes in transaction eligibility based on account balances at different block heights.

  • POC Development: Gajinder has developed a POC that does not treat ILs like blobs, but he noted that this approach could be adapted with minimal adjustments. He also mentioned the potential for restarting development based on the settled design of ILs.

  • Focus Shift: If Electra development stabilizes without further requirements for ILs, Gajinder plans to concentrate on enhancing the IL implementation and possibly extending the POC.

MaxEB Discussion

  • NC and another team member participated in a call about MaxEB, deciding to continue with the current specification despite ongoing discussions about potential modifications.

  • Stakeholder Concerns: Lido has raised issues regarding the inability to control validator consolidations due to a misconception that all validators with the same withdrawal credentials are fungible.

  • Lido's need for more control over consolidations has led to a proposal for execution-initiated consolidations, which is seen as adding complexity to the system.

  • Complexity Debate: There's a disagreement on the complexity introduced by execution-initiated consolidations. Some believe it adds significant complexity, while others, including Gajinder, see it as manageable and similar to existing mechanisms.

  • MaxEB Code Updates: The MaxEB specification has been rapidly changing, requiring continual updates to the codebase to stay aligned with the latest consensus specifications.

  • Current implementation includes consolidation features, which may be removed depending on future spec changes.

EIP-6110 Considerations

  • Potential Integration: Discussion on integrating EIP-6110 to utilize a pending deposit queue in the beacon state, which could simplify handling of new validators and reduce the need for an unfinalized public key cache.

  • This approach would delay the processing of deposits but is seen as potentially reducing code complexity.

  • Attestation Data and Tracking:

    • The only other major pending update is related to attestation data.
    • A tracking system for changes, particularly around epoch calculations, is discussed to ensure the code remains up-to-date with spec modifications.

Increasing Default Peer Count to 100

  • The team discussed the final decision on peer counts, specifically whether to increase the default setting to 100 peers. This follows experiments that some team members conducted on their own nodes but that had not yet been tried on the CIP validators.
  • Performance Improvements: Individuals like Nico and another team member have manually increased their peer counts to 100, observing notable improvements in network performance. For instance, beaconcha.in showed an increase in effectiveness from around 96-97% to 99%.
  • Memory Concerns: Increasing peers to 100 has been associated with a slow memory leak, with memory usage incrementally rising by about 3-4GB over a month. However, the system has not crashed due to memory overflow, as the usage seems to cap (e.g., max observed at 10GB).

Considerations

  • Risk vs. Benefit: The main discussion point was whether the improvement in data availability and network performance justifies the potential memory leak risk.
    • It was noted that memory issues were also present at lower peer counts (e.g., 50 peers), and manual restarts or container reboots every 10-15 days have been effective mitigations.
  • Community Practices: Evidence suggests that many users are already manually setting their peer count to 100, indicating a common practice and preference within the community.
  • Compatibility and Testing:
    • The team has not yet tested the memory impact of the 100-peer setting with the new BLST in the network thread.
    • Tuyen noted that the increased seen_ttl setting already has an impact on heap memory.
  • Version Updates: Concerns were raised about users who do not frequently update their software, potentially facing issues if they run older versions with higher default peer counts.

Decision

  • Given the benefits and common user practices, the consensus was to increase the default peer count to 100 in the upcoming v1.18 release.
  • The team agreed to continue investigating and addressing the underlying memory issues in parallel with this change.

April 2, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6615

Planning and Discussions

Agenda and Release Planning

  • The team is preparing for the v1.18 release and is assessing which features can be included as progress has been slower than expected.
  • A release candidate is targeted for the end of the week, with hopes to push the update next week, marking three weeks since the last release.

Peer Count Increase

  • There was a discussion about increasing the default peer count despite a known memory leak. The possibility of making this change optional via a flag was discussed, allowing users to increase the peer count at their discretion.
  • Concerns about memory impact persist, and further investigation into heap snapshots is needed to better understand the issue.
  • Agreed to test this on a smaller subset of mainnet nodes, such as our CIP fleet, to see performance impacts.

Feature Flags and Testing

  • The team considered allowing users to control settings such as peer count increases through the UI of platforms like DappNode.
  • We will defer our default peers increase to 100 until we fix the memory leak and have better metrics on the impact.
  • The feasibility of implementing feature flags for testing and gradual deployment of new features was discussed.

Memory and Performance Issues

  • Tuyen expressed concerns regarding memory usage and has requested heap snapshots from DevOps to diagnose issues.
  • Discussions covered the logistics of obtaining these snapshots, including access and permissions, to ensure they can be analyzed effectively.

Metrics and Testing

  • Ongoing testing and review of new features' impact on system performance and stability are crucial, especially concerning memory usage and peer connections.
  • The team is cautious about implementing changes that could lead to significant issues, particularly in areas like memory management, and seeks to gather more data before proceeding with certain updates.

Gossipsub Batch Publish Review

  • The Gossipsub Batch Publish feature needs a final review to be included in the upcoming v1.18 release. It is nearly ready and just requires final checks.

Remote Signer and Token File Configuration

  • Nico's Update: The Remote Signer feature is almost ready, pending a final review.
  • Token File as an Alias: There was a finalized discussion about using a token file as an alias to simplify key manager configuration for DevOps, aligning with specifications.

Invalid Signatures and Block Production

  • Tuyen's PR: A pull request is ready that addresses an issue with invalid signatures by ensuring correct head data when producing blocks. This PR is crucial for maintaining block production integrity and is straightforward.

Proposer Boost Reorganization

  • Clarification that the proposer boost reorganization is not blocked by other PRs and is not a top priority right now.
  • It is considered ready for merging as it does not interfere with current operations or fork choice performance and acts as standalone code.

Additional PRs and Cleanup

  • Several PRs are open, including one for n-historical state issues related to end-to-end testing, which is still in draft but aimed to be finalized soon.
  • A quick review of open PRs, including dependabot updates, is needed to clean up the repository before the next release.

Release Planning

  • The team plans to push a release candidate for v1.18 by the end of the week, aiming to include all reviewed and ready features.
  • A cautious approach is being taken regarding increasing default peer counts due to unresolved memory leak concerns, opting instead to allow users to adjust this setting manually.

Builder Boost Factor Discussion

  • There was general consensus from a Twitter thread that setting healthy defaults for the builder boost factor is favorable. This allows clients some control over the settings, even if these are not entirely neutral.
  • Some participants felt that setting a 90% boost factor is ineffectual and does not substantively alter the current dynamics, suggesting it might be redundant to implement.
  • Lighthouse, another client, allows users to set their builder boost factor, which could be a preferable approach as it empowers users to determine their settings rather than having a preset default.
  • It was suggested to rename the boost factor setting to something like "default" or use another alias that indicates setting the boost factor at 90% as the default. This renaming would clarify the purpose and expectation of the setting.
  • The discussion highlighted the technical challenges of using a relative percentage value for the boost factor. It was noted that such a setting impacts high-value and low-value blocks equally, which might not be optimal. The suggestion was to consider using a minimum bid (min-bid) setting or a maximum delta flag as additional parameters to refine the decision-making process for block selection.
  • The boost factor is seen as a tool to favor local blocks when there is no significant maximal extractable value (MEV) to be gained from builder blocks. This approach supports local block production when it aligns closely in value with builder blocks, promoting fairness and reducing potential censorship.
  • A detailed analysis of block values during high and low network traffic times was suggested to better inform the setting of the boost factor. This would help establish a more nuanced approach that could dynamically adjust based on network conditions.
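
To make the relative-percentage point above concrete, here is a minimal sketch of the comparison as described in the Beacon API documentation for block production, where the local payload is chosen when its value is at least `builder_boost_factor * (builder_payload_value // 100)`. The helper name `prefersLocalBlock` is hypothetical and this is not Lodestar's actual code. Values are in gwei as plain numbers for simplicity; real code would more likely use `bigint` for wei amounts.

```typescript
// Hypothetical helper for illustration only.
// Returns true when the locally built block should be preferred.
function prefersLocalBlock(
  localValueGwei: number,
  builderValueGwei: number,
  builderBoostFactor: number
): boolean {
  // Integer division by 100 first, then scale by the percentage factor.
  const boostedBuilderValue = builderBoostFactor * Math.floor(builderValueGwei / 100);
  return localValueGwei >= boostedBuilderValue;
}

// With a factor of 90, the builder bid is discounted by 10% whatever its size:
// a 1 ETH bid loses to a 0.9 ETH local block (0.1 ETH forgone),
// and a 0.01 ETH bid loses to a 0.009 ETH local block (0.001 ETH forgone).
const highValue = prefersLocalBlock(900000000, 1000000000, 90);
const lowValue = prefersLocalBlock(9000000, 10000000, 90);
console.log(highValue, lowValue);
```

Because the discount is proportional, the absolute amount a proposer gives up grows with the bid, which is why the notes suggest an absolute parameter such as a min-bid or maximum delta alongside the percentage.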

Docs Versioning Update (PR 6559)

  • Current Status: The migration PR for the documentation versioning has been merged.

  • Next Steps: The Docusaurus implementation will be published in the next release, enabling version-specific documentation features.

  • Technical Detail: The PR includes an empty array setup in a JSON file, which is a preliminary step. This setup allows Docusaurus to start displaying a versions feature on the documentation website. Future updates can populate this array to specify which versions are supported.

  • PR 6528: This PR involves renaming cache and related functions to better indicate their purposes post-finalization. It is part of a series of follow-ups planned after the initial implementation of EIP-6110.

  • Branch Concern: The PR is aimed to merge into the Electra fork branch, not the unstable branch, which may require adjustments or specific reviews.

Electra Fork Branch Maintenance

  • The branch is currently named electra-fork. There was a suggestion to rename it to feature/electra-fork for better visibility, but it was noted that this branch is specifically a fork branch, not just a feature branch.
  • The Electra fork branch is being maintained separately from the main unstable branch due to the ongoing changes and uncertainties in the specifications and EIPs related to the Electra upgrade.
  • Longevity of the Branch: There is concern about the branch being long-running, which could complicate maintaining parity with the unstable branch due to the need for frequent updates and rebases.
  • The integration of the Electra fork into the main unstable branch is delayed until the specifications are more stable and finalized. This approach avoids the complexities and potential errors that could arise from premature integration.
  • Lead Maintainer: Gajinder is noted as the primary maintainer of the Electra fork branch, ensuring it stays updated.
  • Merging Strategy: The current plan is to avoid squash merging the Electra fork into unstable. Instead, a rebase strategy will be employed to maintain a clear commit history. This method helps in keeping individual contributions visible and simplifies the management of changes.
  • Conflict Management: Regular rebasing is performed to minimize conflicts. When conflicts do occur, they are managed on a case-by-case basis to maintain a clean branch that can be merged into unstable when appropriate.
  • Clean History: Rebasing is preferred because it keeps the git log clean and straightforward, which is beneficial for reviewing historical changes and conducting diffs.
  • Ease of Cherry-Picking: A well-maintained rebase flow makes it easier to cherry-pick changes as needed without the clutter of unrelated modifications.
  • Visibility of Contributions: By not squash merging, all developers' commits remain visible in the branch’s history, acknowledging their contributions.
  • Ease of Integration: A clean and regularly updated branch through rebasing allows for smoother eventual integration into the main unstable branch.
  • Previous Merges: The approach taken with the Electra fork is similar to past practices, such as the integration of significant features well before mainnet forks to ensure stability and thorough testing.
  • The maintenance of the Electra fork branch is a strategic choice to cope with the fluid nature of upcoming network upgrades. By keeping this branch separate and employing a careful rebase strategy, the team ensures that the main unstable branch remains stable and that the Electra upgrades can be integrated smoothly once specifications are finalized.

Cross-Client Compatibility Issues

  • Cross-Client Testing: The Ethereum Foundation (EF) DevOps team is emphasizing cross-client testing to identify compatibility issues among different beacon nodes and validator clients. This testing aims to uncover discrepancies in protocol implementations or misinterpretations of specifications that could hinder interoperability with clients like Lighthouse or Teku.
  • Vouch Compatibility: There are intermittent issues with Vouch, particularly with aggregates. It's suspected that Vouch's issues stem from not consistently interacting with the same beacon node for attestation and aggregate requests, leading to cache misses and errors.
  • Priority of Compliance: Identifying and resolving deviations from the protocol specifications is considered a high priority. An example given was a misinterpretation of a query parameter default value, which was corrected to align with the spec.

SSZ Beacon API Support

  • Current State: There are ongoing efforts to address issues with the SSZ beacon API, particularly around supporting SSZ for V2 blocks. The complexity of these issues has led to a suggestion to temporarily remove SSZ support from certain APIs to expedite other updates.
  • Implementation Strategy: It's suggested that SSZ support could be simplified by limiting it to essential APIs, reducing the immediate workload and focusing on stabilizing the core functionalities.

Obol and Diva Staking Compatibility

  • Testing Infrastructure: Plans are being made to integrate consistent testing for Obol compatibility and to include Diva Staking in internal tests. This is to ensure Lodestar maintains compatibility with these implementations and to quickly identify any integration issues.
  • Diva Staking Setup: There are challenges in integrating Diva Staking into continuous integration (CI) systems due to its closed-source nature and the lack of a simple setup for development environments. It's mentioned that Diva might provide a simpler setup in future releases which could be integrated into CI.

Updates

Tuyen:

  • Gossipsub Improvement: Implemented a PR to improve publish delays by ensuring messages are published to at least a certain number of mesh peers, inspired by Lighthouse.
  • n-Historical State: Addressing a bug in the n-historical state feature with a PR still in draft, planning to finalize the entry and conduct tests.
  • Shuffling Optimization: Collaborating with Matthew to explore offloading the shuffling process to a worker thread, a task more complex than initially thought, requiring further brainstorming.
  • Project Planning: The shuffling optimization work has been moved to a future version target to allow for thorough development and testing.

Nazar:

  • EIP-4844 Testing: Opened a PR to integrate EIP-4844 tests into the SIM environment, moving away from separate EIP interop tests.
  • Web3.js Support: Added support for a new transaction type in web3.js that accommodates blob transactions, enhancing testing capabilities in SIM environments.
  • Builder PR Assistance: Requested help for a long-standing PR related to the builder, planning to consult with Gajinder to resolve issues regarding the Flashbots builder's performance.
  • Light Client and Prover Packages: Plans to open an issue to manage multi-version packages for the light client and the prover in both package managers and browsers, aiming to start this task within the week.

NC:

  • MaxEB Completion: Focused on completing the final 10% of the MaxEB implementation.
  • SSZ Repository Update: Plans to add a sliceFrom function to the list composite tree view in the SSZ repository, noting that the implementation will differ significantly from the existing sliceTo function and will require dedicated time to develop.

Julien:

  • Docusaurus migration is merged, waiting for next release to be deployed; experimented with light-client usage in Docusaurus, some complexity left
  • Added missing headers for publish block requests
  • Some DX improvements
  • Started looking at unexpected high latency for some requests

March 19, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6553

Planning and Discussions

BLST Updates

  • BLST Rebuild Branch: Close to completion, awaiting another round of review. Despite the large diff, much of it has been reviewed by Gajinder. The integration of this branch is seen as a catalyst for the next release.

n-Historical States and Target Peers

  • n-Historical States: There's interest in merging the n-historical states work behind a feature flag for v1.18.
  • Target Peers Increase: The default target peers are likely to be increased to 100 based on preliminary reviews showing no significant increase in heap size, just more garbage collection in the network thread.

Proposer Boost and Testing

  • Proposer Boost: Deployed on a feat2 group for testing, with coding and PR comments addressed. Further testing is planned to ensure block production with the n-2 parent root, and Tuyen may write an end-to-end test for late block reorg.

Optional pushes for v1.18+

  • PR 6033 Historical State Regen: Cayman plans to address comments and prepare the PR for a more mergeable state before his leave. The inclusion of this feature in v1.18 or v1.19 is up for discussion, seen as a nice-to-have rather than critical.
  • Potentially look at including Matt's async shuffling refactor.

Issues with SSZ Releases

  • With Cayman being away, Phil now has access to manually publish SSZ releases in case of CI failure.

Updates

Cayman:

Merkle Tree NAPI-RS Experiment

  • The experiment aimed to migrate all persistent Merkle tree code into Rust, utilizing the napi-rs ecosystem. This involved creating a Rust implementation of the persistent Merkle tree and wrapping it with a small NAPI layer for JavaScript interaction.
  • The work was done in a branch named cayman/napi-merkle-node within the SSZ repository: https://github.com/ChainSafe/ssz/tree/cayman/napi-merkle-node
  • Unfortunately, the experiment did not yield the expected improvements. The Rust implementation resulted in significant slowdowns, with the beacon node struggling to sync and keep up with the chain due to the reduced performance.
  • A suspected cause of the slowdown is the overhead of allocating temporary pointers to nodes, which negates the memory savings from pre-allocating nodes. This creates a dilemma where the goal of saving memory leads to decreased performance, without an apparent solution to balance both aspects efficiently.
  • Cayman plans to document the experiment's details and findings in an issue for future reference. This will allow the team or interested individuals to revisit the experiment and potentially explore alternative approaches.
  • The experiment is currently on hold, with the branch available for review. The team may consider revisiting this approach in the future if new insights or strategies emerge to address the identified challenges.

Tuyen:

n-Historical State Feature Flags:

  • Tuyen has been working on one of the final tasks for the n-historical state, which involves creating feature flags to utilize new state caches. This work is progressing well.

SSZ Serialization Improvements:

  • A PR that changes the type of balances for faster serialization has been merged.
  • Another PR aims to change the type of validators to speed up serialization further. Incorporating these changes has shown promising results:
  • Serialization of the Holesky state takes around 350ms, and less than 300ms for the Mainnet state, which is considered efficient enough to perform per epoch.
  • This serialization typically occurs during the last third of the first slot of each epoch, ensuring it does not interfere with critical processing times.
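The timing described above can be sketched as a small check. This is only an illustration, not Lodestar's code: the function name is hypothetical, and the constants assume the mainnet preset (12-second slots, 32 slots per epoch).

```typescript
// Mainnet preset values (assumption: other networks may use different values).
const SECONDS_PER_SLOT = 12;
const SLOTS_PER_EPOCH = 32;

/**
 * Hypothetical helper: returns true when the given time (seconds since
 * genesis) falls in the last third of the first slot of an epoch.
 */
function inSerializationWindow(secondsSinceGenesis: number): boolean {
  const slot = Math.floor(secondsSinceGenesis / SECONDS_PER_SLOT);
  const isFirstSlotOfEpoch = slot % SLOTS_PER_EPOCH === 0;
  const secondsIntoSlot = secondsSinceGenesis % SECONDS_PER_SLOT;
  // The last third of a 12-second slot starts at second 8.
  return isFirstSlotOfEpoch && secondsIntoSlot >= (2 * SECONDS_PER_SLOT) / 3;
}

console.log(inSerializationWindow(9)); // first slot of epoch 0, 9s into the slot
console.log(inSerializationWindow(21)); // second slot of epoch 0
```

With these constants the window is 4 seconds long, so a serialization of roughly 300 to 350ms fits comfortably inside it.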

Memory Usage Improvement:

  • The n-historical state implementation appears to reduce the heap memory usage by approximately 1GB, indicating a significant optimization.

Proposer Boost PR Review:

  • Tuyen has also reviewed the proposer boost PR code, contributing to its refinement.

Gossipsub Metric Fix:

  • A PR has been submitted to address a broken metric in gossipsub, aiming to improve the accuracy and reliability of network metrics.

Gajinder:

  • Current Focus: Gajinder is deeply involved in the inclusion lists (IL) proof of concept (PoC) work, aiming for initial integration this week. The work is based on a primitive but workable spec.

  • Spec Status: Most of the design for IL has been finalized, with ongoing iterations on specific details, such as whether certain elements within the execution header need to be signed. The consensus is leaning towards requiring signatures to validate the execution payload's summary at the beacon layer.

  • Design Changes:

    • The design has shifted from a bundled approach for transmitting the inclusion list P2P to an unbundled approach, as advocated by Potuz.
    • Recent modifications include removing the need for the execution layer to keep parent spenders for validating the parent inclusion list summary. This change simplifies the execution layer's requirements.
  • Inclusion List Sync Mechanism:

    • The mechanism for syncing inclusion lists differs from that of blobs. Blobs are essential for block import, whereas inclusion lists are not required unless operating at the head of the chain.
    • During sync, inclusion lists are not needed for beacon blocks by range, as the child block should satisfy the parent's inclusion list. However, inclusion list availability is crucial on gossip when importing a block, as blocks cannot be attested to or built upon without available inclusion lists.
  • Fork Choice Extension:

    • Gajinder is extending the fork choice with an additional flag to indicate the inclusion list status, which will help manage the new scenarios introduced by IL.
    • The fork choice will confirm the validity of a block and its ancestors' inclusion lists. However, the availability of an inclusion list for a block validated by the execution layer must be independently verified.
  • Handling Invalid Child Blocks:

    • A scenario where a chain has an invalid child block could occur if the execution layer initially indicates syncing status before later deeming the block invalid.

NC:

  • Plans to draft the engine API for the Electra PoC. A PR is expected soon.

  • Reviewing Technical Writing: NC is reviewing detailed educational material by Emmanuel, a technical writer active in the ETH R&D community. Emmanuel's work includes in-depth coverage of EIPs, including Max EB and inclusion lists.

  • MaxEB Spec Review: NC reviewed Lion's MaxEB spec, noting the absence of slashing penalty calculations, suggesting it's not finalized. Despite this, the overall spec appears solid, and NC has started coding a PoC for MaxEB.

  • Focus for Coming Week: NC plans to focus on testing the proposer boost with the aim of merging it soon.

  • Slashing Penalty Concerns: The discussion highlighted concerns from large node operators regarding the slashing penalty under MaxEB. Suggestions include adjusting the penalty calculation to be less severe for operators consolidating validators, possibly using a logarithmic scale related to the maximum effective balance or a fixed scale based on the number of validators.

  • MaxEB Discussion: The MaxEB topic seems less contentious than inclusion lists (IL), but there's uncertainty about proposed changes to slashing penalty calculations. NC is not aware of any new proposals addressing these concerns.

Nazar:

  • Integration of Production Ready Builder: Nazar has been working on integrating a production-ready builder into Lodestar's simulation tests to ensure future stability. This process revealed a bug in the attestation API related to incorrect slot numbers for future epochs, which has since been fixed.

  • Flashbots Builder and MEV Boost Layer: Nazar discovered that the Flashbots builder is designed to interact with consensus clients indirectly through an MEV Boost layer, rather than direct calls from the consensus client to the builder. This requires running the MEV Boost alongside the Flashbots builder in the simulation test environment to facilitate proper communication and payload delivery.

  • Experimentation and Communication: Nazar is experimenting with this setup and has sought clarification from the Flashbots team on Discord. The goal is to fully integrate this flow into Lodestar's simulation tests, enabling comprehensive testing of builder functionality, including scenarios where engine API is down or block proposals are builder-exclusive.

  • Pending PR Review: Nazar highlighted a pending PR (#6507) that has been awaiting review for two weeks. This PR involves moving the withdrawal test to the simulation test structure.

Julien:

  • Docusaurus Migration: Julien has successfully merged the Docusaurus migration into the Lodestar repository. The documentation is now on par with the previous setup, with a few minor adjustments remaining. The new documentation setup will be officially released with the next Lodestar release.

  • Documentation Layout Reconsideration: With the migration complete, Julien suggests it might be a good time to reevaluate the documentation layout, potentially introducing higher-level categories and exploring Docusaurus's capabilities, such as embedding JavaScript directly into the documentation.

  • Light Client Demo Enhancements: Julien fixed several bugs in the light client demo and explored integration with Farcaster. He suggests that the light client demo could be a good match for Farcaster frames, showcasing the light client's capabilities in a Warpcast environment.

  • P2P Implementation for Light Client: Julien has begun discussions with Cayman about implementing P2P for the light client, aiming to deepen his understanding of P2P and libp2p technologies before Cayman's temporary departure.

  • Pending PR Review Request: Julien also mentioned a pending PR (#6507) that has been awaiting review for two weeks, moving the withdrawal test to the sim test structure.

Matt:

  • BLST-TS PR Completion: Matt has finished addressing all comments from Hubert on the BLST-TS PR (#124). He's ready to merge and was waiting for Hubert to double-check the changes for any final updates.

  • Migration from Rebuild Folder: Matt has been working on migrating from the rebuild folder and removing all SWIG code. The PR for this work is #125. This involved refactoring the build/install system and updating the workflow action. He's aiming to finalize this work, ensuring there are no issues from the refactor. The PR is significant, with 11k lines added and 15k lines removed, but most of the code changes were previously reviewed by @g11tech during the PR process for merging individual pieces to the rebuild branch. The changes in PR #125 are mainly related to repo configuration and the publishing process.

  • Next Steps: Matt plans to activate the tests to check for any typos from the refactor and complete the PR with a full list of changes made during the process.


March 12, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6510

Planning and Discussions

v1.17 Release and Deneb Upgrades

  • v1.17 Released: Thanks to everyone's efforts, v1.17 is out, ensuring readiness for the final Deneb upgrades.
  • Node Upgrades: The infrastructure team has upgraded all nodes. Team members are encouraged to rebase their branches for compatibility with Mainnet feature groups.

Ethereum Protocol Fellows Cohort

  • Upcoming Cohort: Discussion on preparing for the next cohort of Ethereum protocol fellows, with a focus on identifying potential projects for fellows interested in client development.
  • Mentorship Experience: Sharing experiences from previous mentorships, highlighting the low time commitment and the benefits of mentoring, including potential recruitment opportunities.
  • Project Ideas: Considering projects like a beacon chain harness for testing or integrating Lodestar with the portal network as potential tasks for fellows. Please highlight any issues that may be useful for EPF fellows to pursue.
  • Potential Fellowship Task: Discussion on integrating Lodestar with the portal network as a suitable project for a protocol fellow.
  • Use Case for Archiving Goerli Data: Exploring the idea of using Lodestar and the portal network to archive and distribute historical states of deprecated networks like Goerli.
  • Clarification on Integration: Questions about the specifics of integrating with the portal network, including the mechanism of data provision and the potential benefits of such integration.

Action Items and Considerations

  • Mentorship Participation: Encouragement for team members to consider becoming mentors for the protocol fellowship program.
  • Project Identification: Need to identify and outline specific projects that would be suitable for fellows, ensuring they align with Lodestar's goals and can provide meaningful contributions.
  • Understanding Portal Network Integration: Further discussion required to clarify the technical aspects and potential impact of integrating Lodestar with the portal network.

Beacon Chain Harness for Testing

  • The main topic revolves around the development of a beacon chain harness for testing. NC seeks input on the list of features or requirements for the beacon chain harness, including suggestions for additions or removals.
  • Purpose of the Harness: The harness aims to generate test fixtures for use in testing, rather than dynamically generating fixtures. The consensus leans towards the utility of fixed scenarios over random elements for testing purposes, with Sim tests and other extensive tests covering the randomness aspect.
  • Proposed Scenarios: The discussion suggests having a few specific chain scenarios for testing, such as:
    • A simple linear chain that finalizes.
    • A forky chain that doesn't finalize.
    • Possibly one or two more scenarios with distinct characteristics.
  • Value and Resource Allocation: The team acknowledges the value of embarking on this project. The next steps involve figuring out resource allocation to develop the harness. Discussion to follow on issue: https://github.com/ChainSafe/lodestar/issues/6518

Proposal to Increase Memory Limit

  • There was a proposal to bump the max old space to 16GB, considering that most staking individuals likely have at least 16GB of RAM. However, concerns were raised about overallocation and its impact on garbage collection cycles and overall system performance.

  • Concerns and Suggestions:
    • Optimal Allocation: A suggestion was made to set the limit to 12GB instead of 16GB to avoid overallocation and potential performance issues.
    • Server Impact: Concerns were raised about setting the limit too high, potentially affecting servers with only 16GB of RAM, leading to crashes or performance degradation.
    • Dynamic Allocation: The idea of dynamically setting the memory limit based on the server's total memory was discussed, with a minimum of 12GB suggested.
    • Default Settings Alignment: It was noted that the default memory limit should align with default settings, and adjustments to settings should be accompanied by corresponding memory limit adjustments.
    • Increasing Default Peers: The discussion also touched on increasing the default peer count to 100, aligning with other clients, and potentially improving network effectiveness.
  • Action Items:
    • Testing with More Peers: The team agreed to test running nodes with more peers (around 100) and observe the impact on memory usage and network effectiveness.
    • Observation and Analysis: Before making any changes to the memory limit, the team decided to analyze memory usage with increased peers to identify any potential bugs or unusual memory consumption patterns.
    • Potential PR: A PR to increase both the memory limit and peer count was considered, with a focus on first understanding the implications of increasing peer count on memory usage.
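The dynamic allocation idea could look roughly like the sketch below. It is not what Lodestar does: the function name and the 50% fraction are hypothetical choices for illustration, and only the 12GB floor comes from the discussion. The result is in megabytes, the unit used by Node's --max-old-space-size flag.

```typescript
// 12GB floor, as suggested in the discussion.
const MIN_LIMIT_MB = 12 * 1024;

/**
 * Hypothetical helper: derive a heap limit (in MB) from total system memory.
 * `fraction` is an assumed tuning knob, not a value agreed on by the team.
 */
function chooseMaxOldSpaceMb(totalMemMb: number, fraction = 0.5): number {
  const proportional = Math.floor(totalMemMb * fraction);
  // Never go below the 12GB floor, even on smaller machines.
  return Math.max(MIN_LIMIT_MB, proportional);
}

for (const totalGb of [16, 32, 64]) {
  const limitMb = chooseMaxOldSpaceMb(totalGb * 1024);
  console.log(`${totalGb}GB RAM -> --max-old-space-size=${limitMb}`);
}
```

In a real launcher the total could be read with os.totalmem(), which returns bytes. Note that a fixed floor still leaves little headroom on a 16GB server, which is the concern raised above.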

SSZ Version Release

  • There's an upcoming optimization based on a new SSZ version, including a PR for caching the root for the list type.

Block and Blobs Pulling Technique

  • A new pulling technique for blocks and blobs is being developed, targeting edge cases where the block isn't seen within a specific time window. This technique starts aggressively pulling blobs through request-response when the block is seen, and not all blobs are there. Another scenario covered is when a blob is seen but not the block, indicating the block's presence. A PR for this is almost ready and aims for another RC, though not targeting the mainnet immediately.
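The two edge cases described above can be sketched as a small decision function. This is an illustrative model only; the types and names are hypothetical and do not reflect the PR's actual implementation.

```typescript
// Hypothetical types for illustration; not Lodestar's actual data structures.
type SeenBlock = {blobCount: number};
type PullAction =
  | {kind: "none"}
  | {kind: "pullBlobs"; indices: number[]}
  | {kind: "pullBlock"};

/** Decide what to request over req/resp given what gossip has delivered so far. */
function nextPullAction(block: SeenBlock | null, seenBlobIndices: Set<number>): PullAction {
  if (block !== null) {
    // Block seen: find which of its blobs have not arrived yet.
    const missing: number[] = [];
    for (let i = 0; i < block.blobCount; i++) {
      if (!seenBlobIndices.has(i)) missing.push(i);
    }
    return missing.length > 0 ? {kind: "pullBlobs", indices: missing} : {kind: "none"};
  }
  // Blob seen without its block: the block must exist, so pull it.
  if (seenBlobIndices.size > 0) return {kind: "pullBlock"};
  return {kind: "none"};
}

console.log(nextPullAction({blobCount: 3}, new Set([0, 2])));
console.log(nextPullAction(null, new Set([1])));
```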

v1.18 Release Planning

  • The upcoming v1.18 release will include significant improvements, such as the optimization mentioned above and the proposer boost feature, which is being held off until v1.18 for testing.
  • Another important inclusion will be the addition of point randomness to sameMessage verification, aiming to enhance security.
  • The BLST merge is close to completion and is expected to be part of v1.18, marking it as a substantial release with numerous improvements and new features.

Updates

Matt:

Swig Version of Multiplication

  • Implemented the swig version for multiplication of randomness. The metrics showed good peering, handling twice the amount of traffic due to better peering compared to unstable. However, the attestation queue was significantly longer than expected, raising concerns about some underlying issues needing tuning. It was decided not to rush this into v1.17 and instead aim for inclusion in the next release alongside the BLST PR.

Shuffling and Spec Tests

  • Completed the shuffling work, passing all unit and sim tests, and running on a node. However, spec tests are failing, indicating a potential issue with the test setup rather than the code itself. This led to a pause in this work to focus on the multiplication feature.

BLST TS Refactor

  • Received extensive and valuable feedback from Hubert, with 95 comments on the BLST TS refactor PR. This feedback is currently being addressed, with a high level of confidence in the quality of the refactor due to Hubert's expertise in C and Rust.
  • The primary focus is on finalizing the BLST integration, addressing the remaining feedback, and preparing to merge the rebuild branch into main. This work is expected to extend beyond the current week, with the aim of completing it by the next standup.

Julien:

Docusaurus Migration

  • Worked on migrating to Docusaurus, aiming for a quick merge. The migration is at a stage where it can be merged, with the last comments related to links addressed. This should facilitate easier documentation management and improvements.

Light Client in Alternate Runtimes

  • Made progress with running the light client in alternate runtimes, specifically noting improvements with the latest Lodestar release allowing compatibility with Vite. However, some polyfills related to buffer are still needed. Julien shared ideas for further improvements, aiming for a seamless integration of the light client library without the need for users to worry about polyfills or configurations.

Light Client Demo Enhancements

  • Continued work on enhancing the light client demo, including fixing bugs and conceptualizing improvements. Working on mockups to share with Nazar for collaborative planning on the demo's future direction. Suggested leveraging the demo for showcasing capabilities with Farcaster or Warpcast, noting the potential to engage a vibrant developer community interested in L2 solutions and decentralized applications.

Cayman:

Merkle Tree Experiment with Rust

  • Cayman has been working on an experiment to transition the Merkle tree implementation into Rust. The goal is to see if memory usage can be significantly reduced by utilizing Rust's more efficient memory allocation compared to JavaScript. This could potentially lead to performance improvements.
  • The experiment involves creating Rust objects for branch and leaf nodes of the Merkle tree and wrapping them with JavaScript for necessary interactions. This approach aims to minimize the memory overhead associated with JavaScript objects.
  • The Rust-based implementation has been integrated into the Persistent Merkle Tree library. Cayman is currently addressing bugs related to tree navigation in Rust to avoid unnecessary JavaScript object creation.
  • If successful, this experiment could pave the way for further performance enhancements, including optimized hashing and the possibility of parallelizing hashing operations.

Tuyen:

  • State Cache Clone Method: Introduced a clone method for items retrieved from the state cache to address an issue that led to increased block processing times. This was fixed by implementing a transfer cache option, and the fix was included in the recent release.
  • SSZ Serialization Improvement: Focused on improving SSZ serialization, particularly for the n-historical state work which requires serialization per epoch. Improvements include leveraging cache nodes within the ViewDU for validator cache after epoch transitions and caching balances to enhance block processing and serialization speeds.
  • SSZ Release Incorporation: Plans to incorporate the new SSZ release and use the new type for BeaconState balances to further improve serialization efficiency.
  • New Validators Type: Working on a local branch to create a new validators type within BeaconState to facilitate faster serialization.
  • Epoch Transition Tracking: Submitted a PR to track epoch transitions by reason, which is ready for review.

Nico:

Rewards Calculations Review

  • Nico spent time reviewing NC's branch on attestation rewards to deepen his understanding of rewards calculations in the spec and Lodestar's implementation. He suggests that the branch could benefit from additional reviews before merging.

Token Path Testing Improvement

  • Following a suggestion from the Ethereum Foundation, Nico prepared a PR to add a flag for configuring the token path to facilitate easier testing between clients. The discussion is ongoing, and the change may be included in the v1.18 release.

SSZ Refactor Progress

  • Nico has been working on the SSZ refactor, updating tests, and ensuring they pass. The HTTP client has been made more complex but also more powerful, allowing for per-request settings. This opens up possibilities for future features like fetching blocks from multiple beacon nodes and selecting the most profitable block, which could be attractive to large node operators.

Finality Issue on Goerli and Attestation Processing

  • Investigated a finality issue on Goerli related to a bug in Prysm and checked Lodestar's attestation processing. Lodestar does not seem to have the same bug as Prysm, but Nico identified a minor issue where not all attestations were included in post and app blocks, which is resolved in v1.17.
  • Nico reported that his Goerli node was initially up but started struggling after a restart, facing issues with finding peers and syncing due to the lack of finalized state.
  • Gajinder suggested using a checkpoint sync from a known peer as a workaround for syncing issues during periods of non-finality. However, he noted a problem with the current block fetching algorithm, which could stall sync if too far behind due to peer disconnections.
  • The discussion highlighted difficulties in finding operational nodes to sync from, with most nodes, including ChainSafe's, being taken down. Prysm and Lighthouse might still have nodes running, but the network's state is fragmented with very low participation rates.
  • There was mention of possibly using a historical state for syncing, but the focus shifted towards learning from the situation to improve future responses to similar scenarios. The conversation touched on the importance of having strategies for syncing in testnets and handling long periods without finality.

NC:

  • Beacon Chain Harness: NC mentioned working on the beacon chain harness, with more updates to come.
  • EIP-6110: NC is involved in work related to EIP-6110, contributing to the development and discussions.
  • Support for Naito's PR: Assisted Naito with a PR regarding expected withdrawals, showcasing collaborative efforts within the team.
  • New Specs Review: Engaged in reviewing new specifications released last week concerning Inclusion List (IL) and Max Effective Balance (Max EB). Plans to delve deeper into these topics and potentially start working on a Max EB Proof of Concept in the upcoming week.
  • Engine API for Electra Drafting: Collaborating with Mikhail to draft the engine API for Electra, initially focusing on incorporating EIP-6110 and EIP-7002 into the drafting phase.

Nazar:

  • Continuing work on adding Flashbots builder support to our sim tests. There are some compatibility issues between client versions, which is making this a bit troublesome.
  • The latest version of Geth now only supports post-merge networks, so a way is needed to use different Geth versions for different sim tests, as we are testing merge scenarios as well.

March 5, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6493

Planning and Discussions

Optimizations Update:

  • Main optimization for inclusion is related to pulling blobs, with a PR ready and under review by Tuyen.
  • Mentioned a PR in SSZ for adding cache, pending review, which will optimize transaction-related routes in the beacon.
  • Running tests on Holesky showed considerable improvement in blob handling.
  • Proposed further optimizations for handling blobs and blocks when critical time has passed in a slot.

Release Planning:

  • Discussed the possibility of doing an RC release today, considering the significant optimization front.
  • Considered targeting Monday for the release to include additional optimizations if possible.
  • Agreed on proceeding with an RC release today, with the possibility of another RC by Friday if further optimizations are completed.
  • Discussed the importance of making clear that the upcoming release is recommended but not mandatory, with Tim Beiko indicating flexibility in updating the blog post with the latest release information.
  • Reviewed the commits since v1.16 and discussed whether to cherry-pick specific optimizations or push new changes from unstable.
  • Preference expressed for bumping to v1.17 to avoid patch releases unless absolutely necessary, noting that most changes are small and not feature-heavy.

Open PRs Discussion

  • Historical State Regen (PR 6033): Waiting on Cayman to address comments from Matt's review. No external blockers.
  • Proposer Boost Reorg: Needs review; recently updated to resolve conflicts and is now ready for review.
  • Late Block Handling: Discussion on the readiness for review, with recent conflict resolution.
  • Proposer Boost Merge Timing Concerns: Suggestion to defer merging Proposer Boost due to sensitive changes, like fork choice modifications, until after v1.17 release to maintain stability.
  • Julien's eth_getBlockByNumber Fix: Awaiting review completion by Nazar, who has started looking into it.
  • Dependabot Updates: Plan to handle asynchronously; may close stale external contributor PRs if no response is received.
  • Use Uint32Array for Shuffling Committees: Requires review.

Support for Docusaurus Migration

  • General Consensus: Strong support for migrating to Docusaurus due to its flexibility, React-based framework, and compatibility with current web technologies.
  • Advantages:
    • Integration capabilities, allowing for direct inclusion of components like the prover as example pages.
    • Enhanced flexibility and developer familiarity due to its React basis.
  • Web3JS Team Experience: Positive feedback on Docusaurus for static documentation, with notes on challenges related to dynamically generated documentation from source code.
  • Focus on Documentation: Acknowledgment that Docusaurus is primarily for documentation, fitting the project's current needs.
  • Webpack Compatibility: Docusaurus's reliance on Webpack 5 is advantageous for working with light clients.

Concerns and Observations

  • Migration Goals: Emphasis on ensuring the migration brings additional value beyond just a platform change, with specific improvements outlined for the documentation.
  • Routing and References: Importance of maintaining stable routing and references to avoid breaking external links to the documentation.

Action Items

  • Outcome-Oriented Migration: Agreement on pursuing tangible benefits through the migration, not just changing platforms for the sake of it.
  • Continued Discussion: Plan to continue the conversation on documentation improvements in an existing issue started by Matthew, focusing on collective effort and planning.
  • Analytics and Insights: Interest in exploring tools or plugins for analytics to understand documentation usage and reader interests, potentially through Docusaurus plugins or Google Analytics.

Lodestar Developer Blog

  • The idea of establishing a developer blog for Lodestar was discussed, aimed at publishing technical posts for contributors and users.
  • Decision: To integrate Lodestar posts within a dedicated section of the ChainSafe blog.
  • Rationale: This approach leverages SEO benefits and avoids starting from scratch, fostering a symbiotic relationship with ChainSafe.

Metrics Discussion

  • During the refactoring process, it was discovered that approximately 80% of collected metrics are not being utilized in dashboards, raising concerns about the performance burden of collecting and scraping unused metrics.
  • The team proposed evaluating which metrics are actually useful by first adding them to dashboards for visualization. This would help determine their utility before deciding to remove any unused metrics.
  • A significant gap identified is the lack of documentation explaining the meaning and use of various metrics. Enhancing documentation could improve understanding and utilization of metrics.
  • A script exists to assert metrics usage, but a more thorough review is needed to decide on the inclusion of metrics in dashboards.
  • The team suggested listing all metrics and creating a checklist to determine their usefulness, with the aim of making informed decisions on which metrics to retain or remove.
  • Sharing knowledge and experiences with metrics, as exemplified by a productive discussion between team members in Ho Chi Minh, was highlighted as immensely valuable. Such exchanges can deepen understanding of how different metrics interrelate and impact the system's performance.
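A first pass at the checklist could be automated along the lines below. This is a minimal sketch, not the existing script mentioned above: the function name and the metric names are hypothetical, and dashboard queries are assumed to be available as plain expression strings.

```typescript
/**
 * Hypothetical helper: return the metric names that appear in none of the
 * dashboard query expressions. Names are matched as whole words, so
 * "example_peers" does not count as used by "example_peers_total".
 * Assumes metric names contain only letters, digits, "_" and ":".
 */
function findUnusedMetrics(metricNames: string[], dashboardExprs: string[]): string[] {
  return metricNames.filter((name) => {
    const pattern = new RegExp(`\\b${name}\\b`);
    return !dashboardExprs.some((expr) => pattern.test(expr));
  });
}

// Made-up metric names and queries, for illustration only.
const metrics = ["example_peers", "example_peers_total", "example_unused_seconds"];
const exprs = ["rate(example_peers_total[5m])"];
console.log(findUnusedMetrics(metrics, exprs));
```

A simple name match like this would wrongly flag histograms, whose queries reference suffixed series such as _bucket, so its output is a starting list for the collaborative review rather than a removal list.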

Next Steps for Metrics

  • Organize an issue to list all metrics and facilitate a collaborative review process. This will allow the team to share insights on the relevance and utility of each metric.
  • Continue the conversation on metrics through this organized issue, aiming to enhance both the dashboard and documentation with meaningful and useful metrics.

Updates

Tuyen:

n-Historical State Progress

  • Regen PR Merged: Continued work on the n-historical state with the successful merge of the regen PR.
  • State from Cache and getStateOrBytes API: The next step involves pulling binary data of the upcoming finalized checkpoint, already persisted with the n-historical flag, and persisting that to the state DB.

SSZ Serialization Improvement

  • Current Serialization Time: Noted that serialization of 1.5 million validators takes around seven milliseconds.
  • Optimization Goal: Aiming to reduce serialization time by simplifying the process, which in minimal testing scenarios, reduced the time to 1.5 to 2 milliseconds.
  • Initial Findings: Identified that serialization currently operates based on type structure without caching nodes, despite using ViewDU where fields are cached during state transition. Plans to utilize cached nodes for improved efficiency.

SIMD for sha-256

  • Initial Success: Achieved promising results on personal computer and mainnet node (feature four node) using SIMD (Single Instruction, Multiple Data) for sha-256, aiming for parallel processing improvements.
  • Benchmarking Hesitation: Despite initial success, benchmarking results did not reflect expected improvements, leading to a pause on further development due to lack of confidence in the approach.

NC:

Progress on Reward Endpoints

  • Sync Committee Reward Endpoint: Successfully merged the PR for the sync committee reward endpoint.
  • Attestation Reward Endpoint: Completed coding for the attestation reward endpoint and has the PR ready for review.

EIP 6110 and Effective Balance Increment Issue

  • Investigated the effective balance increment issue related to EIP 6110. Conducted local testing but indicated the need for more extensive testing before opening a PR.

Block Reward Unit Tests Optimization

  • Explored solutions to address the issue of block reward unit tests taking excessive time, as highlighted by Tuyen.
  • Attempted to use the unit test version of Altair states following Tuyen's suggestion but found it insufficient for the requirements, which include needing a mock block that can undergo state transition.
  • Plans to develop a more comprehensive solution, potentially enhancing the test utilities to support this method in the future.

Gajinder:

  • Blobs Pool: Continued work on optimizing the blobs pool.
  • SSZ Merkle Caching: Submitted a PR for caching the SSZ Merkle of lists, which is currently up for review.
  • Inclusion List: Reviewed the ongoing discussions about the inclusion list but did not have significant updates from the previous week.
  • Electra Proposals: Engaged in reading and understanding new proposals for the upcoming Electra upgrade, including Lion's proposal for inactivity score improvements among other topics.

Nico:

  • SSZ Refactor Cleanup: Focused on cleaning up the SSZ refactor branch, preparing it for further development.
  • PR Reviews: Dedicated time to reviewing PRs from contributors and other team members. Plans to review NC's PR and provide feedback.
  • SSZ Refactor Progress: Awaiting the merge of NC's PR to perform another rebase on the SSZ refactor branch. Anticipates addressing and resolving numerous build issues post-rebase.
  • Focus: The main focus remains on advancing the SSZ refactor to a buildable state, navigating through the challenges of integrating changes and ensuring compatibility.

Cayman:

Prysm's Hash Tree Library for JavaScript

  • New Repository: Created a new repository named hashtree-js to make Prysm's hash tree library accessible in JavaScript.
  • CI Issues: Encountered difficulties with CI builds, particularly for Windows and macOS, where builds are failing.
  • Project Details: The project is a straightforward napi-rs project, involving a C program built through Cargo's build process, with a Rust wrapper around the C library.
  • Performance: The performance of the library is reported to be very good.
  • PR Reviews: Continued reviewing PRs and is available for assistance. Urges team members to reach out if needed.
  • Upcoming Leave: Cayman announced he will be getting married on the 23rd and will be off for several weeks, approximately three to four weeks, starting from the 20th.

Nazar:

  • Moved the withdraw interop test to the SIM test suite. The PR is now open and ready for review.
  • Working on a PR related to the builder, expected to be opened by Tuesday. This effort aims to transition most existing sim tests within the beacon node repository to the sim test suite and some to end-to-end tests, concluding this particular chapter of work.

UI/UX for Light Client Demo

  • Engaged in brainstorming sessions to enhance the UI/UX for the light client demo. The goal is to make the demo more attractive and user-friendly with improvements such as sync committee animations and better user experience designs.
  • Decided against merging the light client demo with the prover demo into a single page, opting to keep them separate for clarity and focus.
  • Plans to implement suggestions from Julien to improve the demo's UX by simplifying configuration and setting default values, making it more accessible to users.
  • Future work will focus on refining the light client demo and developing a separate prover demo.

Julien:

  • Creation and Management: Spent time creating "good first issues" for external contributors. Noticed an increase in external contributions, suggesting a need for a strategy to manage these issues effectively, considering the team's overhead in overseeing them.
  • Long-term Engagement: Emphasized the importance of engaging contributors for long-term involvement rather than one-off contributions.

Light Client Library Testing

  • Environment Testing: Continued testing the light client library across various environments, including React Native and Cloudflare Workers, noting compatibility issues.
  • Docusaurus Testing: Tested with Docusaurus, based on Webpack 5, showing potential compatibility. Optimistic about making the light client work with Docusaurus due to its Webpack 5 basis.

Light Client Demo Improvements

  • UX Improvements: Worked on fixing longstanding issues in the light client demo and enhancing user experience. Acknowledged the demo's potential but noted its current UX complexity.
  • Collaboration with Nazar: Plans to collaborate with Nazar on refining the demo, focusing on making it more user-friendly and showcasing the value of the light client.

Matt:

  • Shuffling Refactor Issue Diagnosis: Identified issues with sim tests failing, suspecting incorrect shuffling cache assignments. Currently investigating the cause, which seems related to cache management rather than the shuffling algorithm itself.
  • Code Review with Hubert: Received valuable feedback from Hubert, a C developer, on the BLS refactor. Focused on addressing comments related to buffer safety and other C-specific concerns to enhance code quality.
  • BLS Refactor Progress: Worked on incorporating Hubert's feedback to prepare the BLS refactor for merging. Aims to integrate these changes into the main branch, eliminating the need for a separate rebuild folder and dual testing.

February 27, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6467

Planning and Discussions

Update on Optimizations and Patches for Deneb

  • Optimizations:
    • Handling big blocks by changing the SSZ library to not cache roots for composite lists, aiming to optimize for full blocks in memory.
    • Implementing cached root optimizations for transactions and withdrawals to avoid recalculating roots.
    • Considering extending the publish API for validators to include the block root they signed over, potentially saving recalculations.
  • Aggressive Pull of Blobs: Addressing a ~2% blob drop rate on Holesky by proposing an aggressive pull strategy for blobs to improve efficiency.
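
The cached-root idea can be sketched as follows. This is only an illustration under stated assumptions: `toyRoot` is a made-up stand-in for a real SSZ hash-tree-root computation, and `RootCache` is a hypothetical helper, not part of Lodestar or the SSZ library.

```typescript
// Stand-in for an expensive SSZ hash-tree-root computation. NOT a real hash.
function toyRoot(items: Uint8Array[]): number {
  let acc = 17;
  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    for (let j = 0; j < item.length; j++) {
      acc = (acc * 31 + item[j]) >>> 0;
    }
  }
  return acc;
}

// Hypothetical cache: remembers the root per list object so it is computed once.
class RootCache {
  private readonly roots = new WeakMap<Uint8Array[], number>();
  computeCount = 0;

  getRoot(items: Uint8Array[]): number {
    const cached = this.roots.get(items);
    if (cached !== undefined) return cached;
    this.computeCount++;
    const computed = toyRoot(items);
    this.roots.set(items, computed);
    return computed;
  }
}

// Usage: the second lookup reuses the stored root instead of rehashing.
const sampleTransactions = [new Uint8Array([1, 2, 3]), new Uint8Array([4, 5])];
const rootCache = new RootCache();
const firstRoot = rootCache.getRoot(sampleTransactions);
const secondRoot = rootCache.getRoot(sampleTransactions);
```

The sketch keys the cache by object identity, so it only helps when the same list object is reused and not mutated after its root was stored.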

Patch Release Considerations:

  • Debated whether to include the aggressive blob pull optimization in the next patch release or to wait.
  • Gajinder suggested focusing first on the blobs part, as big blocks are unlikely on mainnet due to optimizations already merged.

Release Timing:

  • Discussed the possibility of waiting for the aggressive blob pull strategy to be ready before releasing the next patch, targeting completion by Friday before the fork date on the 13th.

Strategy Going Forward:

  • The team leaned towards waiting for the next set of optimizations before pushing out a new release, aiming to avoid confusion before the hard fork.
  • Considered following a similar approach to Geth, offering optional upgrades for optimizations that are unlikely to impact mainnet performance.

Light Client Roadmap for Electra

  • Sync Committee Slashing
    • The team acknowledged the importance of introducing sync committee slashing to enhance trust and security in the light client infrastructure. This addition is seen as crucial for making the client infrastructure more reliable by penalizing misbehavior.
  • Light Client Data Backfill
    • There was a debate on the necessity and complexity of adding light client data backfill to the protocol. The current scheme does not support backfilling data efficiently, which is a limitation that the proposal aims to address.
  • Security and Use Cases
    • Questions were raised about the security benefits of adding slashing and how it would enable new use cases, especially considering its potentially low impact on securing high-value applications like bridges.
  • Canonical Data and Weak Subjectivity
    • The discussion touched on the need for light clients to sync from periods before the weak subjectivity period, questioning the practicality and security implications of such a feature.
  • Meeting with Etan
    • The team considered organizing a breakout session with Etan to address questions and gain further context on the proposal, potentially leading to a monthly light client-focused meeting.

Stale PRs Cleanup

  • Suggested cleaning up old, stale PRs across repositories, noting that some repos are neglected with a lot of stale content.
  • Discussed establishing basic guidelines on handling bot-generated dependency upgrade PRs.
  • Consideration on whether to accept automatic dependency upgrade PRs, especially noting that most stale PRs are related to these.
  • Observed that many stale PRs are for dev dependencies, which are less critical, especially if they involve indirect dependencies of tools like webpack.
  • Suggested that accepting PRs for dev dependencies might not significantly impact security and could be a straightforward way to manage some of the backlog.
  • Cayman proposed tackling the cleanup process asynchronously, possibly in a one-on-one session, to efficiently address and close unnecessary PRs.
  • Recommended creating an issue with a checklist of PRs to be reviewed and closed, allowing for a systematic approach to the cleanup process.

Move ENR-app to discv5

  • Action approved to move it into the discv5 monorepo

Removing Support for Older Node Versions

  • Existing practice is to remove support for older Node.js versions when a new LTS version is adopted.
  • Suggested supporting one major even-numbered version back, implying Node.js 18 should not be dropped until version 22 is released.
  • Consideration of when support for version 18.17 was inadvertently broken, despite being listed as supported in package.json.
  • Proposed running unit tests against two Node.js versions to ensure compatibility and facilitate a rolling update strategy.
  • Discussed the impact on CI time, with a suggestion to perform these checks with each release rather than continuously, to avoid prolonging CI processes.
  • Mentioned that unit tests are relatively fast and wouldn't significantly add to total CI time.
  • Emphasized the importance of identifying the specific commit or PR that breaks compatibility with an older Node.js version for accurate release notes.

Action Items for deprecating Node.js support

  • Update the engines field in package.json to warn users of incompatible Node.js versions during installation.
  • Plan to prepare for the release of Node.js 22, ensuring continued support for Node.js 20 until then.
  • A PR will be created to test the proposed strategy on CI and evaluate the impact on test duration.
  • Prepare for the upcoming Node.js 22 release by ensuring compatibility with Node.js 20 is maintained.
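
The version policy discussed above can be sketched as a small check. The helper names (`parseVersion`, `isAtLeast`, `supportedMajors`) are hypothetical and only illustrate the comparison. In practice the `engines` field in package.json expresses the supported range declaratively.

```typescript
// Hypothetical helper: parse "v18.17.0" or "18.17.0" into numeric parts.
function parseVersion(version: string): number[] {
  const cleaned = version.charAt(0) === "v" ? version.slice(1) : version;
  return cleaned.split(".").map((part) => parseInt(part, 10));
}

// Returns true when `current` is at least `minimum` (major.minor.patch comparison).
function isAtLeast(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}

// "One even-numbered major back": with 20 as the latest LTS, 18 and 20 are supported.
function supportedMajors(latestLtsMajor: number): number[] {
  return [latestLtsMajor - 2, latestLtsMajor];
}

// Usage: 18.17.0 is the minimum mentioned in the notes.
const minimumSupported = "18.17.0";
const okOnMinimum = isAtLeast("v18.17.0", minimumSupported);
const okOnOlderPatch = isAtLeast("18.16.1", minimumSupported);
const majorsToTest = supportedMajors(20);
```

Running unit tests against each entry of `majorsToTest` in CI would be one way to apply the two-version strategy proposed above.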

Updates

Matt:

  • Implemented a multiplyBy function to address a security vulnerability identified in the crypto library.
  • Successfully integrated the fix into BLST and BLS, and subsequently into Lodestar.
  • The implementation and performance have exceeded initial expectations, with no performance issues reported.
  • Deployed the update approximately 6 hours prior to the standup, noting that it's performing better than the unstable version.
  • Awaiting a full day's metrics to comprehensively assess the impact and stability of the changes.
  • Finalizing the last unit tests and integrating metrics into the shuffling PR, with completion expected by tomorrow.
  • Plans to create a PR to merge the rebuild branch into main in the blst repository, indicating readiness for broader use.

Tuyen:

Exploration of SIMD with as-sha256

  • Investigated the use of SIMD (Single Instruction, Multiple Data) to enhance the performance of sha256 by processing multiple inputs in parallel.
  • Achieved a performance improvement: 25% better on a MacBook and 70-80% better on a server.

Proposed Improvements

  • Identified potential improvements to the digest64 process by:
    • Chaining the extended block with the main loop inside a hash block.
    • Using Uint8Array.slice instead of getting a subarray to separate data from Uint8Array.
  • These improvements could result in a 60-70% performance enhancement.
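
For context on the second point, the snippet below shows only the semantic difference between `Uint8Array.prototype.subarray` (a view over the same buffer) and `Uint8Array.prototype.slice` (a copy). It makes no performance claim. Which one is faster depends on the engine and on how the result is used.

```typescript
const dataBlock = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);

// subarray returns a view over the same underlying buffer (no copy).
const viewPart = dataBlock.subarray(0, 4);

// slice allocates a new buffer and copies the selected bytes.
const copyPart = dataBlock.slice(0, 4);

// Mutating the source is visible through the view but not through the copy.
dataBlock[0] = 99;

const viewSharesBuffer = viewPart.buffer === dataBlock.buffer;
const copyHasOwnBuffer = copyPart.buffer !== dataBlock.buffer;
```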

Julien:

Light Client Libraries in Browsers

  • Explored integrating light client libraries with modern bundlers like Vite and Parcel for browser usage.
  • Faced challenges making it work seamlessly, indicating the process is not trivial.
  • Created issues with ideas to outline potential improvements.

Dependency Cleanup

  • Identified the need for cleaning up dependencies, either due to lack of maintenance or minimal value to the project.
  • Initiated some pull requests to remove such dependencies, suggesting these tasks could be good first issues for newcomers.

Light Client and Prover Learning

  • Continued learning about light client and prover functionalities.
  • Addressed a prover issue and is in the process of responding to comments from Nazar to determine the best solutions.

Cayman:

  • Committees Optimization PR: Introduced an experiment converting committees into a single Uint32Array for slicing, achieving a 5% reduction in total running memory.
  • Snappy for Gossip Messages PR: Implemented encoding and decoding of gossip messages using the native snappy package for larger payloads and snappyjs for smaller payloads, aiming to deploy to a feature branch for testing.
  • Explored using shared array buffers for data transfer from the networking thread to the main thread to reduce event handling. The initial deployment showed poor performance, with ongoing investigation into the cause.
  • Discussed previous work on optimizing SHA-256, showing potential for significant performance improvements based on benchmarks against Prysm's implementation.
  • Open to further developing this work for production use, noting Prysm's capability to hash full state in milliseconds.
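
The committee layout experiment can be pictured with the sketch below. `FlatCommittees` is a hypothetical class written for illustration, not the code from the PR: it stores all validator indices back to back in one `Uint32Array` and hands out zero-copy views per committee.

```typescript
// Hypothetical layout: one flat Uint32Array plus an offsets table.
class FlatCommittees {
  readonly indices: Uint32Array;
  readonly offsets: Uint32Array; // length = committee count + 1

  constructor(committees: number[][]) {
    let total = 0;
    committees.forEach((committee) => {
      total += committee.length;
    });
    this.indices = new Uint32Array(total);
    this.offsets = new Uint32Array(committees.length + 1);
    let cursor = 0;
    committees.forEach((committee, i) => {
      this.offsets[i] = cursor;
      this.indices.set(committee, cursor);
      cursor += committee.length;
    });
    this.offsets[committees.length] = cursor;
  }

  // Returns a view of committee `i` that shares memory with `indices`.
  get(i: number): Uint32Array {
    return this.indices.subarray(this.offsets[i], this.offsets[i + 1]);
  }
}

// Usage with made-up validator indices.
const flatCommittees = new FlatCommittees([[5, 6, 7], [8], [9, 10]]);
const secondCommittee = flatCommittees.get(1);
const thirdCommittee = flatCommittees.get(2);
```

Compared with an array of arrays, this keeps a single allocation for all members, which is the kind of change that can reduce per-object overhead.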

Nico:

Beacon API Release and Updates

  • Announced the release of beacon APIs and updated Julien's branch to align with the latest spec, enabling merging and testing against updated specifications.
  • Confirmed that all desired fixes are now incorporated, allowing for the removal of workarounds previously implemented in API tests.

SSZ Refactor Progress

  • Completed the translation of all routes for the SSZ refactor, including the previously missing key manager.
  • Working on refactoring the spec tests and planning to rebase the refactor branch again.
  • Mentioned the potential need to merge the rewards APIs (pending confirmation from NC) but indicated no rush, as refactoring for those routes can be addressed later.

NC:

Beacon API Development

  • Sync Committee Reward Endpoints: Submitted a PR for review, with Nico providing extensive feedback. Plans to address these comments.
  • Attestations Reward Endpoint: Initiated a draft PR for the last reward endpoint related to attestations, currently halfway through development.
  • EIP-6110 Effective Balance Increments: Investigated the effective balance increments issue raised by Gajinder, with an initial solution in mind. Coding and testing of the solution are pending.
  • Inclusion List PoC for ACDC: Participated in last week's ACDC discussion where Lion highlighted the need for an Inclusion List Proof of Concept (PoC).
  • Engaged in Discord channel discussions related to the PoC, noting that immediate development has not yet commenced.

Gajinder:

Optimizations and Protocol Enhancements

  • Focused on optimizations and delving into ePBS (Enshrined Proposer Builder Separation), IL (Inclusion Lists), and pre-confirmations, updating knowledge with the latest information.
  • Aimed to implement aggressive blob pull and complete the 7002 implementation for IL support.

Questions and Concerns

  • Raised pertinent questions regarding the design and rationale behind forward Inclusion Lists (IL), particularly questioning the emphasis on optimizing builder payouts over addressing censorship concerns.
  • Expressed reservations about the forwardness of IL, preferring a model where proposers can directly inform builders of their CR (Censorship Resistance) list or inclusion list for the same slot.
  • Noted the absence of clear explanations for discarding same-slot IL proposals in favor of forward IL, aiming to fully understand the proposed design's alignment with ePBS.

February 21, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6434

Planning and Discussions

v1.16 Release Discussion

  • Progress on v1.16: Discussion on the current status and remaining tasks for the v1.16 release.
    • A dependency upgrade PR has been merged and included in the RC, specifically the discv5 update, which is expected to fix previously identified errors.
    • Several PRs tagged for 1.16 are mostly merged, with a few remaining open for fixes and enhancements, including typo fixes and adding logs for HTTP retries.
    • The team plans to merge these PRs promptly to cut another RC from the unstable branch, focusing on small changes like CLI flags and logs.

Performance and Memory Usage

  • Performance Issues: A performance issue was noted in Holesky, potentially due to a recent increase in validators. The team plans to confirm this on mainnet.
  • Memory Usage: Increased RSS memory usage observed, up to 9GB on beta-mainnet server. The team is considering this in the context of the 1.16 release and potential impacts.

Release Strategy and Versioning

  • Handling Breaking Changes: Discussion on managing breaking changes, especially in CLI flags, and the versioning scheme for customer-facing products.
    • The team considers major version bumps or separating the Prover from the monorepo to avoid versioning conflicts.
    • The possibility of independent versioning for specific packages within the monorepo was discussed, with a focus on the implications for the Prover and other unique dependencies.
  • Contingency Plans: Should significant performance issues be confirmed in v1.16, the team discussed contingency plans, including:
    • Potential for a 1.15.2 Release: If necessary, a patch release (1.15.2) could be considered to incorporate critical commits without the changes introduced in 1.15 that may be causing issues. However, this approach is complicated by the fact that 1.15.1 also exhibited similar problems.
    • Reverting Changes in a 1.16.1 Release: Another approach could involve releasing 1.16.1, reverting problematic changes from 1.15, and including only the Deneb config. This would allow for a focused investigation into the cause of the issues, to be addressed in a subsequent 1.16.2 release.

Project Updates

  • Shuffle Refactor: Substantial completion of the shuffle refactor, with ongoing work on metrics for cache hits and misses.
  • BLST Library: Approval and positive feedback on the BLST library work, with plans to address a security bug identified by the Lighthouse team related to message optimization.
  • Blinding Blocks: Progress on rebasing and cleaning up the blinding blocks PR, with plans to finalize and address any outstanding issues.
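
Hit and miss metrics for a cache such as the shuffling cache could be tracked along these lines. `CountingCache` is a hypothetical helper with plain counters. It does not show how Lodestar wires its metrics.

```typescript
// Hypothetical cache wrapper that counts hits and misses.
class CountingCache<K, V> {
  private readonly entries = new Map<K, V>();
  hits = 0;
  misses = 0;

  getOrCompute(key: K, compute: () => V): V {
    if (this.entries.has(key)) {
      this.hits++;
      return this.entries.get(key) as V;
    }
    this.misses++;
    const value = compute();
    this.entries.set(key, value);
    return value;
  }

  // Fraction of lookups served from the cache, or 0 before any lookup.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

// Usage: keys are epochs, values are placeholder strings.
const shufflingCacheSketch = new CountingCache<number, string>();
shufflingCacheSketch.getOrCompute(10, () => "shuffling-10"); // miss
shufflingCacheSketch.getOrCompute(10, () => "shuffling-10"); // hit
shufflingCacheSketch.getOrCompute(11, () => "shuffling-11"); // miss
```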

Updates

Matt:

  • Shuffle Refactor: The shuffle refactor is substantially complete, with attention now turning to metrics for cache hits and cache misses. A slight refactor may be needed to integrate these metrics fully.

  • Collaboration with Julien: Discussed the "past monster" issue with Julien, with plans to solidify the approach used.

  • BLST Approval: Received approval on BLST work, which is now looking very promising. There's excitement about the progress and the quality of the work.

  • Security Bug Mitigation: A security bug identified by the Lighthouse team is being addressed. Tuyen has developed a proof of concept for mitigation, and Matt plans to implement a function to address this.

  • Cleanup and Publication: Additional cleanup is required before the changes can be merged into the main branch and published. Progress is close to completion, with optimism about nearing the finish line.

  • Blinding Blocks and PR Rebasing: The rebasing for blinding blocks has been cleaned up after discussions with Julien. Matt plans to finalize this work and then consult with Gajinder about an issue related to the sim merge test. The test failed due to a missing API function (get payload body) in the execution layer container, which needs to be addressed either by updating the container or adjusting the test strategy.

  • Looking Ahead: The focus for the upcoming week includes finalizing the blinding blocks work and addressing the sim merge test issue, with a positive outlook on the progress made and the tasks ahead.

Cayman:

  • Tree Hashing Exploration: Investigated alternative methods for tree hashing in Lodestar, focusing on memory efficiency and performance improvements. Experimented with different SHA implementations and storage methods for hashes.

  • HackMD Documentation: Shared findings and exploratory work in a HackMD document.

  • Exploration Outcomes:

    • Initial attempts using NAPI-RS and exploring various storage and implementation strategies did not yield memory or performance improvements over the current setup.
    • AssemblyScript SHA-256 remains the most efficient for hashing two 32-byte arrays compared to Rust implementations and a ported version of AssemblyScript SHA-256 into Rust.
    • The hash tree library from Prysm, designed for bulk hashing, showed potential for speed improvements when hashing large contiguous data arrays.
  • Potential for Bulk Hashing Optimization: Identified a possible optimization for parts of the tree that change entirely at once (e.g., balances) by using bulk hashing techniques, which could significantly improve performance.

  • Memory Usage Comparison: Noted that Lighthouse also consumes around 8 GB on Holesky, suggesting that Lodestar's memory usage is competitive. Further exploration into Rust and NAPI wrapping indicated that memory efficiency challenges are partly due to the inherent overhead of Node.js structures.

  • Next Steps: Cayman plans to continue exploring hashing optimizations and invites team members interested in this area to join the discussion in the developer channel.
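
The difference between hashing two 32-byte arrays at a time and bulk hashing a contiguous layer can be sketched as below. `toyHash64` is a made-up, non-cryptographic stand-in for SHA-256 that keeps the example dependency-free. The function names are hypothetical and do not reflect the API of Prysm's library or of as-sha256.

```typescript
// Stand-in for SHA-256 over 64 bytes. NOT cryptographic, for layout illustration only.
function toyHash64(input: Uint8Array, inOffset: number, output: Uint8Array, outOffset: number): void {
  for (let i = 0; i < 32; i++) {
    const left = input[inOffset + i];
    const right = input[inOffset + 32 + i];
    output[outOffset + i] = (left * 31 + right * 17 + i) & 0xff;
  }
}

// Bulk variant: hash every 64-byte pair of a contiguous layer in one call.
function hashLayer(layer: Uint8Array): Uint8Array {
  if (layer.length % 64 !== 0) throw new Error("layer length must be a multiple of 64");
  const parentLayer = new Uint8Array(layer.length / 2);
  for (let pair = 0; pair < layer.length / 64; pair++) {
    toyHash64(layer, pair * 64, parentLayer, pair * 32);
  }
  return parentLayer;
}

// Merkle root of a power-of-two number of 32-byte leaves stored contiguously.
function merkleRootOfLeaves(leaves: Uint8Array): Uint8Array {
  let layer = leaves;
  while (layer.length > 32) layer = hashLayer(layer);
  return layer;
}

// Usage: four 32-byte leaves filled with illustrative values.
const sampleLeaves = new Uint8Array(128);
for (let i = 0; i < sampleLeaves.length; i++) sampleLeaves[i] = i;
const bulkRoot = merkleRootOfLeaves(sampleLeaves);
```

The bulk form suits parts of the tree that change all at once, since a whole layer is processed per call without allocating one array per node.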

NC:

  • Research Catch-up: Focused on catching up with discussions on Inclusion Lists (IL), Enshrined Proposer Builder Separation (ePBS), and max effective balance (MaxEB) from the past two weeks.
  • PR Merges: Thanked Gajinder for merging the 6110 PR into the Electra branch and outlined plans for follow-up PRs related to 6110.
  • Notify New Payload: Plans to start work on the early call for the notify new payload feature proposed by Tuyen.
  • Optimization of Effective Balance Array: Gajinder requested NC to look into optimizing the effective balance array fixed in the 6110 PR, which currently isn't optimized for memory.
  • Notify New Payload Latency: Discussed the latency involved in calling notify new payload, with observations indicating a 10-20 millisecond latency upon block reception. The discussion extended into the efficiency of this process and comparisons with Nethermind's response times.

Nethermind Latency Observations:

  • Initial Observations: Noted that Nethermind quickly responds to payloads it has already seen, but exhibits a significant latency of approximately 300 milliseconds for new payloads. This latency was identified through detailed analysis of the time from making a call to receiving the first streaming response.

  • Investigation Findings:

    • For already seen payloads, Nethermind's response time is very fast, approximately 10-20 milliseconds, indicating efficient handling of known data.
    • For new payloads, the latency before Nethermind starts sending data back is around 300 milliseconds. This delay aligns with Nethermind's logs, which show a 200-millisecond gap before acknowledging receipt of a new block.
  • Analysis of Latency Causes:

    • The latency for new payloads is not attributed to Lodestar's network thread being busy or any inefficiencies in Lodestar's handling of network responses.
    • The delay primarily occurs on Nethermind's side, from the moment Lodestar sends out the call to when Nethermind begins to respond. This was corroborated by matching timings with Nethermind's logs.
  • Communication Efficiency:

    • For repeated blocks, the observed low latency demonstrates the actual communication efficiency between Lodestar and Nethermind, excluding block processing time.
    • The increased latency for new payloads suggests a delay on Nethermind's part in processing and responding to new block information.
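
Separating first-response latency for already seen and new payloads, as in the analysis above, could be recorded as sketched here. `LatencyStats` and the timestamps in the usage are hypothetical placeholders, not measurements from the investigation.

```typescript
type PayloadKind = "seen" | "unseen";

// Hypothetical recorder: stores time from sending the call to the first response byte.
class LatencyStats {
  private readonly samples: Record<PayloadKind, number[]> = {seen: [], unseen: []};

  record(kind: PayloadKind, sentAtMs: number, firstResponseAtMs: number): void {
    this.samples[kind].push(firstResponseAtMs - sentAtMs);
  }

  // Mean latency in milliseconds, or NaN when nothing was recorded for `kind`.
  average(kind: PayloadKind): number {
    const values = this.samples[kind];
    if (values.length === 0) return NaN;
    let sum = 0;
    for (let i = 0; i < values.length; i++) sum += values[i];
    return sum / values.length;
  }
}

// Usage with placeholder timestamps (milliseconds).
const payloadLatency = new LatencyStats();
payloadLatency.record("seen", 1000, 1015);
payloadLatency.record("seen", 2000, 2020);
payloadLatency.record("unseen", 3000, 3300);
const averageSeenMs = payloadLatency.average("seen");
const averageUnseenMs = payloadLatency.average("unseen");
```

Keeping the two kinds apart makes it possible to attribute the gap between them to payload processing, and not to transport overhead.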

Implications and Next Steps

  • Optimization Focus: The findings suggest that while Lodestar's notify new payload calls are made efficiently, the optimization focus should perhaps shift towards understanding and reducing the latency of payload processing on Nethermind's side.

  • Further Analysis: Additional comparisons with other clients like Geth, and further metrics implementation on Lodestar's side, could help pinpoint optimization opportunities and improve overall response times for new payloads.

  • Comparison with Other Clients: Highlighted the need to compare Lodestar's performance with other clients, particularly focusing on the latency of payload processing and the efficiency of notify new payload calls.

Nico:

  • Issue Review for Release: Reviewed all issues tagged for the upcoming release, including those assigned during the retreat. Picked up a variety of tasks, focusing on smaller, scattered issues.

  • PR Status: Noted that some PRs are still open. While Nazar reviewed some, they are not critical for merging in the 1.16 release, except for two tagged PRs.

  • Beacon API Spec Review: Conducted a review of the beacon API specification to finalize adjustments before the release. The focus was on cleaning up details to prepare for the 3.0 release.

  • API Deprecation and Release Planning: Discussed the approach to API deprecation, noting that newly deprecated APIs from Capella will not be removed until after one more hard fork, adhering to the consensus on API removal timing. This strategy aims to ensure stability and backward compatibility.

  • Release Timeline: Anticipated cutting the 3.0 release possibly within the week, pending final reviews and adjustments.

  • Miscellaneous Fixes: Engaged in fixing minor issues across various aspects of the project, contributing to overall improvements and readiness for the upcoming release.

Tuyen:

  • Gossipsub Migration: Worked on migrating protobufs to protons for gossipsub during vacation. The work has been merged.
  • N-Historical State PR: Plans to divide the large n-historical state PR into smaller PRs for easier review. The first one focuses on regeneration logic and when to use the API after state cache.
  • Bug Fixes: Addressed a bug seen in Sepolia related to block head size and added the error fix in the PR.
  • BLS as a Map Update: Compared Matt's branch for BLS implementation and identified the need for randomizing factors, which, however, doubles the processing time. Plans to refactor for cleaner implementation and maintain two versions for optimization.
  • Performance Concerns: Noted the traffic through the worker boundary and the double queuing of attestations as areas for potential optimization. Also highlighted the need to prepare for EIP 7549 changes affecting attestation data.

Discussion Highlights

  • EIP 7549 Changes: Discussed the ongoing discussions around EIP 7549, particularly moving the attestation index out of the attestation data, and how it might affect Lodestar's optimizations.
  • Latency Between Main and Worker Threads: Raised concerns about the 20-millisecond latency from main to worker threads, suggesting it adds significant delay. Discussed exploring other worker libraries that use Atomics for performance optimization.
  • Optimization Strategies: Agreed on the importance of optimizing the latency between main and worker threads and considered prioritizing workers CPU-wise. Also discussed the need for breaking down the BLS work into multiple PRs for clearer performance impact assessment.

Gajinder:

  • 6110 PR for Electra: Focused on making the 6110 PR mergeable into the Electra branch by resolving issues, ensuring test files pass, and overall preparation for integration.

  • Deneb Related PRs: Worked on PRs related to the upcoming Deneb upgrade, alongside extensive research on Enshrined Proposer Builder Separation (ePBS), Inclusion Lists (ILs), and pre-confirmations.

  • 7002 PR and Devnet Preparations: Aiming to address EIP-7002 PR next, to ensure the Electra branch is ready for any upcoming Devnet activities. This involves achieving parity with the latest consensus specs, particularly 6110 and 7002 merges.

  • ePBS and ILs in Electra: Advocated for starting ePBS trials with Prysm, despite its likely exclusion from the Electra upgrade due to scope considerations. However, shifted stance to support the inclusion of ILs in Electra, citing benefits for censorship resistance and pre-confirmations.

  • MaxEB Considerations: Discussed the potential inclusion of Max Effective Balance (MaxEB) in Electra, noting its relative ease of incorporation if decided.


January 23, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6313

Planning and Discussions

Sepolia/Holesky Hard Fork Release:

  • Focus on finalizing the release for the Holesky and Sepolia hard fork.
  • Also Chiado hard fork ready
  • Deploying v1.15.0-rc.0 after Chiado merge is complete and ready for inclusion
  • Block production update PR not required for v1.15, needs reviews

Memory Limit Increase Discussion:

  • Debating the need to increase the default memory limit. Agreement on raising the limit to 8GB to ensure stability during periods of non-finality.
  • Decision to set the limit commensurate with Holesky's requirements and communicate it as a risk mitigation step.

Invalid Block Errors from Execution

  • A community member reported a PROTO_ARRAY_INVALID_LVH_EXECUTION_RESPONSE error in Lodestar during the Nethermind consensus issue. The team sought to understand the nature of this irrecoverable error and communicate it effectively to the community.

Explanation of the Error:

  • Block Validation Process: When Lodestar sends a block to the execution engine (EL), it expects a response on the block's validity. The EL indicates whether the block is valid or invalid and provides the latest valid hash in its canonical chain.
  • Error Propagation: If a block is marked invalid, this status is propagated up to the latest valid hash in the fork choice. An inconsistency arises if a block previously marked valid is later deemed invalid (or vice versa).
  • Fork Choice Poisoning: This inconsistency poisons the fork choice, creating uncertainty about the validity of blocks. Since it's challenging to revert these changes, Lodestar is designed to shut down and restart in such cases.
  • Dependency on EL Client: If the EL client continues to exhibit the same behavior, the issue recurs, leading to repeated fork choice poisoning. The problem is essentially irrecoverable without human intervention or a fix in the EL client.
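
The propagation and the inconsistency check described above can be sketched as follows. This is a simplified model under stated assumptions, not Lodestar's fork choice code: `ChainBlock`, `applyInvalidResponse` and the error text are hypothetical, and blocks form a single chain identified by string hashes.

```typescript
type ExecStatus = "valid" | "invalid" | "syncing";

interface ChainBlock {
  hash: string;
  parentHash: string | null;
  execStatus: ExecStatus;
}

// Marks `invalidHash` and its ancestors invalid until `latestValidHash` is reached.
// Throws when the EL response contradicts a status already recorded.
function applyInvalidResponse(
  blocks: Map<string, ChainBlock>,
  invalidHash: string,
  latestValidHash: string
): string[] {
  const path: ChainBlock[] = [];
  let cursor: string | null = invalidHash;
  while (cursor !== null && cursor !== latestValidHash) {
    const blk: ChainBlock | undefined = blocks.get(cursor);
    if (blk === undefined) throw new Error("unknown block: " + cursor);
    if (blk.execStatus === "valid") {
      throw new Error("inconsistent EL response: block " + blk.hash + " was previously valid");
    }
    path.push(blk);
    cursor = blk.parentHash;
  }
  if (cursor !== null) {
    const lvh = blocks.get(cursor);
    if (lvh !== undefined && lvh.execStatus === "invalid") {
      throw new Error("inconsistent EL response: latest valid hash " + cursor + " was previously invalid");
    }
  }
  const invalidated: string[] = [];
  for (let i = 0; i < path.length; i++) {
    path[i].execStatus = "invalid";
    invalidated.push(path[i].hash);
  }
  return invalidated;
}

// Usage: A (valid) <- B (syncing) <- C (syncing); the EL reports C invalid with LVH = A.
const sampleChain = new Map<string, ChainBlock>();
sampleChain.set("A", {hash: "A", parentHash: null, execStatus: "valid"});
sampleChain.set("B", {hash: "B", parentHash: "A", execStatus: "syncing"});
sampleChain.set("C", {hash: "C", parentHash: "B", execStatus: "syncing"});
const invalidatedHashes = applyInvalidResponse(sampleChain, "C", "A");
```

The sketch validates the whole path before mutating anything, so a contradictory response leaves the recorded statuses untouched. The notes describe Lodestar shutting down in that situation.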

Proposed Actions:

  • Error Logging Improvement: Enhance error logging to provide clearer information about the issue, including specifics about the blocks causing the problem.
  • Communication Strategy: Explain to users that the EL client has changed its decision on a block's validity, leading to a critical error in Lodestar's fork choice.
  • Documentation: Develop a handbook detailing scenarios where specific error codes may arise, aiding in user understanding and troubleshooting.

Eth1Data Deposits for Block Production

  • The current process involves attempting to retrieve and process ETH1 deposits at the time of forming the block body. This approach has been identified as problematic and is likely contributing to the missed block proposal issue.
  • Pre-Triggering Deposit Processing: The team suggests that the process of retrieving and processing ETH1 deposits should be triggered beforehand, rather than during the block proposal phase.
  • Data Caching: It is recommended that the relevant data be cached and prepared in advance, rather than being processed in real-time during block proposal.
  • Fallback Mechanism: In cases where pre-triggering is not feasible or the data is not ready, the system should default to using whatever data aligns with the current last block to run the proposal.
  • While the root cause of the delay in processing ETH1 data and deposits is still under investigation, implementing the proposed changes to the deposit retrieval process is considered a priority, even though the edge case is quite rare.
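
The pre-trigger, cache, and fallback ideas above could fit together roughly as follows. This is a sketch only: `Eth1ProposalCache`, `Eth1Prepared`, and the method names are hypothetical, and what exactly counts as "data aligned with the last block" is left to the caller.

```typescript
// Hypothetical shapes and names; a sketch of "prepare early, fall back if not ready".
interface Eth1Prepared {
  eth1BlockHash: string;
  deposits: string[];
}

class Eth1ProposalCache {
  private readonly bySlot = new Map<number, Eth1Prepared>();

  /** Pre-trigger: called ahead of the proposal slot, off the block production path. */
  prepare(slot: number, data: Eth1Prepared): void {
    this.bySlot.set(slot, data);
  }

  /** Called while forming the block body. Never waits on eth1 processing. */
  getForProposal(slot: number, fromHead: Eth1Prepared): {data: Eth1Prepared; source: "prepared" | "head"} {
    const prepared = this.bySlot.get(slot);
    if (prepared !== undefined) return {data: prepared, source: "prepared"};
    // Fallback: use whatever the caller derived from the current head block.
    return {data: fromHead, source: "head"};
  }

  /** Drop entries for slots that have passed so the cache stays small. */
  prune(currentSlot: number): void {
    for (const slot of Array.from(this.bySlot.keys())) {
      if (slot < currentSlot) this.bySlot.delete(slot);
    }
  }
}
```

The point of the design is that `getForProposal` is a synchronous lookup, so a slow eth1 query can delay `prepare` but not the proposal itself.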

Updates

NC:

  • Spent time reviewing Electra EIP candidates to form opinions on each EIP.
  • Conducted a detailed review of the MaxEB (Maximum Effective Balance) specifications, preparing for potential implementation.
  • Continued follow-up on the existing PR for block reward endpoints, addressing comments and refining the code.
  • Investigating and addressing comments related to the proposer boost feature.
  • Aim to stabilize the code for block reward endpoints and prepare the synchronization from the endpoint PR for the upcoming week.
  • Begin implementation of the attestation reward endpoints.

Gajinder:

  • Reviewing PRs and EIPs for Electra inclusion

Tuyen:

  • Completed the last part of the n-historical state work related to the buffer pool.
  • Rebased the n-historical state against stable and addressed the invalid state root issue.
  • Updated the cache to always return the current version of the state to prevent issues if the state is mutated. No issues observed after four days of testing.
  • Implemented a debug version to persist block states at the last slot of an epoch and the state dialed to the next epoch for further investigation if the issue recurs.
  • Addressed an issue with an unknown client sending bad SSZ responses regarding metadata. Merged a fix to block unknown clients.
  • Created an issue to downscore peers that send SSZ errors, which Matt is reviewing.
  • Investigated an issue related to block production and ETH1 deposits. Created an issue for further analysis as the problem is rare.
  • Noticed frequent delays in attestations taking up to one second. Successfully reproduced the issue in PerfTest and working on a fix.

Nico:

  • Addressed issues where errors were thrown on the API for already known attestations. Updated the system to ignore these based on the spec and avoid 500 responses. This update should help both in DVT setup and for those running fallback nodes.
  • Noticed some configuration parameters were set as constants. Reviewed and aligned presets and configs with the latest spec.
  • Updated networking terms to support the Gnosis Chiado fork, which recently moved certain values to the config. This change aligns with other clients like Teku.
  • Discussed the possibility of moving more parameters to configs for customization and closer adherence to the spec, though not considered urgent.
  • Investigated better .lock file handling methods to address issues where power loss or unexpected shutdowns prevent restarting a validator client. Exploring libraries like LevelDB for smarter lock file management.
  • Delayed rebasing the SSZ branch due to many open branches modifying API stuff. Suggested merging these first.
  • Conducted reviews on various open branches and planned to review the rewards API after the meeting.

Cayman:

  • Made small PRs in the libp2p monorepo and Ethereum.js, focusing on cleaning up discv5.
  • Addressed an issue reported by Nico regarding unhandled promise rejections in discv5.
  • Opened a PR in the Discv5 repo that refactors callback handling and adds try-catch blocks to prevent unhandled promise rejections. The PR is ready for review.
  • Worked on the add historical state regen PR.
  • Integrated the newly published classic-level release and added panels in a new dashboard for timing analysis.
  • Deployed the feature to feat3 server group for testing. It functions but is slow, taking about two minutes to retrieve a historical state.
  • The PR is marked as ready for review, though it requires further investigation to improve performance.
  • The slow retrieval time for historical states is a concern. The issue might be related to the storage frequency of historical states (every ~thousand epochs), potentially leading to numerous epoch transitions during retrieval.
  • Metrics did not clearly indicate the cause of the delay, suggesting the need for more in-depth investigation.
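
To make the storage-frequency concern concrete, the amount of replay needed for a historical state can be estimated with simple arithmetic. The sketch below assumes mainnet's 32 slots per epoch and a snapshot interval of 1024 epochs as a stand-in for "every ~thousand epochs"; `replayCost` is a hypothetical helper, not part of the PR.

```typescript
// Mainnet preset value; the snapshot interval passed in below is an assumption.
const SLOTS_PER_EPOCH = 32;

interface ReplayCost {
  snapshotSlot: number;
  slotsToReplay: number;
  epochTransitions: number;
}

/** Estimate the work to regenerate the state at `targetSlot` from the nearest earlier snapshot. */
function replayCost(targetSlot: number, snapshotEveryEpochs: number): ReplayCost {
  const targetEpoch = Math.floor(targetSlot / SLOTS_PER_EPOCH);
  // Snapshots are assumed to sit on epoch numbers that are multiples of the interval.
  const snapshotEpoch = Math.floor(targetEpoch / snapshotEveryEpochs) * snapshotEveryEpochs;
  const snapshotSlot = snapshotEpoch * SLOTS_PER_EPOCH;
  return {
    snapshotSlot,
    slotsToReplay: targetSlot - snapshotSlot,
    epochTransitions: targetEpoch - snapshotEpoch,
  };
}
```

Under these assumptions, a request for a slot in epoch 1023 replays from the snapshot at epoch 0, which means 1023 epoch transitions, while a request that lands exactly on a snapshot needs none.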

Julien:

  • Continuing work on cleaning the beacon API repository. Nearing completion for a new release, though unclear about the release schedule and management in the repo. Seeking assistance for the release process.
  • Implemented beacon API-related updates in Lodestar, including:
    • Filtering on the blob sidecar, addressing a missing feature in recent beacon APIs.
    • Ensuring the light client event has the proper shape and adding version information.
    • Investigating the addition of ERC55 support to execution addresses, though decision-making on adding dependencies and addressing security concerns is needed.
  • Proposed adding test cases to the beacon API repository using OpenAPI syntax.
  • The goal is to integrate automated tests directly into the specification, ensuring all implementers share the same test cases and have a clearer understanding of API intentions.
  • Initiated discussions with Nazar to understand and potentially contribute to the project, aligning with the team's goals for the year.

Nazar:

  • Completed the last PR for cleaning up the test structure.
  • Transitioned all integration tests, sim tests, spec tests, etc., to use Vitest, moving away from Mocha and other dependencies.
  • The only remaining area using Mocha is the performance test. Plans to contribute to a third-party library to make it test runner agnostic, but this is not urgent.
  • Sim test issues have been resolved, leading to stable test outcomes. Encourages team members to investigate any failures and suggest improvements.
  • Working on making the sim merge tests more stable and relevant. Plans to review each test case, moving some to sim tests and fixing others.
  • Engaging in discussions and work on the block production update PR.
  • Preparing to open PRs for each case in the sim merge tests for easier discussion and resolution.
  • Will open the first PR for sim merge tests after the call.
  • Noticed the issue created by Tuyen regarding performance on ETH1 data and deposits. Previously worked on a dashboard that generated charts for this data. Plans to investigate further if the issue is not already assigned.

January 16, 2024 Planning and Standup Meeting

Agenda: https://github.com/ChainSafe/lodestar/discussions/6274

Planning and Discussions

Version 1.14 Release

  • Completion of v1.14 release.
  • Pending announcement and documentation update needed for last-minute builder boost factor inclusions.

Planning for v1.15

  • Suggestions for v1.15 release contributions.
  • Aim to release in about two weeks.
  • Tag potential items for v1.15 and discuss asynchronously if needed.

Discussion on Builder Selection Query Parameter

Context and Current Implementation

  • The team discussed the current use of the builder selection query parameter in beacon nodes.
  • This parameter enforces old behavior and works alongside the builder boost parameter for selecting builders.
  • It's particularly relevant when no viable builder block is available, allowing the system to error out instead of providing an execution block.

Debate on Removal

  • The main focus was whether to keep or remove this additional parameter.
  • Given the new builder boost functionality and spec compliance, the team deliberated on its continued necessity and utility.

Use Cases and Impact on DVT

  • Examined specific use cases for DVT (Distributed Validator Technology).
  • Discussed potential shifts in handling execution blocks, especially when a builder block request is not viable.
  • Middleware solutions like Charon or SSV nodes might need to locally reject such blocks.

Exception Scenarios

  • The parameter currently allows for two exception scenarios:
    1. Execution only block: Errors if there's a viable builder block but no viable execution block.
    2. Builder only block: Errors if there's a viable execution block but no viable builder only block.
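
The two exception scenarios can be sketched as a small selection function. The selection names, the `Candidate` shape, and the tie-breaking rule are illustrative assumptions, not Lodestar's actual option values or logic.

```typescript
// Illustrative names only; not Lodestar's actual option values.
type BuilderSelection = "maxprofit" | "builderonly" | "executiononly";

interface Candidate {
  source: "builder" | "execution";
  value: number;
}

/** Pick a block for proposal, erroring in the two "only" modes when that source is missing. */
function selectBlock(
  selection: BuilderSelection,
  builder: Candidate | null,
  execution: Candidate | null
): Candidate {
  switch (selection) {
    case "executiononly":
      // Errors even if a viable builder block exists.
      if (execution === null) throw new Error("no viable execution block");
      return execution;
    case "builderonly":
      // Errors even if a viable execution block exists.
      if (builder === null) throw new Error("no viable builder block");
      return builder;
    case "maxprofit": {
      if (builder !== null && execution !== null) {
        // Ties go to the local execution block in this sketch.
        return builder.value > execution.value ? builder : execution;
      }
      const only = builder ?? execution;
      if (only === null) throw new Error("no viable block from either source");
      return only;
    }
  }
}
```

Only the two "only" modes turn a missing source into an error; the default mode falls back to whichever block is available.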

Specification Compliance

  • Noted that with default builder selection (max profit and set builder boost parameters), the system remains spec compliant.

Conclusion and Next Steps

  • No final decision was made on the removal of the builder selection query parameter.
  • Highlighted the need for further evaluation of its impact in light of new functionalities and spec compliance.

The discussion reflects the team's commitment to balancing system functionality with evolving requirements and specifications, ensuring alignment with overall goals and user needs.

Handling Builder Boost Zero Parameter

Overview of Builder Boost Zero Parameter

  • Focused on the implementation specifics of the builder boost zero parameter in beacon node operation.
  • Key in determining how the node selects blocks under specific conditions.

Implementation Challenges

  • Discussed the beacon node's behavior when a validator passes a boost factor of zero.
  • This scenario implies the node should always select the local block, effectively skipping value comparison with builder blocks.

Current System Behavior

  • Currently, the beacon node waits for the resolution of two promises (local and builder) before making a comparison.
  • With a builder boost of zero, this comparison becomes redundant as the local block would always be preferred.

Proposal for Early Resolution

  • Suggestion to modify the system for an early resolution in cases of builder boost zero.
  • This would involve the node producing a block immediately once the local promise resolves, without waiting for the builder promise.
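
A minimal sketch of the proposed early resolution, assuming the boost factor is applied as a percentage to the builder block's value before comparison. `pickBlock`, `settle`, and the `BlockCandidate` shape are hypothetical and are not the race helper discussed in the meeting.

```typescript
// Hypothetical helper names; a sketch of early resolution for a zero boost factor.
interface BlockCandidate {
  source: "local" | "builder";
  value: number;
}

/** Resolve to the value, or to null if the promise rejects. */
function settle<T>(p: Promise<T>): Promise<T | null> {
  return p.then(
    (v): T | null => v,
    () => null
  );
}

async function pickBlock(
  local: Promise<BlockCandidate>,
  builder: Promise<BlockCandidate>,
  builderBoostFactor: number
): Promise<BlockCandidate> {
  if (builderBoostFactor === 0) {
    // The comparison can never favour the builder, so do not wait for it.
    // A late builder rejection is swallowed to avoid an unhandled rejection.
    builder.catch(() => undefined);
    return local;
  }

  // Otherwise wait for both sources and compare boosted builder value to local value.
  const [l, b] = await Promise.all([settle(local), settle(builder)]);
  if (l === null) {
    if (b === null) throw new Error("no block available from either source");
    return b;
  }
  if (b === null) return l;
  return (b.value * builderBoostFactor) / 100 > l.value ? b : l;
}
```

With a factor of zero the function returns as soon as the local promise resolves, even if the builder promise never settles; the trade-off raised in the meeting is the extra branch in the block production path.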

Concerns and Considerations

  • Discussed the efficiency of the current system for this scenario and the need for a new approach.
  • Raised concerns about potential complications in introducing new logic and maintaining system simplicity.

Decision and Next Steps

  • Agreed on further investigation and potential iterations on the race helper function to efficiently handle the builder boost zero parameter.
  • Aim to refine the system to accommodate this scenario, ensuring overall functionality and system integrity.

Potential Use of RxJS

  • Suggestion to use RxJS for timing issues and race conditions in builder flow.
  • Decision to focus on existing solutions for efficiency and maintainability.

Decision on Builder Selection Parameter

  • Agreement to keep the builder selection parameter for now.
  • Consideration for retiring it if it becomes a hindrance or redundant.

Further Iterations on Race Helper Function

  • Agreement to iterate further on the race helper function.
  • Focus on efficient handling of builder boost zero parameter.

Status of Historical State Regeneration

  • Completed implementation of the historical state regeneration feature.
  • Currently blocked by an issue with Level DB.
  • A lack of response on a blocking pull request (PR) submitted to Level DB.
  • No communication from the Level DB team for a month.
  • Intend to reach out to the Level DB team again, informing them of the decision to fork.
  • The approach aims to unblock the impasse and progress with the historical state regeneration feature.

Discussion on Increasing Peer Count to 100

  • Question raised about whether the libp2p peer count issue has been resolved in gossipsub version 1.10.
  • The problem involved having more peers than necessary, acting as a blocker for the increase in peer count.
  • Peer Rotation in libp2p 1.0: Observed smaller peer rotation in the libp2p 1.0 branch.
  • Quicker peer rotation previously led to spikes in peer count.
  • Newer branch showed longer peer connections with fewer disconnects and reconnects.
  • Suggested testing the increase to 100 peers on the new branch.
  • Noted that the branch is nearly ready for merging, with only minor issues left.
  • Mentioned performance problems in Lighthouse with flood publish when increasing peer count.
  • Questioned if Lighthouse has batch publishing, which might mitigate issues.
  • Agreed that batch publishing should bring improvements.
  • Next Steps: Decided to first proceed with the libp2p upgrade, followed by rebasing and further testing.

Discussion on the self-hosted runner

  • Identified a problem with unknown block sync simulation tests, being addressed with Gajinder.
  • Concerns about whether issues with sim tests were related to unknown block sync.
  • Regardless of immediate fixes to stabilize sim tests, there was agreement to change the self-hosted runner.
  • If not needed, the expenses for the machine might be reconsidered.
  • Suggested running benchmarks on the same machine.

Discussion on Validator Config Documentation

  • Proposed removing excessive content and directing users to official sources.
  • Additional Feedback
    • Direct users to Ethereum Foundation's staking website for initial steps.
    • Guide users to use Lodestar after completing initial steps.
    • Add the mainnet beacon contract address.
    • Include the launchpad address for better accessibility and awareness.
  • General agreement on updating the documentation to make it more streamlined and relevant.
  • Acknowledged the need to link important resources and update terminology.

Updates

Julien:

  • Julien continued work on the Beacon API, focusing on the Beacon API repo itself.
  • Noted that error handling in the Beacon API is still a bit confusing and needs attention.
  • Proposed migrating examples and tests from the Lodestar repo directly into the spec.
  • Goal to ensure consistent understanding and implementation across different platforms.
  • Suggested creating a script or infrastructure to test on a running node for validation and comparison of implementations.

Lodestar Light Client Development

  • Continued work on integrating Lodestar code directly into documentation for demonstration purposes.
  • Encountered issues with node abstractions leaking in the browser, such as process and buffer objects.
  • Identified potential improvements on the browser side and Lodestar side to facilitate browser development.
  • Plans to discuss with Nazar for insights on integrating the light client in browser applications.
  • Noted the existence of a chrome extension by Fireblocks that successfully runs in the browser.

Matt:

  • Three PRs are up for the BLST library, covering memory, fuzzing, and performance tests. Comments on these PRs have been addressed and are ready for further review.
  • Created a fork of the Node repository to release a debug build and attach binaries to releases. Addressed the issue of binaries not being consistently available by hosting them more officially on GitHub. Provided a URL for pulling Node builds, ensuring reliable access to necessary binaries.
  • Noted the GitHub runner's limitation of 8GB, insufficient for linking Node together. Discussed the requirement for access to larger instances to use the solution effectively. Added a recipe to the unofficial builds repo of Node to build and host non-standard builds of Node debug. Preparing a PR for this alternative solution but faced access rights issues.
  • Updated the git checkout action to include a debug flag. Attempting to host debug builds either through the releases from the forked repo or the unofficial builds repo. Plans to integrate the download URL for debug builds into the git checkout action.
  • Requested a final review of the PRs for the BLST library.
  • Awaiting the build and approval for the Node debug unofficial builds, or access to BuildJet for testing the checkout action fork.
  • Continuing work on QUIC while waiting for resolution on the Node debug build issue.

Cayman:

  • Finalizing the libp2p v1.0 PR which had been pending for a while. With recent updates to libp2p dependencies, the PR has reached a stable state.
  • Addressed a bug in multi-stream select causing protocol stream errors, which has now been resolved.
  • Plans to update the gossipsub version to the latest release and finalize the PR.
  • Recognized that the SSZ (Simple Serialize) API PR has not been progressing on his end and needs attention. Considering either advancing the PR himself or handing it off due to the risk of it becoming outdated.

Nico:

  • Finalized the last fixes for the Goerli fork, ensuring stability and functionality.
  • Collaborated with Julien on multiple merges in the Beacon API spec. Noted only one PR left to be merged in the Beacon API spec.
  • Emphasized the need to carefully handle errors to avoid incorrect attestation publishing.
  • Examined updates required for the Chiado Deneb hardfork, particularly for the Gnosis testnet. Identified an issue with Deneb values being set as constants rather than part of the configuration, affecting network support flexibility.
  • Highlighted necessary updates before supporting other networks like Gnosis. Mentioned that for mainnet tests, scheduling the fork epoch should suffice.
  • Plans to rebase the SSZ branch as part of ongoing work. Aims to clean up all server handlers for improved functionality. Anticipates updating the client side and achieving a buildable state within the week.

NC:

  • Specifically focused on the 6110 PR and the block reward API PR related to it.
  • Conducted testing on the proposer boost reorg. Believes the PR is now ready for review. Expressed uncertainty about the approach and is open to opinions and suggestions.

Tuyen:

  • Completed the implementation of the confirmation rule prerequisite in fork choice. Awaiting the latest release of spec tests for further validation. Noted a small improvement to recalculate the total balance of the justified checkpoint.
  • Addressed the issue of redundant block downloads by root at every slot. Implemented a catch to avoid downloading blocks already available through gossip but in the process of being processed. The Pull Request for this solution has been merged.
  • Received the first round of review from Cayman on the latest N-Historical States PR. Encountered a hurdle with vitest not supporting explicit resource management.
  • Configured zero historical states in a test branch for n-historical state. Discovered a bug causing an invalid state root error after three days of running. Acknowledged the bug's complexity and the time required to resolve it.
  • Deployed a node for batch publish analysis and noticed unusual metrics, including missed attestations and significant balance delta increases. Intends to revisit the issue after merging the libp2p changes for further analysis.
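
The check that avoids redundant downloads by root can be pictured as a small tracker of blocks that are already known or still being processed. This is a sketch with hypothetical names (`BlockTracker` and its methods), not the merged Lodestar change.

```typescript
// Hypothetical tracker; roots are hex strings for simplicity.
class BlockTracker {
  private readonly processing = new Set<string>();
  private readonly imported = new Set<string>();

  /** A block arrived over gossip and entered the processing pipeline. */
  onGossipBlock(root: string): void {
    if (!this.imported.has(root)) this.processing.add(root);
  }

  /** Processing finished and the block is now part of the chain. */
  onImported(root: string): void {
    this.processing.delete(root);
    this.imported.add(root);
  }

  /** Processing failed, so a later by-root download may be needed again. */
  onProcessingFailed(root: string): void {
    this.processing.delete(root);
  }

  /** Only request a block by root if it is neither imported nor in flight. */
  shouldDownloadByRoot(root: string): boolean {
    return !this.processing.has(root) && !this.imported.has(root);
  }
}
```

An attestation that references a block still in the pipeline then does not trigger a second download of that same block.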

Gajinder:

  • Actively working on cleanup Pull Requests for Lodestar in preparation for the Deneb release. Continues to prioritize cleanup tasks leading up to the mainnet release.

  • Reviewed new candidate EIPs for the next hard fork, including EIP-7002, EIP-7549, EIP-7251, and PeerDAS. Most EIPs appear easy to implement, with a special focus planned for PeerDAS.

  • Discussed some of the EIPs, with NC possibly working on EIP-7002.

  • Consulted Lion about EIP implications, specifically on pulling the committee index out of attestations.

  • Understanding EIP Implications:

    • Clarified that the committee index removal from the signing root doesn't eliminate different committees.
    • Noted that although it may seem redundant, it still benefits signature aggregation and verification.

December 5, 2023 Planning and Standup Meeting

Planning

Topic: Invalid State Root Issue Analysis

  • Issue Overview: During the weekend, an issue was encountered with the invalid state root in block proposals. This involved two types of blocks: one from the builder and the other from execution.
  • Block Publishing Discrepancy: When the builder block failed to build, the execution block was chosen, signed by the validator, and submitted for publishing. However, there was a mismatch between the state root at publishing and the calculated state root.
  • Initial Assessment: The issue initially seemed related to state transition but was later identified as a change in block hash during serialization between the validator and the beacon node.
  • Investigative Approach:
    • Observed that the block produced at the beacon had a different hash root compared to what appeared at the validator.
    • It was theorized that the body root was incorrect, which led to the state root mismatch.
  • Resolution Steps:
    • Introduced patches to take a hash tree root of the execution body immediately upon receipt, even before sending it for state transition computation.
    • Added caching for the body root hash to ensure consistency across computations.
    • Implemented additional logging for the body root, the block hash calculated by the beacon, and the block hashes signed by the validator.
  • Outcome: These changes provide a robust logging trail for any future occurrences of similar issues, enabling more efficient debugging and resolution.
    • As of standup, this issue has not been seen again.
    • min-bid lowered could be a reason why, but we should reinstate 0.07, give it a few days of observation and release the patch.
    • There have been deployments where the builder block consistently errors, triggering fallbacks. This issue was observed on Holesky, where the produceBlindedBlock version was running instead of produceBlockV3.
    • If such errors occurred, they would manifest as mismatches in the roots because the full block is reconstructed from the local cache when published through publishBlindedBlock.
    • Theory: The error might only occur when the fallback is triggered. For instance, when a builder block request fails, it falls back to another block type.
    • This issue was not observed in some deployments where only local blocks were used without a builder connected.
    • Devnet 11: The flow being discussed was operational on DevNet 11 for an extended period. Despite frequent builder errors, the specific issue never surfaced there.
    • The plan is to monitor the issue starting today, with a decision to be made by Friday on whether to push out the patch. If no incidents are observed by then, the issue could be considered resolved.

Patch Release Planning:

  • Infrastructure teams will be brought in to monitor the situation. The intention is to release a patch on Friday, incorporating several suggested fixes.
  • Additional suggestions for the patch include addressing the Auth headers encoding issue raised by Jacob. This requires review before inclusion.

Latest Gossipsub Version:

  • The recent version of Gossipsub (version 10) may resolve the memory leak issue.
  • Heap Snapshots Analysis: After examining different heap snapshots, it appears that the leak may originate from the outbound stream within Gossipsub.
  • ES-Lint Fix: A recent PR addressed an ES-Lint issue by catching an error during the closure of outbound streams. This fix is believed to address the memory leak by ensuring proper removal of peer-related streams from the peer map.
  • Draft PR for Confirmation: A draft PR is in place to validate this fix. It requires time to conclusively prove its effectiveness.
  • Awaiting Confirmation: If the fix proves successful, it could be included in a hotfix version.
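
The kind of leak described above can be reproduced in miniature: if closing a stream throws before the map entry is deleted, the entry stays in the map. The sketch below is a toy with hypothetical names (`PeerStreams`, `removePeerLeaky`, `removePeer`) and is not gossipsub's actual code.

```typescript
// Toy reproduction of a leak caused by an uncaught error during stream close.
interface OutboundStream {
  close(): void;
}

class PeerStreams {
  readonly streamsOutbound = new Map<string, OutboundStream>();

  /** Leaky variant: if close() throws, the map entry is never deleted. */
  removePeerLeaky(peerId: string): void {
    const stream = this.streamsOutbound.get(peerId);
    if (stream === undefined) return;
    stream.close();
    this.streamsOutbound.delete(peerId);
  }

  /** Fixed variant: the entry is removed whether or not close() throws. */
  removePeer(peerId: string): void {
    const stream = this.streamsOutbound.get(peerId);
    if (stream === undefined) return;
    try {
      stream.close();
    } catch {
      // A failed close is not fatal here; the stream is being discarded anyway.
    } finally {
      this.streamsOutbound.delete(peerId);
    }
  }
}
```

With a stream whose `close()` throws, the leaky variant leaves the peer's entry behind on every disconnect, while the fixed variant always shrinks the map.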

Updates

Tuyen:

  • PR Merges: Successfully merged PRs to improve process slashing and implement a shuffling cache, which is part of the n-historical state project.

  • Historical State Work: Progressing on a method to persist only one checkpoint state per epoch, work is ongoing.

  • Invalid State Root Issue Investigation:

    • Deployed a MEV boost hardcoded version to always return no min-bid received, mimicking their setup.
    • Despite the deployment on mainnet, the invalid state root issue did not occur, continuing the investigation.
  • Memory Leak Investigation:

    • Identified that streamsOutbound was holding a significant amount of memory.
    • Observed that streams outbound might not be removed properly from peers.
    • Noted that the latest gossipsub version addressed this issue.
  • Additional Work: Created a PR to track the input generation step.

Gajinder:

  • Invalid Proposals Issue: Focused on resolving the issue with invalid proposals on the engine.

  • DevNet Twelve Performance:

    • Successfully participated in DevNet Twelve, featuring the new format of Blobsidecars with inclusion proofs instead of signed Blobsidecars.
    • The builder flow is functioning well in this new setup.
  • Hive Test Results:

    • Mario from Hive conducted tests against the build, with most sanity cases passing.
    • Identified a few edge cases related to the equivocation of Blobsidecars.
    • Lion's suggestion on the PR about not accepting new sidecars could potentially resolve some issues.
    • Another noted issue is the late gossiping of Blobsidecars in the block, currently only waiting for 4 seconds as per the PR.

NC:

  • Consensus Block Value for Produce Block V3:

    • Completed implementation and received a review from Gajinder.
    • Identified some ambiguities in the API spec, which requires clarification from the API discord channel.
  • Rewards API Development:

    • Progress made on the Rewards API, but currently on hold due to the need for API spec clarification.
  • Reorganizing Late Blocks:

    • Spent time understanding the spec for reorganizing late blocks.
    • Initially planned to start working on it later this week, but this may depend on the progress with the Rewards API and spec clarifications.

Cayman:

  • Js-libp2p 1.0 Integration:

    • Actively working on integrating Js-libp2p 1.0.
    • Inadvertently addressed a memory leak by adding more Linter rules to gossipsub.
    • Collaborating with Alex to debug some persisting issues, likely related to noise.
  • SSZ API Progress:

    • Making significant progress with Nico on the SSZ API.
    • Most route definitions have been transitioned; only the key manager routes remain.
    • Approximately 50% completion on the project.
    • Next steps involve tweaking tests and updating client call sites.
  • Exploring ChatGPT Plugins:

    • Experimented with ChatGPT and its plugins, particularly the 'ask the code' plugin.
    • Successfully used the plugin to query information about the Beacon Chain class in Lodestar.
    • Suggests the potential utility of the plugin for querying Consensus specs, Lighthouse, and other projects.

Lion:

  • EIP-7549 Proposal:

    • Proposed EIP-7549 to move the index from the Attestation data to the Attestation body.
    • The change significantly reduces the pairing cost (64 times cheaper) and network complexity.
    • This simplifies processes for beacon nodes by reducing the number of signatures they need to check.
  • Max Effective Balance (MaxEB) Design Suggestion:

    • Francesco, a consensus researcher at EF, proposed a simpler design for MaxEB.
    • Instead of consolidating stakes, introduced a new concept called a "cluster."
    • Entities can register as a cluster and aggregate their signatures, sending only one aggregate signature to the network.
    • This approach simplifies network complexity without consolidating state size.
    • The main goal of MaxEB is to allow single slot finality by reducing computational and networking costs.
  • Clustering Concept Explained:

    • Clustering is about reducing the number of attestations sent to the network.
    • It involves entities forming clusters and agreeing to aggregate their signatures.
    • This does not impact the state size, focusing more on operational efficiency.
  • Compounding and Clustering Impacts:

    • Discussed the impact of compounding rewards and the difference it makes.
    • The compounding effect may not be as significant, especially when considering the frequency of compounding.
    • Clustering at the network level is expected to have minimal negative impacts.
    • Benefits of clustering include reduced operational costs and the facilitation of single slot finality.
  • Operational Changes with Clustering:

    • Large node operators could potentially scale down the number of beacon nodes they're running.
    • Clustering by a factor of 100 would significantly reduce the network burden.

Matt:

  • PR for Race Condition Fix:

    • Posted a PR to fix a race condition that appeared in Electron during package publishing.
    • It was a simple issue of an asynchronous function hidden in synchronous constructors.
  • Native Heap Analysis:

    • Explored various tools for native heap analysis, including Valgrind, Massif, and GDB.
    • Settled on Heaptrack, which works well for both C++ and Rust.
    • Documented the process of using Heaptrack for analyzing memory usage, which includes creating flame graphs.
  • Analysis on Lodestar Unstable:

    • Ran Heaptrack on Lodestar Unstable to identify components contributing to RSS.
    • Found that BLST points, node internals, and LevelDB are major contributors.
    • The analysis provided insights but more work is needed to fully understand memory usage.
  • Debugging BLST Rebuild:

    • Detected a segmentation fault in the BLST rebuild and identified its cause.
    • Working on a fix for the issue, which stemmed from the structure of a function/class.
  • Heap Dumps Analysis:

    • Analyzed heap dumps provided by Nazar, focusing on tracking function pointers and memory addresses.
    • Using LLDB for analysis, which has a steep learning curve.
    • Writing documentation on how to analyze core dumps.

Nazar:

  • Benchmark Performance Analysis:

    • Investigated the performance of phase zero block attributes.
    • Added numerous metrics to block production processes for detailed step-by-step measurement.
    • Concluded that phase zero attributes are not the primary contributors to block production time.
    • A PR related to this analysis has been approved and will be merged soon.
  • Migration of Prover Tests to Vitest:

    • Preparing to migrate Prover tests to Vitest, with an emphasis on browser-based testing.
    • This migration will facilitate easier transfer of other tests in the future.
    • An older PR in this regard is being finalized for regression fixes and updates.
  • Upcoming Work:

    • Planning to open additional PRs next week, focusing on unit tests for other packages.
    • Will address other high-priority issues as they arise.
  • Segmentation Fault (Segfault) Issues:

    • Shared insights from the Vitest community and Matthew regarding segfaults in projects using native code.
    • The problem may relate to thread implementation details in Node.js.
    • Suggested moving from threads to child process forks for better stability with native code.
    • Plans to share a discussion thread for further examination and potential solutions.

Nico:

  • Debugging DAppNode Issues:

    • Resolved an issue with Key Manager API not returning proper responses, which disrupted synchronization between validator keys and web3 signer keys.
    • Fixed a minor issue where duties were not being deleted upon key deletion.
  • Synchronization of Local and Remote Signer Keys:

    • Explored native implementation for syncing local public keys with remote signer keys.
    • Noted that third-party solutions typically use scripts or services for this purpose, questioning the necessity of a native implementation.
    • Related discussion in the Lodestar help channel about maintaining sync.
  • Improvements and Reviews:

    • Made minor fixes and log improvements.
    • Reviewed Luca's PR for voluntary exit, noting a useful refactor for the web3 signer that can be leveraged in tests.
    • Continued work on the Server implementation refactoring for SSZ, with some aspects still pending and cleanup required.

November 28, 2023 Planning and Standup Meeting

Planning

The planning meeting focused on addressing performance concerns and setting goals for upcoming Objectives and Key Results (OKRs). Key points discussed included:

  • Performance and Future Requirements: Discussion on how the Ethereum roadmap and evolving specifications might affect Lodestar's performance. Current performance is within acceptable limits, but future updates might increase requirements.
  • Metrics and Iterative Improvement: Emphasized the importance of having good metrics, especially performance-oriented metrics, for iterative improvement.
  • Deneb Hard Fork: Discussed the need for metrics related to the Deneb hard fork, acknowledging a lack of insight into how this would affect the node's runtime.
  • Garbage Collection Concerns: Discussion on the importance of monitoring garbage collection metrics, particularly spikes in port condition and their impact.
  • Debugging and Tooling: Highlighted the need for effective debugging tools for understanding memory usage spikes and main thread performance.
  • Actionable Goals: Consensus on setting actionable goals for performance improvement, focusing on developing tooling for diagnosis and incorporating these goals into the OKRs.

Updates

Matt:

  • Documentation Update: Completed a documentation update and planned to run a verification script against the Light client example.
  • Debugging BLST Memory Usage: Investigating memory usage of individual objects in BLST, which are heavier than in the existing library.
  • Heap Debug Research: Researching GDB for heap debugging to understand heap dump contents, addressing memory usage concerns.
  • Level DB PR and Race Condition: Addressing a race condition discovered during publishing of the Level DB PR, which was not detected in unit testing but appeared in Electron testing.
  • Goals for the Week: Aims to respond to PR comments, continue heap debugging research, resolve the Level DB race condition, and potentially collaborate with Cayman on reviewing Quinn project aspects.

Nico:

  • Packaging Solutions Research: Investigated different solutions to compile binaries on GitHub without ARM runner support. However, no successful solution has been found yet.
  • caxa Package Evaluation: Examined the caxa package, previously mentioned by Cayman, as a potential solution. Despite its recent deprecation, it's considered a better option than its replacement. Nico plans to review the caxa code in-depth and potentially discuss its long-term viability with the package author.
  • Ethereum on ARM Solution: Explored how Ethereum on ARM currently handles Lodestar releases. They publish new packages for each release, including DEB packages, using caxa on an ARM server. Nico has contributed updates to their repository to streamline this process. The goal is to simplify their workload by enabling Lodestar to publish binaries directly, eliminating the need for Ethereum on ARM to run the script for Lodestar in future releases.
  • General Activities: In addition to this research, Nico also addressed various issues and conducted code reviews.
  • Ethereum on ARM's Current Process: They currently run a script in their repository to build and publish Lodestar binaries using the caxa package. This process is executed on an ARM server. Nico has updated this process to ensure efficiency and correctness. The aim is for Lodestar to take over binary publication to reduce the workload for the Ethereum on ARM team.

Lion:

  • Published EIP-6914 Document: Successfully published the EIP 6914 document, with input and comments from the team.
  • Blob Sharing Protocol: Released the Blob sharing protocol spec, developed in collaboration with Dankrad. The protocol promises a trustless solution, and Lion plans to implement it when time permits.
  • Max Effective Balance Work: Plans to resume work on the max effective balance feature, especially in light of the upcoming Electra hard fork.
  • Slashing Risks Analysis: Prepared a document analyzing slashing risks, which he discussed during the team retreat. Lion is now taking a mathematical approach to this analysis and is keen to receive feedback from the team.
  • Consolidation Challenges: Addressing challenges related to consolidation, which has received mixed responses. Lion anticipates this will be a complex and time-consuming task.
  • Worked with Jimmy at Lighthouse to implement a light client server. So far, 1 of 9 PRs has been merged.

Tuyen:

  • Validator Generation and Flare CLI: Attempted to generate a larger validator set for testing, but faced challenges due to the time required. Also used the Flare CLI to simulate validator slashing and discovered a bug related to block production when handling a large number of attestations and slashings. A fix has been identified but needs further review.
  • Memory Leak in Network Thread: Submitted a pull request to address a memory leak in the network thread. The tests are close to completion and the PR is pending review.
  • Shuffling PR Testing: Tested the shuffling pull request and found it to perform comparably to the current unstable version. This testing is part of the ongoing historical stack work and is crucial for future development.

NC:

  • Benchmarking for 6110 Pubkey Cache: Published the initial benchmark results for the 6110 pubkey cache improvements. Adjustments were made based on the findings, and the benchmark was shared in the PR comments for further opinions and feedback.

  • Implementation for DevNet 12: Started working on implementing the addition of the consensus block value to the produce block v3 endpoint. This feature is critical for the upcoming DevNet 12, which is scheduled to take place in approximately three days. The implementation process revealed a significant overlap with the rewards API, suggesting that once the current task is complete, the rewards API implementation would be more straightforward.

Nazar:

  1. Core Dump in CI Server:

    • Successfully merged a PR that enables core dump storage on the CI server.
    • If there are any segmentation fault errors on the CI server, core dump files will now be attached as PR artifacts.
    • Team members are encouraged to download these artifacts and share them on Discord for further investigation.
    • This approach was adopted due to the difficulty in manually detecting core dumps during trials.
  2. Investigation of Slow Block Production:

    • Ongoing investigation into the causes of slow block production, particularly when it exceeds 1 second.
    • Preliminary findings suggest the delay isn't due to the speed of the builders or execution APIs. Instead, it may be related to the sequence of execution, where the process waits for one execution before moving to the builder.
    • The issue doesn't occur uniformly across all servers but is limited to a few.
    • Creating initial benchmarks for block production to serve as reference points for future performance assessments.
    • Plans to open a draft PR and test on servers, although the exact cause of the slowdown is still under investigation.

Cayman:

  1. SSZ API Refactoring:

    • Currently at the halfway point of the refactoring process.
    • Completed route definition refactoring for all endpoints except validator and key manager ones.
    • Plans to implement changes across all call sites once these remaining endpoints are refactored.
    • Future work includes enhancing tests and making request/response/header validation more robust.
  2. Collaboration with Libp2p:

    • Working with Alex from libp2p on improvements for the upcoming 1.0 release of js-libp2p.
    • Key performance enhancements include:
      • Reduction of buffer copies in the stack.
      • Elimination of one round trip in session establishment.
      • Halving the time to first byte for a peer, as observed in tests.
  3. Quinn JS Binding Scaffold Project:

    • Created a scaffold project for Quinn JS binding.
    • Intends to share this with Matt for further development.

August 15, 2023 Planning and Standup Meeting

Transcript: https://pastebin.com/PPgaTfWK

Planning

Reprioritizing Yamux: Yamux was initially deprioritized as Mplex was removed. However, there is an initiative by Pawan to reintegrate it. The team discussed the potential challenges and the need to reintegrate Yamux before the upcoming release. Reason for Change: It was mentioned that the older system (Mplex) was deprecated by the libp2p specs, making way for Yamux, which is touted to be better and faster. However, our specific implementation had issues, notably a potential memory leak.

Memory Leak in discv5: There was mention of a discv5 memory leak. The connection or difference between this memory leak and the Yamux issue was clarified. Status: The memory leak issue was noted as something they wanted to work on previously, but other priorities kept pushing it aside.

Libp2p upgrade to 0.46: There was an upgrade to libp2p, and the team discussed its relevance to the Yamux issue. It was stated that the upgrade unblocked Yamux's integration. Challenges: The team discussed the challenges of deprecating Mplex and whether the integration of Yamux should be expedited for v1.11. We will try to get it in depending on the results seen.

Holesky Testnet: The speaker mentioned their focus on the Holesky testnet, emphasizing that the keys for it have been generated and uploaded for its Genesis. Infrastructure Challenges: It was noted that they plan to use 5,000 keys per server, a task never done before. The infrastructure team is prepared for potential breaks and might reduce the number of keys if issues arise.

Beacon Data Pruning: A significant focus was given to dealing with finalized archived states in the beacon data. An issue was created just before the meeting to discuss potential pruning features, aiming to limit the growth of beacon data.

Boot Node CLI Command: Update: There's a draft PR for a Lodestar boot node, which would simply run a discv5 server without running a full beacon node. The idea is to contribute to the boot node ecosystem, aiming for better distribution among different jurisdictions and providers.

Potential Features for v1.11: Fork Choice Improvements: Tuyen was working on performance improvements in this area. Protons Upgrade: A mention of the Protons upgrade in Gossipsub which had been pending for a long time.

BLS (Boneh-Lynn-Shacham) Implementation: A significant part of the meeting was devoted to discussing the integration of BLS, a cryptographic method, into their system. It's a large change, requiring meticulous review. Status: While promising metrics were observed, there's some apprehension about rolling it out without thorough testing and review.

Node Performance & Improvements: Target Peers: Tuyen had plans to work on increasing target peers post the testing of the Protons feature. GC Time Reduction: Adjusting the 'new space' had resulted in reduced garbage collection time, impacting block times positively.
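The "new space" mentioned here is V8's young-generation heap. As a minimal sketch (not Lodestar's actual tooling), its current size can be inspected at runtime with Node's built-in `v8.getHeapSpaceStatistics()`. The helper name `newSpaceStats` is hypothetical. New space is commonly enlarged with the `--max-semi-space-size` V8 flag, but the notes do not say which setting or value was used.

```typescript
import * as v8 from "node:v8";

/** Hypothetical helper: report the size of V8's new space (young generation) in MB. */
function newSpaceStats(): {sizeMb: number; usedMb: number} {
  const space = v8.getHeapSpaceStatistics().find((s) => s.space_name === "new_space");
  if (!space) throw new Error("new_space not reported by this V8 build");
  const toMb = (bytes: number): number => bytes / (1024 * 1024);
  return {sizeMb: toMb(space.space_size), usedMb: toMb(space.space_used_size)};
}

// Log once at startup, or sample periodically next to GC duration metrics
console.log(newSpaceStats());
```

Comparing this value across deployments makes it easier to confirm that a changed setting actually took effect before attributing a GC time difference to it.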

Issue of Aggregated Attestation Errors: There had been observed errors in Lodestar beacon node's aggregated attestation. Status: Nico provided an update, and Nazar mentioned working on a PR that would enable different validator and beacon node combinations in simulation tests to detect if the aggregate isn't produced.

Updates

Tuyen:

  • Bigboy Testnet Issue: Problem: Observed a long updateHead call taking up to eight seconds, resulting in a notably low number of peers. Cause: In the devnet test, many unfinalized proto nodes were detected, causing the updateHead call to grow exponentially. This was primarily due to excessive checks to verify if nodes shared the same finalized checkpoint. Solution: The main fix applied was within the 'node verify for head' which significantly reduced the time consumed. With the introduction of this fix, Tuyen believes the next version of Lodestar, when used on that testnet, won't reproduce the same problem.
  • Performance Improvement: Tracked votes by index to enhance the compute delta function. Results: While performance tests showed a 2x or 3x improvement, the mainnet node testing exhibited an even more remarkable 8x speed-up. The typical updateHead call duration reduced from 240 milliseconds to about 30-40 milliseconds. Tuyen requests reviews on this matter.
  • Protons Migration: Protobuf encoding in gossipsub is currently implemented with protobufjs. There has been a plan to transition to protons, which was postponed. Current Status: Tuyen is now looking to implement this migration into the system and is currently testing it within a group.
  • Gossipsub Metrics Issue: A small PR in gossipsub intended to de-duplicate metrics. An earlier PR aimed to unbundle metrics, but it resulted in two validation phases: one in gossipsub and another at the application level. The naming of these metrics (like the count of invalid or valid messages) became confusing due to duplication. Solution: In Tuyen's PR, he proposes renaming metrics at the system level (perhaps as 'pre-validation result') to distinguish them from application-level results. This de-duplication is essential since running Lodestar with the latest gossipsub becomes unfeasible without this PR. He welcomes alternative suggestions and feedback on his proposed changes.
  • Index Gossip Queue: The queue's implementation has been merged. Tuyen is currently working on a PR to utilize this, marking the final PR for this work segment.

Nico:

  • Nico investigated a problem related to Lighthouse and has documented his findings.
  • Nico updated the bootnodes. This involved pulling the latest changes and updating the hard-coded values, especially the ENRs (Ethereum Node Records) present in their code.
  • Nico addressed an issue in which the system didn't support the authorization header, or more precisely, basic authentication was not supported. This deviation from the specification was due to a bug in their past implementation. An observation was made: had they upgraded to node-fetch version 3, they would've faced the same problem. Nico's solution was to build the header in advance since the fetch library doesn't do it by itself.
  • Nico has been exploring the transition of moving the network from being a thread to being a process. He managed to get it operational, but a new issue arose: after some time, the main thread stops receiving any message events. Nico has identified that there is a high volume of events being dispatched from the worker to the main thread, which could be causing this issue. He's unsure if it's a bug in the child process's execution or another issue, and he's currently focused on debugging this problem.
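Building the basic authentication header in advance, as described above, can be sketched roughly as follows. This is an illustration rather than the actual Lodestar fix: the helper name `splitBasicAuth` and the example URL are hypothetical, and it assumes the credentials are embedded in the URL.

```typescript
/**
 * Hypothetical helper: pull credentials out of a URL and build the
 * Basic `Authorization` header value before the request is made.
 */
function splitBasicAuth(rawUrl: string): {url: string; headers: Record<string, string>} {
  const url = new URL(rawUrl);
  const headers: Record<string, string> = {};
  if (url.username || url.password) {
    // URL components are percent-encoded, so decode before base64-encoding
    const user = decodeURIComponent(url.username);
    const pass = decodeURIComponent(url.password);
    headers["Authorization"] = "Basic " + Buffer.from(`${user}:${pass}`).toString("base64");
    // Strip credentials so they are only carried in the header
    url.username = "";
    url.password = "";
  }
  return {url: url.toString(), headers};
}

const {url, headers} = splitBasicAuth("http://user:pass@localhost:9596/eth/v1/node/version");
console.log(url, headers);
```

The returned `url` and `headers` can then be passed to whichever fetch implementation is in use.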

Gajinder:

  • Gajinder has been focusing on the integration of "produce block V3." This new version serves as a unified API for both the execution block and the builder block. The main intent behind this development is to shift the race between the block builder and execution over to the beacon.
  • Challenges:
    • Beacon Node URL Treatment: Deciding if each beacon node URL should be treated distinctly or as fallbacks.
    • Handling "Produce Block" Actions: Issues arise especially if some URLs don't respond in time.
    • Variability in Connections: The connections of beacon nodes to builders can influence block optimization.
    • Beacon URL Perspective: Consideration on whether all beacon URLs should be seen as separate block producers and raced with a specific cutoff.
  • Fallback URL Racing Structure:
    • Gajinder was prompted about his opinions on this structure and its design trade-offs.
    • Vouch's Strategy: While Vouch races multiple beacon nodes to get the most optimal proposal, many validators usually link to a single node.
    • Variation in Setup: Different beacon node setups can have diverse builder attachments, affecting the block's value.
    • Two-Second Cutoff Strategy: A method where there's a two-second cutoff and then picking the first to resolve seems optimal. This might be beneficial even if the race is transferred to the beacon node.
    • Publishing Blocks: Broadcasting to all, as opposed to depending on a fallback system, might prove more efficient.
  • Proposed Solution: Gajinder is contemplating introducing a mechanism where:
    • The HTTP client will race all URLs with a specified cutoff and timeout.
    • This would provide a generic interface for any calls they'd want to execute in this mode.
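A generic "race all URLs with a cutoff and timeout" helper could look roughly like the sketch below. The notes do not pin down the exact semantics, so the following are assumptions: responses arriving before the cutoff are collected and the best one is chosen, the first response wins once the cutoff has passed, and the call fails at the timeout. The names `raceWithCutoff`, `request` and `pickBest` are hypothetical.

```typescript
/**
 * Race one request per URL (assumed semantics, not confirmed by the notes):
 * - responses arriving before `cutoffMs` are collected and `pickBest` chooses among them;
 * - if nothing has arrived by the cutoff, the first response before `timeoutMs` wins;
 * - if nothing arrives by `timeoutMs`, the call rejects.
 */
function raceWithCutoff<T>(
  urls: string[],
  request: (url: string) => Promise<T>,
  pickBest: (results: T[]) => T,
  cutoffMs: number,
  timeoutMs: number
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const results: T[] = [];
    let cutoffPassed = false;
    let pending = urls.length;
    let settled = false;

    // Settle at most once and always clear both timers
    const finish = (fn: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(cutoffTimer);
      clearTimeout(timeoutTimer);
      fn();
    };

    const cutoffTimer = setTimeout(() => {
      cutoffPassed = true;
      if (results.length > 0) finish(() => resolve(pickBest(results)));
    }, cutoffMs);

    const timeoutTimer = setTimeout(() => {
      finish(() => reject(new Error("no response before timeout")));
    }, timeoutMs);

    for (const url of urls) {
      request(url)
        .then((res) => {
          results.push(res);
          // After the cutoff, the first response wins immediately
          if (cutoffPassed) finish(() => resolve(res));
        })
        .catch(() => {
          // Individual failures are ignored; other URLs may still respond
        })
        .finally(() => {
          pending--;
          // Every request has settled: no reason to keep waiting for a timer
          if (pending === 0) {
            finish(() =>
              results.length > 0 ? resolve(pickBest(results)) : reject(new Error("all requests failed"))
            );
          }
        });
    }
  });
}
```

For block production, `request` would call the produce block endpoint on one beacon node and `pickBest` would compare block values. Because the helper is generic over `T`, the same racing logic could be reused for other calls.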

N.C.:

  • ePBS Discussion:
    • N.C. and the team last week concentrated on the design of the inclusion list for ePBS.
    • Various inclusion list designs exist:
      • Forward inclusion list
      • Same slot or same block design
      • Top of block
      • Bottom of block
    • Despite these variations, when it comes to the engine API, most of these designs have similar specifications.
    • N.C. documented these specifications and offered to share the link for others to review.
  • Validator and Builder Spec:
    • The team from Prysm is still contemplating and delving into many intricate details that need to be addressed.
  • Analysis on 6110 - Pubkey Cache in Lodestar:
    • N.C. analyzed the pubkey cache in Lodestar's current codebase.
    • Personal Opinion:
      • N.C. believes that there's no requirement for the unfinalized index to pubkey cache. The rationale is:
        • The beacon API doesn't use the index to pubkey. Instead, it uses pubkey to index.
        • For any use cases that leverage index to pubkey, it always mandates an active validator, rendering the unfinalized cache unnecessary.
    • Feedback:
      • N.C. acknowledged comments from Gajinder, particularly one pointing towards non-finality. This aspect still requires some thought.
      • N.C. is looking forward to receiving more insights from Lion regarding the pubkey cache design.
      • Once a consensus on the design is achieved, N.C. plans to commence with coding.

Cayman:

  • Libp2p Update: Cayman successfully updated libp2p to its latest version during the past week. The update primarily involved a significant amount of package renaming.
  • A notable change in the update is that js-libp2p has transitioned to a monorepo. Consequently, all the relevant content is now housed within the libp2p/js-libp2p repository.
  • Exceptions include ChainSafe maintained packages such as gossipsub, noise, and yamux.
  • Impact of Libp2p Update: The update to libp2p has paved the way for Tuyen to proceed with tasks associated with gossipsub and yamux, as well as other upgrades. The team was previously unable to upgrade certain dependencies due to the breaking changes present. Upgrading libp2p was a necessary precursor to address those.
  • Boot Node CLI Command: Cayman introduced a boot node CLI command during the past week. While the command is currently functional, Cayman wishes to conduct further refinements. He is particularly interested in streamlining the initialization processes for both the beacon node CLI and the boot node. Cayman aims to complete the enhancements on this feature in the coming week.
  • Investigation into Yamux: Cayman expressed concerns over the performance of Yamux, which, in his observation, is lagging behind Mplex. The performance differential is roughly in the range of 5% to 10%. In order to enhance Yamux's performance, Cayman has been incorporating several tweaks, paralleling those used in Mplex. However, he has yet to achieve a comparable performance between the two in initial tests.
  • Memory Leak Issue: Cayman identified a potential memory leak during his comparison tests. After taking a heap snapshot, he encountered challenges in viewing it. The sole tool Cayman is aware of for visualizing heap snapshots is Chrome DevTools. However, it seems unresponsive, getting stuck in the "building the dominator tree" phase. A suggestion was made to try using Brave DevTools, which had previously been effective when Chrome DevTools was not.

Nazar:

  • Prover with Web3.js 4: Nazar dedicated his efforts last week to refine Prover to make it compatible with the Web3.js 4 version. During the process, he encountered some problems and subsequently opened a PR to address and rectify these issues. With the corrections in place, Prover is now functioning smoothly.
  • Bug in Browser Logger: While working on Prover, Nazar identified a glitch in the browser logger, which rendered it incapable of logging anything within the browser. He successfully rectified this error.
  • Unit Testing for Logger: Nazar observed that there was an absence of unit tests for various logger components, such as the environment logger and browser logger. He identified this gap as a potential reason why the aforementioned bug in the browser logger went unnoticed initially. To rectify this oversight, he has written unit tests explicitly for the logger package.
  • Lightclient Demo PR: Nazar completed the Lightclient demo PR, which is now primed for review. This version uses Prover instead of directly employing the Light client. He extended an invitation for anyone available to review the PR.
  • Decoupling in Simulation Test: Currently, Nazar is working on a PR that seeks to separate the beacon node and validator within the simulation test. This move aims to introduce flexibility, allowing different beacon node and validator implementations to be combined in the simulation test, much like how execution and consensus clients can currently be mixed. Nazar anticipates wrapping up this PR shortly, as it is nearing completion.
  • Upcoming Work - Integration with MetaMask: Nazar shared his plans for the forthcoming week, which primarily revolve around integrating with MetaMask. He mentioned that he had previously commenced work on a draft related to this integration. Consequently, he already possesses some groundwork that he can expand upon. His objective is to finalize this draft and produce an initial demo that showcases Prover's functionality within MetaMask.

Matt:

  • BLST PR and Lodestar: Matt completed the BLST PR and incorporated it into Lodestar. All pending PRs in the BLST repo received approval and are only waiting on CI and a few other processes.
  • Metrics in Lodestar: Noted a performance decrease in Lodestar when it ran with four LibUV threads due to an insufficient number of worker threads. Once the number of worker threads was adjusted to match, the performance noticeably improved, though not drastically. Several metrics, including aggregated keys, signature sets, block epoch transition times, and block production times, showed improvement. A few metrics worsened slightly, but overall, the changes seem either neutral or slightly beneficial. CPU usage decreased by 30%, providing more resources for other tasks, even though memory usage increased slightly.
  • Garbage Collection and New Space: Matt experimented with different settings related to garbage collection and found that increasing the new space leads to better performance. He plans to deploy another version to validate his findings and ensure the assumptions are accurate, particularly regarding the performance impact of setting garbage collection limits higher than necessary.
  • Research on Error Zero: Matt conducted research on GitHub to understand the root of the "error zero." He suspects the issue might arise from serialization/deserialization in cache management during the startup phase. Matt posted some of his findings on Discord for reference and will continue investigating this lead.
  • Work on Blinding Blocks: Matt reviewed Blind's suggestions for updating blinding blocks and is working on integrating these ideas into his branch. He plans to start with unit tests for transition functions, then incrementally work on the more complex aspects.
  • Feedback Request: Matt seeks feedback on the metrics of feature two to ensure he's focusing on the right aspects. He expressed a desire for candid feedback to better his understanding and asked for collaborative insights from his team.
  • Infrastructure: In the context of utilizing feature groups for testing, Matt has been using three feature groups, but feels he might be monopolizing them. He plans to release some of these groups now that he's collected sufficient data.

August 8, 2023 Planning and Standup Meeting

Transcript: https://pastebin.com/tiTnAVVJ

Planning

Big Boy Testnet Issues: Lion detailed certain issues related to Big Boy testnet in issue 5855 and 5857. The participants discussed these problems, particularly focusing on memory usage and cache-related matters. A significant observation was that memory usage spiked up to 12 GB in certain scenarios. Lion indicated that a new DevNet mimicking the mainnet environment was being set up, which would provide more insight. Cayman brought up the topic of reducing cache beacon state size as a potential solution. This approach would be major, and alternatives were discussed. Cayman emphasized the importance of efficient representation of pubkeys and withdrawal keys.

Memory Leak Issues: A potential memory leak was identified when upgrading to Node.js 18.17, although this was not confirmed. It was noted that there was no memory leak in Node.js 20. A discussion about whether to continue supporting Node 18 arose, with most agreeing to support it until Node 20 becomes Long-Term Support (LTS). Nevertheless, they also acknowledged that Node 20 might be preferable, especially if it simplifies things.

Interoperability with Other Clients: There were concerns about the compatibility of Lodestar with other popular clients like Lighthouse, Prysm, and Nimbus. The fallback logic on the Lodestar Validator Client (VC) seemed to be a significant cause of incompatibility, as other VCs expect the beacon node to handle it. Nico highlighted that using Lodestar with Prysm might be challenging since Prysm employs a distinct API for communication.

Updates on the network worker thread. There's a mention of adjusting memory by bumping the "new space," which impacts the event loop.

Performance Issues: A memory leak was identified, which led to exploring various solutions. There's an experiment to deploy with the "new space update" on version 20 to see if it offers a solution.

Worker Threads vs. Child Process: There's a debate about whether using worker threads or child processes would be more effective. Worker threads are suggested for short-lived, CPU-intensive tasks, while the network task in question is long-lived and I/O-intensive. Using child processes might lead to better OS resource allocation and prevent memory sharing with the main process.

Existing Tools: The Node.js cluster module, which uses child processes, was mentioned, along with a suggestion to try it.

Desire for Understanding: There's a quest for a deeper understanding of why worker threads might be superior or inferior to forks at a basic level.

Memory and IPC: A point is raised about memory overhead with child processes and the possible increase in Inter-Process Communication (IPC) cost. However, it's also mentioned that, on Linux, the performance difference is hardly noticeable.

Event Loop Times: Splitting tasks between threads reduces the event loop time, but doesn't necessarily improve API response times due to latency. This suggests that while individual loops perform better, the overall system doesn't see a significant boost.

Performance Expectations: The introduction of worker threads aimed to improve performance. However, changes in the landscape, such as deterministic long-lived subnets, may have made the anticipated gains less noticeable.

Testing: There's consensus that more testing is required, especially under conditions with all subnets subscribed. The current test data may not reflect real-world performance.

Child Processes for Performance: One participant highlights their experience using detached child processes in another project to maximize hardware performance. They emphasize the importance of using detached child processes without IPC connection for full-core utilization. They also mention using a third-party serialization library.

Action Points: More testing is needed to gather data about performance benefits. An exploration of the differences between worker threads and fork processes is essential. Sharing of relevant implementation details is awaited. The conversation emphasizes the importance of deep understanding, thorough testing, and making informed decisions for optimal system performance.

Updates

NC:

  • Met with the Prysm team for the ePBS. Progress is slow but tasks are being divided, and weekly meetings are established.
  • Started exploring the 6110 implementation on Lodestar. Recognized a dependency on the pubkey cache.
  • Refactoring needed for pubkey cache integration.
  • Current focus is on the PTC design.

Lion:

  • Worked extensively on Whisk, discussing optimizations and security aspects.
  • Identified a potential optimization to reduce state size increase.
  • Furthered work on the devnet and addressed 'big boy' issues.

Nazar:

  • Developed an EL provider proxy which assigns 100 ETH to any connected account.
  • Discovered that the prover wasn't working correctly due to Web3.js version 4.x's different RPC implementation.
  • A PR is in progress to make the provider compatible with Web3.js version 4.x.
  • Working on documentation and addressing an issue related to hiding simulation tests.

Gajinder:

  • Worked on Verkle and successfully read local genesis after type adjustments.
  • Identified issues while attempting a sync with lighthouse.
  • Finished a PR about fee recipient and conducted mock tests.
  • Assisted EF developers in using Lodestar as a boot node.
  • Plans to address interop issues and continue working on syncing the verkle testnet.

Nico:

  • Investigated worker threads vs. child processes for performance benefits.
  • Looked into boot node maintenance.
  • Addressed an issue about enabling/disabling doppelganger protection.
  • Plans include further research on state cache and reviewing code.

Matt:

  • Addressed a bug in Ansible, updated dependencies.
  • Investigated a memory leak issue.
  • Updated BLST code in Lodestar. Noticed a 40% reduction in CPU usage.
  • Investigated new space and semi-space.
  • Aims to finalize ongoing work, address feedback from Ben, and work on deduplicating payloads.

Tuyen:

  • Addressed an issue where Lodestar had more than the maximum peers due to only counting inbound connections.
  • Raised an issue with Nethermind syncing.
  • Worked on updating protobufs to protons in gossipsub.
  • Investigated a memory leak.
  • Plans to finalize the index Gossip queue and study Lighthouse's method of maintaining zero historical state.

Cayman:

  • Worked with Alex from the libp2p team on the varint library.
  • Discussed strategies for using varint across various libraries and how to consolidate to a single implementation.
  • Achievements include a 10x improvement in decoding speed and a 5x improvement in encoding speed.
  • The varint library accounts for only 3% to 5% of total CPU time, but Cayman believes there's potential for further optimization.
  • Cayman and Nazar examined the kind of JavaScript produced through TypeScript.
  • They decided to switch from ES 2019 to ES 2021 output.
  • This change addressed issues like the inefficient output for nullish coalescing.
  • Cayman observed that the current max mesh peer count is set to 9, while the spec recommends 12. It had previously been reduced from 12 to 9 due to performance issues. Cayman believes it's worth re-evaluating a return to 12, but emphasized the need for thorough testing before merging.
  • Plan to update to the latest version of LibP2P.
  • This update is crucial to incorporate all modifications and fixes in the Gossipsub library.
  • The latest LibP2P version is also necessary for testing YAMUX, which Cayman intends to resume.
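
For readers unfamiliar with the format, the unsigned varint used across libp2p libraries is a LEB128-style encoding: each byte carries 7 bits of the value, and the high bit marks whether more bytes follow. The sketch below is illustrative only and is not the optimized library code discussed above; the function names `encodeVarint` and `decodeVarint` are hypothetical.

```typescript
// Illustrative unsigned varint (LEB128-style) codec; names are hypothetical.
// Each byte carries 7 bits of payload; the high bit signals "more bytes follow".
function encodeVarint(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  const bytes: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    bytes.push((rest % 0x80) | 0x80); // low 7 bits plus continuation bit
    rest = Math.floor(rest / 0x80);
  }
  bytes.push(rest); // final byte has the continuation bit clear
  return Uint8Array.from(bytes);
}

function decodeVarint(buf: Uint8Array, offset = 0): {value: number; bytesRead: number} {
  let value = 0;
  let scale = 1;
  for (let i = offset; i < buf.length; i++) {
    const byte = buf[i];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) {
      return {value, bytesRead: i - offset + 1};
    }
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}
```

Arithmetic (`%`, `Math.floor`) is used instead of bit shifts so values above 32 bits survive; a production codec would also cap the byte length and work on preallocated buffers.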

July 25, 2023 Planning and Standup Meeting

Transcript: https://pastebin.com/1LkydEhH

Planning

Block production times: There have been issues recently with missed blocks on mainnet. This is due to delays on the validator client side, particularly in polling for proposer duties at the start of an epoch. To address this, Tuyen has a proposal (PR 5409) to poll proposer duties earlier. We should also look into why there was a huge delay (14s) in getting an execution block at missed slot 6940832.

Tuyen's PRs: Tuyen presented two PRs during the meeting. The first PR addresses the delay on the validator client side by polling for proposer duties one second in advance. The second PR is still under investigation and aims to reduce the delay in producing the phase 0 beacon block body.
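
The timing behind polling one second early can be sketched as follows. This is a minimal illustration, not the code from the PR: it assumes the mainnet preset values (12-second slots, 32 slots per epoch), and the helper name `msUntilNextDutiesPoll` and its parameters are hypothetical.

```typescript
// Mainnet preset values; the one-second lead comes from the notes above.
const SECONDS_PER_SLOT = 12;
const SLOTS_PER_EPOCH = 32;
const SECONDS_PER_EPOCH = SECONDS_PER_SLOT * SLOTS_PER_EPOCH;

// Hypothetical helper: milliseconds to wait before polling duties for the next epoch.
function msUntilNextDutiesPoll(genesisTimeSec: number, nowMs: number, leadMs = 1000): number {
  const sinceGenesisMs = nowMs - genesisTimeSec * 1000;
  const epochMs = SECONDS_PER_EPOCH * 1000;
  const currentEpoch = Math.floor(sinceGenesisMs / epochMs);
  const nextEpochStartMs = (currentEpoch + 1) * epochMs;
  const wait = nextEpochStartMs - leadMs - sinceGenesisMs;
  // Already inside the lead window: poll immediately.
  return Math.max(0, wait);
}
```

With these values an epoch lasts 384 seconds, so a client at the very start of an epoch would wait 383 seconds before polling for the next one.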

Long epoch transition time: There was a concern about the length of the epoch transition time, with it sometimes taking more than three seconds. This was not deemed a blocker but still a significant issue to look into.

Network thread status: One of the main challenges with the network thread was event loop lag. There were ongoing efforts to address this through metrics and exploring the message queue between the worker and the main thread. The meeting also discussed the addition of new BLS APIs and their consumption, the reduction of IO traffic related to the thread, and other measures to improve network thread stability.

Release planning: The team considered the readiness of the next release, including whether to cut a release immediately or wait to fix ongoing issues. Potential solutions included creating a hotfix for some of the recently merged PRs or cutting a scaled-down 1.10 release. The team also discussed dependency issues with node 20 and cross-fetch, and potential ways to resolve this.

Cross-fetch and Node-fetch dependencies: The meeting closed with a discussion on cross-fetch and node-fetch dependencies. The team agreed to take this offline and figure out a solution, which may include downgrading cross-fetch to resolve issues related to connection close headers.

Code coverage tests: The team discussed a pending code coverage test (PR 5225) and decided to merge it, even though it was not in immediate use. The test was self-contained and could be deleted if desired.

Updates

Matt:

  • NodeFetch issue: There's an issue with NodeFetch after the upgrade to Node 20. The problem arises from a conflict between an existing bug in Node.js and a bug in NodeFetch, in which an added "close connection" header conflicts with the "keep-alive" option. This issue originally arose with Node 8, was resolved by Node 12, but has resurfaced. It's a structural problem related to how the socket, agent, and readable stream interact within Node. Matt contributed information to the ticket addressing this issue and contacted the developer who was supposed to submit a PR for it. He offered help and suggested possible fixes, but the developer said he had already prepared a PR that just hadn't been submitted yet. Matt indicated the issue is complicated and will require time to resolve.
  • NodeFetch Header Issue: This problem was caused by a header addition in NodeFetch, imported via CrossFetch. However, a PR has been merged that removes this header as default, reverting back to the Node agent's behavior, which should prevent sockets from auto-closing. While this has been tested on the NodeFetch side, Matt hasn't personally tested it yet.
  • PR for DU Command: Matt submitted a PR (Pull Request) for the du (disk usage) command, which had failed in a unit test after a computer restart.
  • Network Worker Message Latency: Matt submitted another PR to capture metrics on network worker message latency. He intends to discuss these metrics with Ben.
  • Run Micro Task Function: Following a question from Tuyen, Matt plans to examine how to break up the run micro task function to better schedule it and improve network performance.
  • setTimeout vs. setImmediate: In response to a question from Nico, Matt will look into the trade-offs between scheduling methods such as setTimeout and setImmediate.
  • BLST work: Matt is close to finishing his work on the BLST project, which has shown promise in stabilizing the network by freeing up the main thread to process other tasks. He specifically mentioned reducing the need to serialize and deserialize keys for state transition validations, which he expects will conserve resources and enhance stability. He intends to focus on this during the week, assuming everyone on the team is agreeable.
  • Follow Up with Ben: Matt plans to follow up with Ben regarding the metrics collected and discuss the progress made on the issues he's working on.
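
The scheduling question above comes down to how a long task hands control back to the event loop. Promise callbacks are microtasks and run before the loop moves on, so a long chain of them can starve I/O, whereas a timer or `setImmediate` callback runs on a later turn. The sketch below shows the general pattern only; `yieldToEventLoop` and `processInChunks` are hypothetical names, not Lodestar functions.

```typescript
// Hypothetical helper: resolves on a later macrotask so pending I/O can run.
// In Node.js, setImmediate(resolve) is a common alternative to setTimeout(resolve, 0).
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process items in fixed-size chunks, yielding between chunks.
// Returns how many times control was handed back to the event loop.
async function processInChunks<T>(
  items: T[],
  chunkSize: number,
  handle: (item: T) => void
): Promise<number> {
  let yields = 0;
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      handle(item);
    }
    if (i + chunkSize < items.length) {
      await yieldToEventLoop();
      yields++;
    }
  }
  return yields;
}
```

The chunk size is the tuning knob: smaller chunks reduce event loop lag at the cost of more scheduling overhead.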

Cayman:

  • P2P Protocol Update: Cayman got a minor PR merged in libp2p that allows for manually dialing the identify protocol. This could potentially improve the identification of peers and client versions in Lodestar, reducing instances of encountering "unknown" peers.
  • Closing Old PRs: Cayman has been working on closing out old PRs in the queue and plans to continue this effort throughout the week. Specifically, he mentioned the PR concerning the discv5 using vanilla events.
  • Multi-fork Types PR: Cayman expressed interest in revisiting a PR regarding multi-fork types, which was previously blocked by a type error. He believes improving the organization of types will be beneficial as more forks are introduced.

Nico:

  • Network Worker Issue: Nico spent time investigating issues raised last week, particularly one involving a hanging process. This turned out to be unrelated to any IPv6 updates; instead, it was an issue with the network worker.
  • Metrics Configuration: He discovered that metrics were not configured to listen on localhost. He has since resolved this issue with a PR.
  • Simulation Tests: Nico delved into investigating why simulation tests were hanging on a particular PR where the order of shutting down the peer manager was altered. This led to the discovery of numerous "cannot set header" errors. Upon further investigation, he discovered that this was due to a race condition in closing the event stream, which occasionally resulted in the event stream still receiving emitted events even after it was closed or was no longer writable. He has now fixed this issue.
  • Node Health API PR: Nico aims to finalize another open PR regarding the node health API. He intends to implement a good approach suggested by Nazar, which is designed to improve the current system.
  • Regen Strategy & State Caching: He aims to make progress in reviewing and possibly improving the current regen strategy and state caching system by studying strategies used by other clients. He plans to discuss this further with Lion, as he needs to understand some points better before making decisions on potential improvements.
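
The event stream race described above (events still being emitted after the stream was closed) is commonly handled by checking the stream state before each write. The following is a minimal sketch of that guard, not the actual fix: `WritableLike` mirrors the Node.js `writableEnded` and `destroyed` properties but is declared locally, and `writeEventIfOpen` is a hypothetical name.

```typescript
// Minimal shape of a writable response; mirrors the Node.js `writableEnded` and
// `destroyed` properties, but is declared locally so the sketch is self-contained.
interface WritableLike {
  writableEnded: boolean;
  destroyed: boolean;
  write(chunk: string): boolean;
}

// Hypothetical helper: write an event as a server-sent event frame, dropping it
// if the stream was closed while the event was still in flight.
function writeEventIfOpen(stream: WritableLike, event: string, data: unknown): boolean {
  if (stream.writableEnded || stream.destroyed) {
    return false; // stream already closed: ignore the late event instead of erroring
  }
  stream.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
  return true;
}
```

Unsubscribing the emitter listener when the stream closes remains the primary defence; the guard only covers events that were already queued at that moment.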

Nazar:

  • Prover Package Issue: Nazar had been facing difficulties using the prover package in the React application due to problems with the package's conditional exports (a mechanism by which building tools like the TypeScript compiler or Webpack can detect the runtime environment and switch import paths accordingly). This issue arose because these conditional exports were not standardized or properly utilized by most libraries.
  • Fix for Conditional Exports: After facing challenges with the above issue, Nazar made changes to make the conditional exports work for webpack. However, a bug surfaced in a package in their repo that was used to lint readme files because it was only detecting one level of conditional exports and not nested ones as webpack could.
  • Named Export Solution: As a solution, Nazar used named export for the browser when using the prover. This method is more streamlined in all building tools.
  • Beacon Node Shutdown Issue: There was a problem with an error message being displayed inaccurately when a beacon node was shut down. The error message indicated that execution had gone offline, while in reality, the execution was still there, but the node was shut down. This issue was due to an abort error being detected as a communication error between the execution layer and the beacon layer. Nazar has opened a PR to address this and is currently writing tests for it.
  • Logical Error in Prover Implementation: Nazar discovered a logical error in the prover's implementation when there weren't enough finalized blocks. If the prover was initialized and there was only one finalized block at that time, this error limited fetching some payloads. He plans to open a PR to address this.
  • Upcoming PRs: Nazar mentioned that he is preparing three PRs which he expects to release either today or tomorrow, including the one addressing the logical error he found in the prover implementation, and the one addressing the beacon node shutdown issue. The third PR is expected to be for the React application, which is almost done and was held up due to the logical error found earlier.
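
The beacon node shutdown issue above is a classification problem: an abort triggered by our own shutdown must not be reported as the execution layer going offline. A hedged sketch of that distinction follows; `classifyEngineError` and `EngineErrorKind` are hypothetical names, and the real PR may structure this differently.

```typescript
// Hypothetical classification of errors from an execution-engine request.
type EngineErrorKind = "aborted" | "offline" | "other";

function classifyEngineError(err: unknown): EngineErrorKind {
  if (err instanceof Error) {
    // fetch-style APIs reject with an error named "AbortError" when the caller's
    // AbortSignal fires, e.g. during node shutdown.
    if (err.name === "AbortError") return "aborted";
    // Node.js socket errors carry a string `code` such as ECONNREFUSED.
    const code = (err as Error & {code?: string}).code;
    if (code === "ECONNREFUSED" || code === "ECONNRESET") return "offline";
  }
  return "other";
}
```

Only the "offline" kind would then update the execution engine status; "aborted" can be logged at a lower level or ignored.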

Tuyen:

  • New BLS API: Tuyen completed the new BLS API and will now start working on the index. A PR is expected by tomorrow.
  • Proposer Duties and Subnet Subscriptions: Tuyen submitted two PRs. The first one is to handle proposer duties before the next epoch, and the second one is to avoid subscribing to too many subnets. Tuyen noted that when they joined a sync committee, there were around 50 long-lived subnets on average, leading to a considerable increase in bandwidth usage due to 120K message IDs received in gossipsub IHAVE messages. This led to significant IO lag. Tuyen has proposed a solution to restrict the subscription to six subnet peers to manage this.
  • Subscriptions to Short-lived Subnets: The next task Tuyen plans to work on is to avoid subscribing to short-lived subnets too early, since early subscription increases bandwidth usage. Instead, Tuyen is looking to subscribe only a few slots in advance of a duty in the next epoch.
  • ChaCha-Poly Update: Tuyen mentioned that the noble library maintainers have a new ChaCha-Poly update, which now supports the destination as an optional parameter. Tuyen plans to run a performance test to see whether it is better than the current AssemblyScript implementation, and if so, the team may switch to it.

Lion:

  • Processing Attestations: After a conversation with Terence regarding how long it takes Lodestar to process all the attestations in the aggregate moment, Lion admitted that they currently don't do it. Lion spent significant time trying to understand the extent of this problem and created a new dashboard called "Lodestar Good Behaviour" to monitor things that do not directly affect Lodestar but affect others.
  • Network Impact: Lion expressed concern that Lodestar is growing while potentially being a detriment to the network. One of the problems Lion highlighted is that Lodestar tends to drop messages, creating a situation where messages that are propagated through the network don't get through. If Lodestar had a significant share of the network, this could potentially have catastrophic effects. However, at the current rate, the redundancies in the network minimize this impact.
  • Coordination with Tuyen and Strategy Shift: Lion is coordinating with Tuyen to address this issue. They're considering a more radical approach: if they're not processing a distinguish (an important element) in time, they might as well not do it at all. Lion proposed the idea of potentially turning off their aggregator completely, focusing instead on making Lodestar more performant. Lion sees this as a compromise to give them time to address the overload in Lodestar while they develop a more permanent solution such as networking threads.

N.C.:

  • ePBS Project: N.C. started working on the ePBS (Enshrined Proposer-Builder Separation) project last Friday.
  • Collaboration with Prysm and Lion: Terence has invited N.C. and Lion to join the ePBS discussion on the Prysm Discord, and it seems like future discussions on ePBS will occur there. An initial meeting with Terence, Lion, and a few individuals from Prysm has been set up for the following Wednesday.
  • Draft P2P Spec on ePBS: Terence has posted his first draft on the P2P (peer-to-peer) spec on ePBS, which N.C. still needs to review.
  • Learning Goals for the Coming Week: N.C. aims to familiarize himself with P2P, particularly with libp2p and the gossipsub protocol, to better understand what Terence is doing with the P2P side of ePBS.
  • Project Documentation: Over the next two weeks, N.C. plans to create project documentation for ePBS to formalize the project. The intention is to set objectives, goals, and break the project into phases for better organization and progress tracking.
  • Current Focus: N.C. mentioned that their current focus is on the P2P aspects of ePBS, and more in-depth discussion on other areas is yet to occur.

Gajinder:

  • PRs for forkChoiceUpdate v3: Gajinder worked on creating pull requests for forkChoiceUpdate v3 for DevNet 8.
  • Broadcast Validation PR: He also addressed the concerns raised by Cayman and Lion on a broadcast validation pull request.
  • Syncing Kaustinen, the Verkle TestNet: Gajinder experienced issues with loading the genesis while trying to sync Kaustinen, a Verkle TestNet. After spending a significant amount of time debugging, he discovered a discrepancy regarding the payload header. It now has an execution witness header, as opposed to an execution witness. He plans to address this change and attempt to run the network again.
  • Discussions and PRs on Consensus Specs for publishBlockV3: Gajinder also engaged in discussions and raised pull requests about publishBlockV3 on consensus specs. It seems the current process of builder vs execution race will need to be moved to the beacon, as opposed to the validator which is the current practice. The reason is that the current API format assumes this race and selection are happening in the beacon.
  • Parent Beacon Block Header PR: Lastly, he discussed a PR about the parent beacon block header on consensus specs. While the execution layer (EL) team was in favor of the PR, the consensus layer (CL) team was not. This issue will most likely be resolved in a meeting scheduled for the next day.

July 18, 2023 Planning and Standup Meeting

Transcript: https://pastebin.com/A6K4RLX2

During the July 18th stand-up, the team introduced a new member, NC, who will be working as a freelance contributor on enshrined PBS projects. NC has been in the Ethereum space for approximately 8-10 months and has previously contributed to Lighthouse and Besu on the execution layer side. He is now working on the ePBS project with the Lodestar team.

The team also discussed scheduling a demo for the prover. Nazar, who is working on the prover, proposed the following Wednesday for the demo, but since not everyone may be available, they decided to schedule it when the whole team can attend. A follow-up was made to ensure everyone got the invitation.

The team pushed a hotfix release v1.9.2 to their CIP fleet, which included several issues related to reducing the race time. Team members had a chance to observe how their CIP nodes have been performing in the last 12 hours, and they considered whether the hotfix was suitable for release. They didn't see any significant performance changes, but they noticed irregularities with the attestation subnet count of their peers. They decided to proceed with the release and continue monitoring the metrics, particularly block production, on Lido nodes after upgrading them to 1.9.2.

They also discussed scheduling a Grafana education session next week to refresh their knowledge and understanding of Grafana. This session could include topics like understanding and reading heat maps and line graphs, the significance of different metrics, and PromQL.

They discussed fixing bugs that Nico filed over the weekend before cutting a release candidate. They also talked about a potential issue with the upgrade of libp2p which resulted in consistently having more than 55 peers. They agreed to continue monitoring this situation to see if it presents an issue.

There is a proposal to extend the block deadline in the slot and compress the attestation and aggregate sections of the slot. Currently, the team drops a majority of attestations upon subscribing to a subnet. The aim is to become a better network participant.

With the subnet refactor merged, it's hoped that the team can reduce traffic substantially. Nodes now subscribe to only two subnets, a change from the previous approach of subscribing to one subnet per validator. This change should reduce bandwidth significantly, which can be tested in the upcoming 1.10 release.

For a successful trial of the subnet refactor, the team is considering increasing the peer count to 100. However, they feel there are not enough mainnet nodes currently to support this. The team realized the lack of a feat-3 mainnet node in their infrastructure. They plan to rectify this to ensure better testing and deployment. The team plans to release v1.9.2 and start deploying it to Lido nodes after the call. The issues raised by Nico over the weekend will be prioritized, and once resolved, a potential release candidate will be thrown into beta for data collection.


July 11, 2023 Planning and Standup Meeting

Transcript: https://pastebin.com/U1xS8PqV

Planning

Nazar finished creating the Lodestar test utils package and the end-to-end test for the prover. He's now working on a feature to track the execution engine status without depending on the eth_ namespace.

There is a request to fill out a Protocol Guild survey for the Protocol Guild members. There's also a proposal to change some significant eligibility requirements, and the team was asked to review it.

There's an ongoing issue regarding the deployment of Lighthouse and Prysm nodes. There's a problem with downscoring due to lack of backfill enabled. The proposed solution is to sync Lighthouse from genesis, but this would significantly affect deployment speed.

There's an issue with memory leak monitoring that needs to be addressed.

Matt provided an update on the network thread investigation. The latency issue is being caused by page faults at the kernel level due to increased RSS from using the worker thread. He is currently researching more on this to find a resolution.

They discussed planning for version 1.10 with a focus on performance upgrades. They proposed to have a beta version for testing by the end of the week and then look at the results by the next standup.

The team agreed to target the end of the week for cutting the release candidate. They also want to merge a couple of things, such as the subnet work Tuyen is doing, and close the Peer Manager issue #5746. Cayman's IPv6 work was also merged, and they need to upgrade discv5. They'll continue the release planning on Discord.

Protocol Berg, the upcoming conference, has received a large number of high-quality applications, making it challenging for the organizers to select the final lineup. Despite having limited spots, they are considering extending the event to two days to accommodate more speakers and topics. The organizer encourages the team members to attend, and if they are planning to come to Berlin as a team, the organizer can help arrange for a workshop room in a coworking space for them. This would offer the team an opportunity for onsite work or a subset of team meetings.

Updates

Gajinder:

  • Gajinder is working on the integration of the Verkle trie with the Shanghai testnet, aiming to have it ready for the next launch of the testnet. While there were discussions on the gas cost concerning the transition during the second half of the Verkle call, it doesn't impact the CL side significantly.
  • Gajinder has been keeping up with the developments in devnet-7. He created a PR for attestation validations updates in the devnet. After some discussions and input from Lion, Gajinder is planning to implement gossip validations per spec. There's a clarification that the transition to the Deneb validations will be based on the current slot and not the attestation slot.
  • Besides this, Gajinder has been reviewing the BLST node API update and the second PR by Matthewkeil. He plans to run Lion's current Verkle branch on the existing Verkle testnet, Kaustinen, in the coming week. This will help decide the next steps.

Nico:

  • Investigated a user-reported problem concerning the Lodestar API package being used to extract the state from the Beacon node and calculate the tree root. Nico determined that the issue was not in Lodestar itself but rather stemmed from a problem with Prysm's state API returning an invalid response. In the process, he also discovered a confirmed bug in Lighthouse's new archive implementation, which yields a different result when calculating the hash tree root from the state than the value in the block.
  • Nico revisited an unresolved problem concerning the Beacon node failing to shut down properly. He has a potential fix for this and has submitted a PR, but he's still testing it and would like someone else to review it.
  • He has been trying to diagnose an issue where some users report that their Beacon node takes an extended time to find peers. Nico suspects the problem might be related to range sync, as there are frequent disconnections when performing block range sync network requests.
  • In addition to troubleshooting, Nico has been working to understand how 'regen' works, focusing on the components that trigger and consume it to understand how it all works together and where improvements might be made.

Tuyen:

  • He has rebased a PR related to subscribing to two subnets per node. A new flag, --deterministicLongLivedAttnets, has been added and defaults to false. When set to true, a node subscribes to exactly two subnets based on the node ID, and the subscription changes every 256 epochs. This flag reduces traffic because the node subscribes to just two subnets rather than to random subnets based on the connected validators. He plans to include this in version 1.10.
  • Tuyen has been working on verifying signature sets with the same signing root. He has been discussing this with Matthewkeil, and it's expected that Lion will review the PR soon.
  • He has also been working on multi-address support, specifically on catching the path in the constructor when a string is received. He has received some comments from Nazar and will address them soon.
  • Next, Tuyen plans to work on prioritizing signature sets from the API.
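
The idea behind deterministic long-lived subnets can be illustrated with a simplified sketch: derive the subnets from the node ID and a period counter that advances every 256 epochs. The constants match the consensus spec names, but the `mix` function below is a non-cryptographic stand-in for the spec's shuffling step, and `longLivedSubnets` is a hypothetical name; the real selection algorithm differs.

```typescript
const ATTESTATION_SUBNET_COUNT = 64;
const SUBNETS_PER_NODE = 2;
const EPOCHS_PER_SUBNET_SUBSCRIPTION = 256;

// Simplified stand-in for the spec's shuffling step: a small integer mix (not cryptographic).
function mix(a: number, b: number): number {
  let h = (a ^ Math.imul(b, 0x9e3779b1)) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x85ebca6b) >>> 0;
  h = Math.imul(h ^ (h >>> 13), 0xc2b2ae35) >>> 0;
  return (h ^ (h >>> 16)) >>> 0;
}

// Hypothetical helper: the subnets a node stays subscribed to during the current period.
// `nodeIdPrefix` is assumed to be the leading bits of the node ID as an unsigned integer.
function longLivedSubnets(nodeIdPrefix: number, epoch: number): number[] {
  const period = Math.floor(epoch / EPOCHS_PER_SUBNET_SUBSCRIPTION);
  const first = mix(nodeIdPrefix, period) % ATTESTATION_SUBNET_COUNT;
  const subnets: number[] = [];
  for (let i = 0; i < SUBNETS_PER_NODE; i++) {
    subnets.push((first + i) % ATTESTATION_SUBNET_COUNT);
  }
  return subnets;
}
```

Because the result depends only on the node ID and the epoch, any peer can compute which subnets a node should be on, and the count no longer grows with the number of attached validators.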

Matt:

  • Matt has expressed appreciation for the work done by Tuyen on the multi-signature PR and Lion for updating the dashboards. He suggests that making the dashboards easier to understand and better organized should be a focus for next quarter.
  • Matt made less progress than anticipated this week on the blinded and non-blinded blocks work. It was bigger than expected, affecting various sections of code, including regen, backfill, the API, and two repositories. Though these are mostly built out, some testing still needs to be done.
  • Matt's focus was shared with supporting Ben, which took a significant amount of his time. He also spent some time working with Gajinder and celebrated the approval of their first PR. The second piece is actively being reviewed, and Matt hopes it can be incorporated soon as he believes it will bring significant improvements.

Cayman:

  • Cayman has been involved in discussions about converting the multiformats library to TypeScript. The current maintainers prefer JavaScript, so a meeting is being arranged to resolve the issue.
  • A user-reported bug introduced with the new libp2p, where objects that don't conform to their types were being emitted, has been fixed. Cayman has a PR open for it, which he probably should have already merged.
  • Cayman has been working on fixing the end-to-end tests for the node 20 update as some of the error types being thrown differ between node 18 and node 20.
  • He has merged the discv5 IPv6 support and is working on integrating it locally. He plans to push this integration to Lodestar for version 1.10. He intends to finalize these two tasks this week.

Lion:

  • Lion has opened a PR for testing Whisk and the spec is now executable. He is currently testing the Proof of Concept (POC) that was rewritten in Rust. If this can be run on a testnet successfully, they will then move forward to tackle politics and future steps.
  • The library that Whisk uses was originally written in Rust, which is why the original POC was also in Rust. A Python version for the specs existed but was very slow, so Lion shifted to a faster crypto backend. Interestingly, an unidentified individual has now written the entire library in Go, opening up the possibility for a Go implementation. At some point, they will need to take the Rust version, change the backend to BLST, and establish some bindings. However, this work won't be undertaken until there is tentative inclusion of the feature somewhere.
  • In addition, Lion and his team have continued with the Max EV proposal. They believe they have found a way to handle execution layer partial withdrawals, which was identified as a necessary task during previous discussions. They have moved away from their initial designs, which were deemed unappealing, towards a solution they are happy with.

June 27, 2023 Planning and Standup Meeting

Transcript: https://pastebin.com/rhniTqBQ

Planning

A proposal for a patch release was made, aiming to include PRs 5714 and 5708, which address issues related to logs from duplicate blocks and syncing logs. There was a discussion on whether the patch should include anything else.

It was suggested to include Gnosis, a fix for a bug in metrics (5715), and a fix for the beacon node not shutting down in certain cases (5716). However, there was a concern about the potential risk of this latter fix, but ultimately it was decided to consider it closer to when the PR gets merged.

There was discussion on adopting Node 20. It was reported that Node 20 was able to process more attestations and performed better on the "till becoming head" metric. However, there were concerns about the garbage collection pause time rate.

An update was given on implementing deterministic long-lived subnets, which would reduce the subnet mesh peers, thus reducing the I/O lag issue. This was highlighted as particularly beneficial for home stakers.

Tuyen provided notes about his work on deterministic long-lived attestation subnets (5704). The main change is to always connect to exactly 2 subnets per node instead of a number based on the validator count, which greatly reduced subnet mesh peers and, with them, the I/O lag issue.

Updates

Gajinder:

  • Addressed a previous issue with DevNet 6 where Lodestar was unable to sync blobs by range. He initially suspected an issue with Lighthouse, but similar issues were noticed with other clients too. Upon investigating, he found that the 'count' value wasn't being multiplied by 'blobs per block', causing a mismatch. This problem has now been resolved with a PR.
  • Even after this issue was resolved, the system was not syncing to the head. Upon further investigation, Gajinder found that the chain wasn't finalizing because only a few nodes were up. Consequently, after syncing about 11 to 15 thousand slots, the system would stall. It was noted that every time a new peer was added, syncing would start from the last finalized epoch, causing repeated attempts to sync from the same point every time a new peer connected.
  • Gajinder proposed a potential solution to address the issue of non-finalizing chains over many slots. The suggestion is to update the sync process so that a new peer can join the chain that's already synced, rather than always starting from the last finalized point.
  • The issues with DevNet 6 have been resolved, and it is set to be relaunched as DevNet 7.
  • The previous DevNet 7 is now DevNet 8 and is scheduled for launch in two weeks. Two PRs for DevNet 8 are already in, with Gajinder working on an additional PR.
  • Gajinder mentioned his involvement in a PEEP and EEP presentation for direct changes. He anticipates the recording of this presentation will be available soon.

Nico:

  • Investigating the issue of false positives in the doppelganger protection mechanism. Nico found that the current implementation is similar to that of Lighthouse, where attestations in blocks can occur in the next epoch. This is what triggers false positives. One possible resolution could be to increase the wait epoch time by one more epoch, but this might negatively impact user experience.
  • Nico is exploring the possibility of implementing zero downtime doppelganger protection. This could be done by checking the local attestations produced by the client. If there's an attestation in the previous epoch, doppelganger protection could be skipped for that validator. However, there are some potential downstream issues with making the registration of signers async, which would need to be addressed.
  • Nico pointed out that his approach might even improve security because two validator clients cannot connect to one database. This would prevent two validator clients from starting attestations at the same time, which is a scenario in which the current doppelganger protection would fail.
  • He also spent some time reviewing the latest beacon API spec and created issues based on his findings. Nico investigated why the spec tests are failing.
  • His focus for the coming week will be mainly on the topic of regen.
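
The zero-downtime idea above can be sketched as a simple check against locally stored attestations: if this client itself attested for a validator in the previous epoch, the liveness wait can be skipped for that validator. This is a minimal illustration under that assumption; `LocalAttestationRecord` and `canSkipDoppelgangerCheck` are hypothetical names, not Lodestar APIs.

```typescript
// Hypothetical shape of a locally stored attestation record
// (for example, derived from the validator client's own database).
interface LocalAttestationRecord {
  validatorIndex: number;
  targetEpoch: number;
}

// Returns true when this client attested for the validator in the previous epoch,
// in which case any liveness seen on the network is most likely our own.
function canSkipDoppelgangerCheck(
  validatorIndex: number,
  currentEpoch: number,
  records: LocalAttestationRecord[]
): boolean {
  const previousEpoch = currentEpoch - 1;
  return records.some(
    (r) => r.validatorIndex === validatorIndex && r.targetEpoch === previousEpoch
  );
}
```

A validator with no local record for the previous epoch would still go through the normal wait, so newly imported keys keep the existing protection.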

Nazar:

  • Nazar has been working on a PR that aims to finalize end-to-end test cases for the prover package and introduce a new package called 'testutils'. This new package is designed to consolidate code used for testing that was previously scattered across various packages. The team is encouraged to incorporate any useful testing elements into the 'testutils' package.
  • After the end-to-end testing for the prover is finalized, it will be made public and the first release of the prover package will be done.
  • Nazar plans to incorporate this first release of the prover package into the light client demo. This will help to reduce the amount of code in the light client demo, make troubleshooting easier, and demonstrate the practical use of the prover package.
  • He will then work on creating a MetaMask snap as a proof of concept for the prover. This will help initiate discussions with the MetaMask team on whether it is the right approach for production to use MetaMask snaps or whether it should be integrated into MetaMask itself. This is Nazar's primary task for the week.

Matt:

  • Matt has finalized the blst package and resolved the associated bug. The new format of the package has been reviewed by Gajinder and has been successfully deployed to Feature 2. Metrics from the deployment indicate that everything is working well.
  • The worker was activated on Feature 2 and this further improved the metrics, which Matt views as a positive outcome.
  • The blst package has also been deployed on mainnet Feature 2, which Tuyen was using. This will allow the team to observe the package's performance in a mainnet environment.
  • Matt has taken up the task of addressing a database duplication issue. This involves digging into the state and SSZ (Simple Serialize), which has turned out to be a slightly tricky problem. However, he suggests that progress is smooth.
  • In the coming week, Matt plans to conduct another round of review with Gajinder on the next part of blst, finalize the associated PR, and potentially pick up the next task. He notes that this might be a task that involves "similar state stuff" and is akin to what Lighthouse is doing.

June 20, 2023 Planning and Standup Meeting

Transcript: https://hackmd.io/@philknows/Bk0tuWedn

Planning

Version 1.9 Observations: Tuyen reported on the observations regarding version 1.9. There were no sync issues, mesh peers are somewhat less stable but similar to version 1.8, more attestations were processed with fewer dropped, gossip attestation process time increased, CPU usage increased, REST API time increased, and missed attestations remained the same. The team discussed whether the increase in event loop lag for the beta node is a concern. It was also observed that the Goerli testnet is becoming less reliable for gauging performance and metrics.

Decisions on RC3 as a Release Candidate: The team agreed that the observations were acceptable and there was nothing to prevent RC3 from being a potential release. They decided to continue running it for another 36 hours to observe if anything changes drastically. There was also a discussion regarding relying more on the CIP nodes for final-stage testing and potentially splitting the nodes further to simulate different setups (e.g., home stakers).

Worker Thread and Network Thread: Tuyen mentioned that there is an improvement in the worker thread due to the inclusion of more sleep zero and that it might be ready to be enabled in the unstable release. There were also discussions on whether to wait for version 1.10 to include several large upgrades (like libp2p upgrade) or to have a patch release.

Engaging Libuv Maintainer for Network Thread Issues: Approval has been granted to re-engage the maintainer of libuv for help with network thread issues. The budget approved is up to $10,000. Lion was identified as the person who will compile questions for the libuv maintainer and continue communications with him.

Planning for Version 1.10: The team discussed possible inclusions for version 1.10 which include enabling the network worker thread by default, libp2p update, supporting Node 20, and as a nice-to-have, including Yamux.

Additional Testing Setup for Solo Stakers: It was suggested to have a beacon node running with a single validator to simulate a home staker setup, with more modest hardware, in order to have more realistic testing for such users.

Transitioning from Goerli to Holesky Testnet: Due to Goerli becoming less reliable, there was a suggestion to move testing to a new, larger testnet launching in September (Holesky). The team discussed the importance of testing in an environment that has a large number of validators in the active validator set and how many validators are connected to the beacon node.

Updates

Matt

  • Completed a small PR that involved spell checking for all of the documentation and Readmes.
  • Worked on cache checking. He wrote documentation on how to look for cache hits and misses. After verifying, he found that there is only a 3% extra cache miss on the network thread, which he considers not to be a significant difference. Hence, he believes it’s not necessary to delve deeper to find out the level of the misses.
  • Faced a segmentation fault issue. After investigation, he discovered that the issue was due to keys moving during garbage collection when he started to bundle all of the aggregates and attestations. He realized that he had been looking in the wrong place, and it wasn’t about mixing old and new keys.
  • To resolve the segmentation fault, Matt plans to refactor the relevant function and write unit tests in blst that trigger the garbage collector to ensure stability.
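
A test that triggers the garbage collector could take roughly the following shape. This is only a sketch: `forceGc` and `runUnderGcPressure` are hypothetical names, and a `Uint8Array` stands in for a key held by a native binding. `globalThis.gc` is only available when Node is started with `--expose-gc`, so the helper reports whether a collection was actually forced.

```typescript
// Sketch only: forceGc and runUnderGcPressure are hypothetical helper names.
// globalThis.gc exists only when Node runs with the --expose-gc flag.
function forceGc(): boolean {
  const maybeGc = (globalThis as unknown as {gc?: () => void}).gc;
  if (typeof maybeGc !== "function") return false;
  maybeGc();
  return true;
}

// Creates `rounds` values, forcing a collection (when available) after each
// allocation, then verifies that every value still passes `check`.
function runUnderGcPressure<T>(
  makeValue: (i: number) => T,
  check: (value: T, i: number) => boolean,
  rounds: number
): {ok: boolean; gcRuns: number} {
  const values: T[] = [];
  let gcRuns = 0;
  for (let i = 0; i < rounds; i++) {
    values.push(makeValue(i));
    // Allocate short-lived garbage so a collection has work to do.
    const garbage = new Array(1000).fill(0).map((_, j) => ({j}));
    void garbage;
    if (forceGc()) gcRuns++;
  }
  const ok = values.every((value, i) => check(value, i));
  return {ok, gcRuns};
}

// Stand-in for native keys: 32-byte buffers filled with a known byte.
const gcResult = runUnderGcPressure(
  (i) => new Uint8Array(32).fill(i % 256),
  (key, i) => key.length === 32 && key.every((b) => b === i % 256),
  50
);
```

In pure JavaScript the check cannot fail; the harness only becomes meaningful when `makeValue` and `check` wrap calls into the native binding under test.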

Cayman

  • Identified and patched a bug in gossipsub last week.
  • Has been testing the new libp2p library with various patches to achieve stability or equivalent performance. However, he hasn't reached that point yet. He conducted tests on feature one the previous week.
  • Has several Pull Requests (PRs) that are ready to be merged, but he has been holding off on merging them until version 1.9 is stable. He doesn't want to affect performance or introduce any risks, so he is leaving them open for now.
  • His primary focus for the week is getting Lodestar ready for Node 20.
  • As a secondary objective, he will continue working on the libp2p library.

Tuyen

  • Took over a PR (Pull Request) from Lion to eliminate the serialization of blocks after they are fetched from request responses on gossip, which resulted in some minor improvements in gossip.
  • Tuyen’s next task involves working on the integration of yamux. He also mentions that if there are any specific things needed in libP2P, he is available to assist, otherwise, he will focus on yamux.
  • The integration of yamux is expected to be straightforward and can be used in place of Mplex. However, there might be some considerations regarding the compatibility of versions.
  • Tuyen recalls that the code changes needed for yamux integration are small, but he notes that the performance is not as good as with Mplex.

Gajinder

  • Reported issues with DevNet 6, which led to it becoming non-functional. The nodes were not syncing correctly, particularly with the Lighthouse client. Lighthouse had problems serving the blobs correctly.
  • Additionally, Geth had issues with block proposals. There was also a problem with how one of the execution layer (EL) clients included block transactions with blobs, causing the testnet to break. Because of skewed validator allocations, DevNet 6 is deemed non-recoverable, and work will shift to DevNet 7, which will focus on EIP-4844.
  • Gajinder reviewed the BLST code from a PR that Matt worked on, and gave feedback. He is waiting for Matt to update the PR based on the comments.
  • He cleaned up his own PR called "Free the Blobs" and rebased it with the latest changes. Most of the work is done, but a few critical pieces are missing, which he plans to push during the week.
  • He submitted a PR for fixing the proposal flow for DVT validators, making sure that local execution engine blocks are not produced or used against the blocks received from the relay.
  • For the upcoming week, he plans to work on Deneb-focused PRs, including adding the beacon block root to the execution payloads so that proofs against beacon state can be done in contracts in the Ethereum Virtual Machine (EVM). He also aims to align with the EIP that makes voluntary exits non-expirable.
  • He mentioned an important change regarding deposit snapshots for WSS (Weak Subjectivity Sync). Currently, when doing a WSS sync, the execution client has to backfill all logs to provide the deposit tree to the beacon client. With deposit snapshots, this will no longer be necessary. The execution client will receive a deposit snapshot tree from the CL (Consensus Layer) so it won’t have to sync all history since the deposit contract was deployed.
  • Gajinder will also work on Lion’s PR related to metric proposals.

Nico

  • Nico addressed a problem related to the beacon node not shutting down, which was linked to a libp2p issue. Cayman identified that it was due to an update in a sub-library of libp2p. Nico plans to further debug this once Cayman's fix is implemented.
  • He fixed minor API-level issues, including one that was identified during DevNet 6, and added end-to-end tests for them. He mentioned that he found these tests efficient and thinks they can serve as good sanity checks. He’s considering adding more end-to-end tests for other APIs.
  • Nico added a feature that allows users to force a checkpoint sync, which is mostly useful for development purposes. This feature can also help users who have been offline for an extended period and are facing lengthy sync times.
  • He aims to run all other Validator Clients (VCs) against their beacon node to check for compliance and to identify any issues.
  • Nico observed a potential issue with doppelganger protection, where it produced a false positive. He's uncertain if this can be prevented and plans to investigate further. There should be no false positives according to Lion.
  • He intends to start working on the state and regen topic either this week or early next week.

Lion


June 13, 2023 Planning & Standup Meeting

Transcript: https://pastebin.com/6FUM4PbH

Planning

There was an observation that the network was stable except for some mesh peers that were dropped. A branch was deployed previously to test batch delete and it appeared stable based on the 7-day chart. It was agreed to monitor the network for any additional issues. The team discussed deploying on some mainnet validators, particularly the CIP validators, to obtain better metrics. The suggestion was agreed upon as it could provide useful data to the network. The consensus was that the risks were low and the validators could be a good resource. The team agreed that the priority was pushing out version 1.9 RC2. Several people, including larger node operators and relayers, are awaiting the release.

Performance Based on Keys: The conversation began with an observation that nodes with a higher number of keys seem to experience issues, while nodes with fewer keys are performing relatively well. For instance, CIP validators with a lower number of keys appear to be more effective compared to Lido nodes, which have around 200 keys per beacon node.

Network Thread Enablement: The team discussed whether the network thread should be enabled by default. The consensus was positive, as enabling the network thread seems to significantly improve performance, especially for nodes connected to more subnets. The network thread allows for the processing of more messages, and the team believes that this is critical for being a good network participant. Despite the benefits, there were concerns about network threads getting overloaded. The reason why the network thread is clogged compared to the main thread is not clear. It was hypothesized that spinning up a second isolate is creating overhead and that context switching at the CPU level might be the issue. The network thread introduces a thread for the first time, which is different from the main process.

Self-regulation of Network Thread: The team pondered whether the network thread could self-regulate itself not to choke. They considered reintroducing a mechanism to drop messages to reduce the load if the thread detects that it is overloaded.
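
The idea of dropping messages under load can be illustrated with a bounded queue. This is a minimal sketch, not Lodestar's implementation: the `DroppingQueue` class, its method names, and the capacity of 2 are assumptions made for the example.

```typescript
// Hypothetical bounded queue that sheds load once it is full.
class DroppingQueue<T> {
  private readonly items: T[] = [];
  private readonly maxLength: number;
  private dropped = 0;

  constructor(maxLength: number) {
    this.maxLength = maxLength;
  }

  // Returns true if the item was accepted, false if it was dropped
  // because the queue is already at capacity.
  push(item: T): boolean {
    if (this.items.length >= this.maxLength) {
      this.dropped++;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Removes and returns the oldest queued item, if any.
  shift(): T | undefined {
    return this.items.shift();
  }

  get length(): number {
    return this.items.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}

const overloadQueue = new DroppingQueue<string>(2);
const acceptedFlags = ["att-1", "att-2", "att-3"].map((msg) => overloadQueue.push(msg));
```

This sketch drops the newest message when full. Dropping the oldest instead is an equally plausible policy for time-sensitive gossip, and the drop counter is the kind of value that could be exported as a metric.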

Backpressure and Yamux: Backpressure, which regulates data flow, was mentioned as a key issue. There was a mention of Yamux, a stream multiplexer, which has built-in backpressure. However, it was noted that Yamux was previously blocked due to memory leak issues.

Upgrading libp2p: The team discussed the necessity of upgrading to the latest version of libp2p (0.45), which should be prioritized after releasing v1.9.0. This upgrade would provide several fixes in the TCP library and improvements in gossipsub. It would also pave the way for retesting Yamux. Future versions of libp2p might replace the underlying implementation of the streams with WHATWG streams, which promise better performance and built-in backpressure.

Async Iterables and Buffering Strategy: The team talked about async iterables and buffering strategies. They considered whether they could eliminate abort sources with the current design. The conversation also touched on the implementation of streams using async iterables to avoid additional memory copies when dealing with binary data.

Event Loop Lag and Microtask Queue: The team identified that there was event loop lag and that microtask queue tasks might be causing the network thread to underperform. The hypothesis is that if the microtask queue is clogged, the data from the sockets could be loading into L2 or L3 cache, causing delays. However, the team wasn't sure how to test this hypothesis.

Performance Testing: Finally, the team expressed the need for libp2p performance tests to be conducted to investigate the hypothesis that their stack might be slow. This would help them to understand if their observations are due to inherent issues with the stack or if there are other factors at play.

Updates

Gajinder

  • With the launch of pre-DevNet 6, they increased the number of blobs (data packets) that can be utilized in the network from one to six per block.
  • However, this change caused some issues which Gajinder is debugging. A fix has been generated, and they are working on synchronizing back to Devnet6 to stabilize the network.
  • One issue emerged when Gajinder sent 500 transactions with 500 blobs to the network. While each block could handle six blobs, the EL clients started facing issues.
  • One particular problem was with Lodestar which, due to a typo, was sending all six blobs together instead of processing them one by one. A PR (pull request) has been generated to resolve this issue.
  • Gajinder mentions that the current network is running in a single data center, so there haven't been any network latency issues, and he doesn't expect any such issues in the future.

Matt

  • He has successfully integrated BLST (BLS signature library) which is now handling attestations, aggregates, and proofs. He has created a draft pull request and aims to deploy it.
  • He plans to work remotely and attempt to deploy the BLST version to a feature node to gather metrics. He hopes that this deployment will alleviate some of the load from the main CPU and possibly help with other issues they are experiencing.
  • Matt has the second piece ready for Gajinder but understands that Gajinder has been occupied with getting the next step of BLST approved.

Cayman

  • Cayman has been working on the libp2p branch and plans to push any fixes or the latest updates to it.
  • Cayman has been working on getting their system ready for Node 20.
  • He has an open PR (pull request) for simplifying the snappy frame decompression by replacing some old libraries with a simpler solution.
  • Additionally, he is updating Snappy, a native library they are using, to the latest version that is compatible with Node 20.
  • Cayman shares an interesting technique he learned called branded types. It's a method for creating unique types (nominal types) in programming, which can help in distinguishing between similar data types. For example, distinguishing between a regular string and a special ID that is also in string format. This can be helpful in avoiding mistakes where a simple string is used where an ID should be used, as it requires explicit typecasting. Cayman has written a comment on this and provides an example in a library for anyone interested in learning more about branded types.
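
A minimal sketch of the branded types technique in TypeScript follows. The names `Brand`, `ValidatorId`, `toValidatorId`, and `describeValidator` are hypothetical and chosen only for this example; they are not taken from Cayman's comment or the library he referenced.

```typescript
// The brand key exists only at the type level; `declare` emits no runtime code.
declare const brand: unique symbol;

// T intersected with a phantom property that plain values of T do not have.
type Brand<T, B extends string> = T & {readonly [brand]: B};

type ValidatorId = Brand<string, "ValidatorId">;

// The single place where a plain string is explicitly cast to the branded type.
function toValidatorId(value: string): ValidatorId {
  if (!/^0x[0-9a-f]+$/.test(value)) {
    throw new Error(`not a valid id: ${value}`);
  }
  return value as ValidatorId;
}

// Accepts only branded ids, not arbitrary strings.
function describeValidator(id: ValidatorId): string {
  return `validator ${id}`;
}

const validatorId = toValidatorId("0xabc123");
const validatorLabel = describeValidator(validatorId);

// describeValidator("0xabc123"); // compile error: string is not a ValidatorId
```

At runtime a branded value is still an ordinary string, so the technique adds no overhead; the distinction exists only for the type checker.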

Tuyen

  • Tuyen investigated an issue regarding external memory in version 1.9.0. He has implemented a fix related to batch deletion, which seems to resolve the issue.
  • Tuyen investigated the network thread and identified some minor optimizations in gossipsub.
  • He found that not converting the peer ID when calling certain functions could potentially save around 4% of CPU time.
  • However, these optimizations are not the root cause of the network thread issue and Tuyen plans to continue the investigation.

Nico

  • Nico made the thread pool used for decrypting key sources reusable, improved error handling, and fixed issues related to terminating the decryption process. Before these fixes, the decryption process could not be terminated without forcefully closing the process.
  • Nico also submitted a Pull Request (PR) to integrate these improvements into the key manager API.
  • Nico observed that, in some instances, the beacon node does not exit cleanly after running for an extended period. The process continues to run, and Nico has not yet been able to identify the cause or the handler that keeps it active. He mentioned that this issue seems random and is hard to test.
  • Nico is considering adding an explicit process exit once the beacon node is closed as a solution, although he prefers to avoid it if possible. He mentioned this issue occurs on Linux and has also been observed in Docker.

Nazar

  • Nazar worked on incorporating batch requests into the prover. It proved to be challenging as the prover needed to be compatible with both Ethers.js and Web3JS. Ethers.js does not have a public interface for batch requests while Web3JS does.
  • Nazar opened a PR that includes several final features and refactoring for the prover. Once this PR is merged, he plans to close the epic issue and will open a separate issue for the P2P interface for the light client.
  • Nazar enabled an ESLint rule for detecting unnecessary typecasting and found that there was a lot of unnecessary typecasting in the source code. He opened a PR to address this and is waiting for feedback on whether to keep or remove the unnecessary typecastings.
  • Nazar conducted research on integrating the prover with MetaMask through MetaMask snaps. However, he discovered that MetaMask snaps may not be the right framework for this integration. According to MetaMask documentation, snaps are not intended for long-running processes, whereas running a light client would require a long-running process. Nazar is continuing research on this topic to determine an alternative solution or how snaps could be adapted to fit the use case.
  • Nazar will continue researching MetaMask snaps integration and will update the Light Client demo to use the Prover package after v1.9 is released.
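
For context on the batch request work, a JSON-RPC 2.0 batch is an array of request objects, and responses may come back in any order, so they are matched to requests by `id`. The sketch below shows that matching step only. The helper names `buildBatch` and `orderResponses` are hypothetical and do not reflect the prover's actual interface.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: {code: number; message: string};
}

// Builds a batch payload, assigning sequential ids starting at 1.
function buildBatch(calls: {method: string; params: unknown[]}[]): JsonRpcRequest[] {
  return calls.map((call, index) => ({
    jsonrpc: "2.0" as const,
    id: index + 1,
    method: call.method,
    params: call.params,
  }));
}

// Reorders responses to follow the request order, matching on id.
function orderResponses(batch: JsonRpcRequest[], responses: JsonRpcResponse[]): JsonRpcResponse[] {
  const byId = new Map<number, JsonRpcResponse>();
  for (const response of responses) {
    byId.set(response.id, response);
  }
  return batch.map((request) => {
    const response = byId.get(request.id);
    if (response === undefined) {
      throw new Error(`missing response for request id ${request.id}`);
    }
    return response;
  });
}

const address = "0x0000000000000000000000000000000000000000";
const rpcBatch = buildBatch([
  {method: "eth_getBalance", params: [address, "latest"]},
  {method: "eth_getCode", params: [address, "latest"]},
]);

// Simulated server reply, deliberately out of order.
const orderedResponses = orderResponses(rpcBatch, [
  {jsonrpc: "2.0", id: 2, result: "0x"},
  {jsonrpc: "2.0", id: 1, result: "0x1"},
]);
```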

June 6, 2023 Planning & Standup Meeting

Planning

The team mainly discussed issues related to the 1.9 update, including the RSS memory leak problem and batch delete anomalies in levelDB, which are currently hindering the deployment of the 1.9 version. Tuyen shared that he found the batch delete was causing the memory spike issue, and despite changing the approach to delete each slot separately, the problem persisted in one or two nodes. Tuyen is considering changing his approach to address this issue.

There was a discussion about the possibility of underlying C and C++ code causing random memory leaks, as the team heavily relies on it. Notably, levelDB, which is handling the database, had leak issues in the past. The team decided that this should be reported. Matt expressed concern that the leak might be due to the failure of using handlescopes, which would prevent the garbage collector from deleting stuff during lengthy processes. He suggested looking further into this.

Tuyen mentioned that they've had the same levelDB dependencies as in 1.8.0, where there wasn't an issue. However, an enhancement by Gajinder on May 23 appears to have sparked the issue. After its reversion, memory still spiked periodically (referred to as "barting"). Tuyen proposed deploying different versions between May 23 and May 31 to identify the exact commit causing the problem.

The team decided that this was a good plan, although it was noted that the issue was mainly seen in one medium node, and there could be some inconsistency. Tuyen believes the issue reproduces quite consistently with batch delete, so the plan is to deploy the exact commits with no patches on top and to revert Gajinder’s PR. The issue was recognized as a blocker for deploying 1.9, but as there's no immediate rush, the team has the necessary time to work through it.

Updates

Gajinder

  • He updated the specifications for DevNet 6 to 1.4.0-alpha.1 and updated KZG to big endian in the 4844 branch.
  • He merged the PR to change blob transaction encoding to RLP and also changed the request response to new methods. Pending tasks include changing network gossip methods and changing block input.
  • He also will be pushing a pre-devnet6 build with EthereumJS this week.

Tuyen

  • Will proceed with deploying commits between May 23-28 on feat1, feat2, and feat3 nodes to diagnose memory barting issue.

Lion

  • Lion created a PR to automate the process of analyzing why a node is missing attestations.
  • He also experimented with perf and the profiler with the aim of automating the rendering of SVGs for the network thread.

Nico

  • He fixed two issues noticed on the 1.9 release and started looking into doing decryption in a thread pool for the key manager API.
  • He also wrote a script to analyze attestations.

Nazar

  • Nazar worked on two PRs containing small features within the Lodestar Prover.
  • He added whitelisting support and batch request support.
  • This week, he plans to focus on adding test coverage to the prover package and move file names to camel case.

Cayman

  • He worked on getting the libp2p upgrade unblocked and deployed it to feat2.
  • He also looked into updating to Node 20 and worked on updating the native dependencies for this.
  • Notes from Matt indicate Node 20 fixes many regressions from Node 18. We should upgrade as soon as we can.

Matt

  • He managed to get stack traces and flame graphs working on worker threads.
  • He's working on analyzing them and using them for better telemetry.
  • He also worked on a PR with BLST and will push the next one when ready.

May 30, 2023 Planning & Standup Meeting

Planning

The primary discussion centered around the team's challenges with the 1.9 release, specifically relating to a memory leak issue that's preventing the release. The memory leak issue seems to have been narrowed down by Tuyen, who found that the external memory has jumped since May 23rd. A peculiar fact is that this seems related to the "Change Archiving Strategy to Always Store Last Finalized" work, which is strange because archiving was already performed before, just not as frequently as it is now. The PR related to this issue is not big, and the team was invited to take a look to see if anything can be found. The team seemed skeptical about this PR being the cause of the issue as it mostly deals with fetching some keys from the DB and deciding what to delete, and doesn't appear to involve substantial memory consumption. There was some discussion about the memory leak being related to the external memory, not heap memory, and the possibility of the issue being in LevelDB, given that the external memory could be affected by buffers and array buffers. The team's next steps involve further investigating this issue, considering reverting the PR or changing the frequency of the archive state calls to help narrow down the problem.

Besides the memory leak, other issues for the 1.9 release were mentioned, including unstable peers and request response handlers. One of the main concerns was the instability of peers when the network thread is enabled, resulting in numerous ban messages from other peers. There was a suggestion to temporarily disable the worker thread, release a 1.9 RC1 for testing, and continue to work on the issues as part of 1.10.

Updates

Nazar

  • Nazar closed the PR for "estimate gas for the prover", which completed the list of web3 methods planned for the version 0 of the prover.
  • He is currently working on some improvements before the prover package is ready for publication.
  • Two PRs are ready for review and have been shared on Discord.
  • An issue regarding file naming convention was discussed, with the consensus leaning towards snake_case for file and directory names.
  • Nazar will open a PR for the file renames soon, to avoid potential issues with others.
  • A linter rule has been added to ensure the snake_case file naming convention is adhered to in the future. However, directory names cannot be lint-checked automatically, so manual review is required to ensure the naming convention is adhered to.
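
A check of this kind could look roughly like the sketch below. The actual linter rule and its configuration are not shown in these notes, so the regular expression and the `isSnakeCaseFileName` helper are assumptions for illustration only.

```typescript
// Hypothetical check: lowercase alphanumeric words joined by single underscores.
const SNAKE_CASE = /^[a-z0-9]+(_[a-z0-9]+)*$/;

function isSnakeCaseFileName(fileName: string): boolean {
  // Only the base name before the first dot is checked, so extensions such as
  // ".ts" or ".test.ts" are ignored.
  const base = fileName.split(".")[0] ?? "";
  return base.length > 0 && SNAKE_CASE.test(base);
}

const snakeCaseChecks = [
  isSnakeCaseFileName("state_cache.ts"),
  isSnakeCaseFileName("stateCache.ts"),
  isSnakeCaseFileName("block_utils.test.ts"),
];
```

The same function could be applied to directory names in a small script, which is one way to cover the part the linter cannot check.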

Tuyen

  • Tuyen spent most of his time investigating version 1.9, but there are not many results yet apart from identifying a memory leak issue.
  • He has added some metrics to the unknown block sync panel.
  • A performance issue was identified with calling fork choice hash block, which was found to be the main consumer for the network processor.
  • He also has a PR for deduplicating notifier log, which has been merged.

Gajinder

  • Gajinder reported that DevNet 5 was resurrected after being in limbo for more than two weeks.
  • The resurrected nodes could sync using checkpoint sync without being rate-limited by Lighthouse nodes.
  • DevNet 6 spec has a few open PRs, and most of the spec for 4844 is finalized.
  • He completed a PR in Lodestar to "de-SSZ-ize" the blob transactions, with only a small part pending to add example RLP transactions.
  • He also separated the request response from the "Free the Blobs/Decouple" master PR.
  • He created a Blobfish banner for Deneb at the request of Barnabas from EF DevOps.

Nico

  • Nico addressed minor issues from version 1.9 and opened some PRs to fix them. He mentioned problems with event listener warnings, especially after upgrading the worker.
  • Nico also reported compatibility issues with Lighthouse VC, including a time discrepancy error with the latest release.
  • He also noted interop issues with Teku.
  • Team managed to get the overall cluster running, successfully making some attestations, but acknowledged that the cluster broke again over the weekend.
  • His goal for the upcoming period is to continue fixing issues and close existing ones.

Cayman

  • Spent the last week looking at metrics from 1.9, particularly the libp2p worker. His analysis suggests that increased workloads might be causing timing issues due to increased data processing.
  • While working with the libP2P team, they found bugs in the Rust YAMUX implementation, which he suggested trying again in Lodestar after version 1.9 is released.

May 23, 2023 Planning & Standup Meeting

Planning

An upcoming protocol program presentation was announced which will cover project updates from the last quarter and future plans.

The team discussed the v1.9 planning with a focus on outstanding PRs. The libp2p 0.45 upgrade will be deferred to a future release due to potential complexities, but the network thread merge has been completed. Several other PRs were reviewed and discussed, including "Use Proper State to Verify Attestations," "Generating and Using Flame Graphs," "Improved Error Handling in Attestation Service," and "Change Archiving Strategy to Always Store Last Finalized." Most of these were close to being merged or have already been merged. The team is looking to commit to a v1.9 release within the next 24 hours so that it can include "cleanly exit process on graceful shutdown" (5330).

The team also discussed the need to ensure the beacon node is shutting down cleanly. There was a need for the threads package to be updated and published, which Cayman can likely do when he returns.

One of the PRs (5521) aims to fix the browser test, and the finalized proposal log, recently added, is under review for its continued inclusion.

The team aims to release v1.9 by the end of the week and subsequently update their production nodes. It is hoped this will address some issues with lower effectiveness and missed attestations.

Updates

BLST integration in Lodestar: Matt has successfully integrated BLST into Lodestar, which has resulted in good metrics. However, Matt admits he made a mistake with a promise return in one of the tests, which Gajinder caught. Matt thanked Gajinder for his efforts in reviewing the pull requests and for his valuable feedback. Matt has updated his work based on Gajinder's comments. Over the weekend, Matt worked on refactoring the code and making some additional changes. He has started to rework the code to demonstrate different parts and iterations. Matt has made progress with the memory model and class hierarchy, which he hopes to finish soon. Matt also worked on flame graph testing. He has simplified the code significantly and tested to ensure everything looks good. Matt has put up another metrics pull request for block errors. After integrating BLST, Matt noticed improvements in the system's performance, with no negative gossip scores or bad behavior reports. However, he discovered that he implemented it incorrectly, which could have potentially skewed the metrics. He plans to refactor this after Gajinder's next PR review. During a performance test, Matt found a mistake where the test was passing by the promises, resulting in falsely high gains. Once the error was corrected, the performance was on par with previous measurements. Matt will redesign performance tests to better account for multithreading and will attempt to make it work on CI. If not, he will test it locally on his machine.

Nazar has been working on several pull requests (PRs) for the prover, with the latest one implementing the eth_estimateGas. He is currently fixing some failing browser tests related to this PR. With the conclusion of the eth_estimateGas, the initial plan for the version 0 of the prover would be finished. The future work on this will involve refining and fixing any remaining issues with the package. No new features will be added at this stage. With the completion of Prover's v0, Nazar plans to communicate with MetaMask to test the prover. He suggested setting up an alpha or testing ground with MetaMask for this purpose. In addition to his work on the prover, Nazar is also working on a PR for continuous integration (CI) improvements. Nazar mentioned that due to public holidays on Friday and Monday in Germany, and his plan to travel during this time, it will be a short work week for him. He aims to finish up all the open PRs before Friday and do some fine-tuning of the package to get it ready for discussion with the MetaMask team. He also mentioned a recent change where a new logger package was created. However, a lot of the code is still referencing the logger type from the utils package. Nazar suggested updating the references to point to the new logger package whenever developers come across them during their work.

Nico has mainly been focusing on ensuring a clean shutdown for the beacon node. He is currently dealing with a few remaining errors that get thrown on shutdown, some of which may be due to the database closing prematurely. However, he expects the PR to be ready once the threads package is released. He has made a few minor updates, including improving some locks and fixing some ESLint issues, although he mentioned that the latter is not particularly important to merge right now. Nico plans to consider what could be included in version 1.9. One potential feature is decrypting in a thread pool when keys are imported via the API. However, he is unsure how easy this would be to implement, given the current implementation, and will need to investigate it further.

Gajinder has been working on and contributing to these PRs, incorporating feedback and adjustments. The result is a tentative DevNet 6 spec. However, many PRs are still open and, once closed, Gajinder plans to implement some of the spec items for DevNet 6. He also created two PRs. The first integrates the new database for the Blob sidecars, and the second enables a spec test by adding the remaining types. His plan is to extract network PRs once the database PRs are out. He has integrated the part where blobs were signed, and the block contents are now referred to in the Beacon API spec. Gajinder studied the Node add-on API and Matt's work, and came to a conclusion that some aspects can potentially be simplified. He emphasized that the memory management is critical and should not cause the main Lodestar process to crash. Therefore, he suggested the need for incremental development to prevent any memory issues. He wrote some specs for the custody game, which was mentioned in the context of discussions around endianness of KZG libraries. Gajinder will work on the DevNet 6 preparation, dig deeper into verkle, and aim to extract network work from the "Free the Blobs" initiative, which primarily involves request response and gossip work.


May 16, 2023 Planning & Standup Meeting

  • Lido's v2 Upgrade Deployment: Lido deployed their version 2 upgrade successfully, and the Lodestar team was prepared for it. The team will go up to 8,000 keys with Lido this week.
  • Hiring: The team is in the process of hiring an additional person to work alongside Faith. Several candidates have submitted assignments for the Technical Project Manager (TPM) infrastructure position.
  • Promotion of Lodestar: The team is working on promoting Lodestar more effectively. They plan to set up a developer blog with the help of Mark Hans and use AI like ChatGPT to help publicize the work done at Lodestar, aiming to attract more developers.
  • Writing and Storytelling: The team was encouraged to take notes on their problem-solving processes and use ChatGPT to write them up. The goal is to generate content that illuminates the unique challenges they tackle and the solutions they come up with. This could also serve as a recruiting tool.
  • Content Management System: A content management system is expected to be up within the next two weeks, with assistance from Cindy. This is part of the effort to attract more people to their website and work.
  • Obol Cluster: The team is still working on the Obol cluster and is awaiting a signed message from Cayman for the Distributed Key Generation (DKG).
  • Lodestar's Integration into Docker Scripts: The team appreciated Nico's work on integrating Lodestar into Docker scripts.
  • Networking Thread Updates: An update was given on the networking thread. The branch is up to date and will be merged after fixing the failing end-to-end tests. After the merge, any reorganization work will be carried out in a separate refactoring PR.
  • Release Planning: The team is aiming for a new release or at least a Release Candidate (RC) next week. While not all milestones for version 1.9 will be reached, they want to determine the essential features to include. The network thread updates were deemed a critical component of the next release.

Updates

  • Gajinder's Work: Gajinder worked on the proposal stats PR, which has been merged, and also worked on rebasing the 'Free the Blobs' project. A part of this project, the block signing section, was extracted out into a separate PR, which Tuyen has provided feedback on. Gajinder also mentioned the ongoing spec changes around Blob transactions and the debate around endianness in the underlying blob library and blobs themselves.

    • DevNet 5 Issues: On DevNet 5, most nodes are out of sync except for Lighthouse nodes, which were on a different fork. Lighthouse was having issues serving the Lodestar node, leading to slow slot syncing. Gajinder has flagged these issues to the Lighthouse team.

    • Restarting Lodestar: Gajinder is investigating an issue where restarting Lodestar leads to it starting from many epochs back, from the last finalized state. A potential solution could involve saving the last finalized state and clearing previous finalized states within a specific window, ensuring a more efficient restart.

    • Work on BLST Implementation: Gajinder, Matthew, and Cayman are collaborating on BLST implementation for multi-threading.

    • Implications of RLP in CL: The change from SSZ to RLP for blob transactions could have implications for the current CL implementation, specifically for matching commitments against the versioned hashes in the transaction. However, a workaround is proposed: the CL sends the versioned hashes it computes to the EL, which verifies them against the transactions. This avoids the need to deserialize the transaction in the CL, so for the time being RLP support might not be required in the CL.

  • Lion's Work: Lion has completed the PR for the network thread, particularly improving the functionality of loggers. He has also worked on some DevOps issues and is currently working to improve debug logs.

  • Nico's Work: Nico has been looking into incompatibility issues reported by RockLogic and has updated the way changelogs are generated in preparation for v1.9. He is also investigating the high-priority issue of the node not shutting down cleanly. He mentioned that the team can now publish the DAppNode packages themselves, but this involves on-chain transactions, which can be costly.

  • Tuyen's Work: Tuyen has finished the work on stable P-scoring and is now working to improve block import by batching I/O operations. He has also worked on a solution to search for unknown block roots when an attestation references an unknown root.

  • Nazar's Work: Nazar has merged the eth_call implementation for the prover and is now working on the last method, eth_estimateGas. He is also extending the simulation test with Capella support and planning to run the simulation environment in the CI for end-to-end tests.

  • Matthew's Work: Matthew has made significant progress with the BLST library, with all spec tests now passing. He is also beginning the implementation in Lodestar and has created his first PR, which includes metrics and a Grafana dashboard. Matthew is also working on finalizing the Flamescope work for inclusion in v1.9.

  • Cayman's Work: Cayman has been working on unblocking the networking thread PR and has started reviewing Matthew's BLST PRs. He is also planning to work on IPv6 support for Lodestar.
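
As an illustration of the versioned-hash workaround discussed under "Implications of RLP in CL", the sketch below shows how a CL client could derive versioned hashes from KZG commitments without deserializing any transaction. The derivation (version byte 0x01 followed by the last 31 bytes of the SHA-256 of the commitment) follows EIP-4844. The function names are hypothetical and are not taken from the Lodestar codebase; how the resulting list is passed to the EL is not shown.

```typescript
import { createHash } from "node:crypto";

// Version byte for KZG-based versioned hashes, as defined in EIP-4844.
const VERSIONED_HASH_VERSION_KZG = 0x01;

// Hypothetical helper: derives the versioned hash for one 48-byte KZG commitment
// as 0x01 || sha256(commitment)[1:].
function kzgCommitmentToVersionedHash(commitment: Uint8Array): Uint8Array {
  if (commitment.length !== 48) {
    throw new Error(`expected 48-byte commitment, got ${commitment.length}`);
  }
  const hash = createHash("sha256").update(commitment).digest();
  // Overwrite the first byte of the digest with the version byte.
  hash[0] = VERSIONED_HASH_VERSION_KZG;
  return new Uint8Array(hash);
}

// Hypothetical helper: builds the list the CL would send to the EL,
// in the same order as the commitments in the block.
function computeVersionedHashes(commitments: Uint8Array[]): Uint8Array[] {
  return commitments.map(kzgCommitmentToVersionedHash);
}
```

With this approach the CL only needs the commitments it already holds; comparing the hashes against the transactions is left to the EL, which is why RLP decoding can stay out of the CL.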
